tree-sitter / node-tree-sitter

Node.js bindings for tree-sitter
https://www.npmjs.com/package/tree-sitter
MIT License
649 stars 114 forks source link

Convert module to use NAPI instead of Nan #52

Closed maxbrunsfeld closed 8 months ago

maxbrunsfeld commented 5 years ago

Summary

This PR reworks all of the native code to use Node's new ABI-stable addon API (napi), instead of using NAN, a C++ helper library that wraps the V8 and Node APIs.

Benefit

ABI-stability means that this module will no longer need to be recompiled when upgrading to a new version of Node.js or electron. Even better, this is the first step toward individual language modules like tree-sitter-javascript having the same ABI stability.

In addition, Napi is also just a much cleaner and more consistent API to use than Nan + V8.

Limitations

Unfortunately, we can't achieve true ABI-stability all in one fell swoop. This module has two binary interfaces that currently depend on Nan / V8 directly:

  1. Retrieving the TSLanguage * pointers out of Language objects from language modules. Currently, the Node.js binding code that the tree-sitter CLI generates for individual languages are reliant on nan and the V8 ABI 😞 .
  2. Parsing native TextBuffer objects provided by superstring - For performance/concurrency reasons, this module can interact with TextBuffer objects directly through native code, without calling back into JavaScript. This interface also relies on nan/V8 😭 .

The module won't truly have a stable ABI until these ☝️ things are fixed. I will have to update the Tree-sitter CLI to generate binding files that use napi. I probably won't attempt to convert superstring to NAPI, but I can update the TextBufferSnapshotWrapper class, which is the one shared between modules, to expose its data pointer in a way that doesn't depend on Nan.

rien commented 4 years ago

How usable is this PR? I'm still suffering from #53 and I was wondering if this could maybe solve the issue.

maxbrunsfeld commented 4 years ago

I don't think #53 is due to bugs in the native code; I think it's because some JavaScript code is running multiple times, even though it is designed to only be run once.

That said, this PR should be usable. All the tests pass, I just haven't verified it yet in any real application, like Atom.

drom commented 4 years ago

@maxbrunsfeld do you plan to merge this PR? It would super useful to switch to NAPI

gpetrov commented 4 years ago

I'm looking forward to the NAPI integration as well, so that we can use the powerful tree-sitter in multi threads and web workers.

cellog commented 3 years ago

@maxbrunsfeld I spent some time experimenting with how to upgrade the languages to use N-API. Turns out to be smaller than I expected. Here's the Javascript update:

bindings.cc:

#include "tree_sitter/parser.h"
#include <napi.h>

using namespace Napi;

extern "C" TSLanguage * tree_sitter_javascript();

void NoOpFinalizer(Napi::Env _env, TSLanguage *_ignore) {
  /* do nothing, no need to free */
}

class TreeSitterJavascriptBinding : public Napi::Addon<TreeSitterJavascriptBinding> {
  public:
    TreeSitterJavascriptBinding(Napi::Env env, Napi::Object exports) {
      DefineAddon(exports, {
        InstanceValue("name", Napi::String::New(env, "javascript"), napi_enumerable),
        InstanceValue("__parser", Napi::External<TSLanguage>::New(env, tree_sitter_javascript(), NoOpFinalizer), napi_default)  
      });
    }
};

NODE_API_ADDON(TreeSitterJavascriptBinding)

binding.gyp:

{
  "targets": [
    {
      "target_name": "tree_sitter_javascript_binding",
      "include_dirs": [
        "<!@(node -p \"require('node-addon-api').include\")",
        "src"
      ],
      "sources": [
        "src/parser.c",
        "src/scanner.c",
        "src/binding.cc"
      ],
      "cflags_c": [
        "-std=c99",
      ],
      "defines": [ "NAPI_DISABLE_CPP_EXCEPTIONS" ]
    }
  ]
}

index.js:

module.exports = require("bindings")("tree_sitter_javascript_binding");

try {
  module.exports.nodeTypeInfo = require("./src/node-types.json");
} catch (_) {}

This creates a non-enumerable property __parser which is the TSLanguage *. Updating the node-tree-sitter will involve retrieving the __parser property off of the language passed in, and verifying it's there, then extracting its value with the Data() method. The rest of the changes are pretty much the same. I'm going to try running the N-API conversion.js script on master and update to use my fixed up Javascript language. if it works, I'll see if I can figure out how to mod the tree-sitter-cli to spit out this changed setup. Of course, this is a breaking change, but as long as everything is bumped a major version, should be good to go.

I made this little test script:

const Javascript = require(".");

console.log(Javascript.name);
console.log(Javascript.__parser);

output:

Screen Shot 2021-02-15 at 5 56 04 PM
cellog commented 3 years ago

last note for now. This is what I think the UnwrapLanguage function becomes with this change:

const TSLanguage *UnwrapLanguage(Napi::Env env, const Napi::Value &value) {
  if (value.IsObject()) {
    Napi::Object arg = value.As<Napi::Object>();
    Napi::Value name = arg.Get("name");
    if (!env.IsExceptionPending()) {
      if (name.IsString()) {
        Napi::Value parser = arg.Get("__parser");
        if (!env.IsExceptionPending()) {
          if (parser.IsExternal()) {
            Napi::External<TSLanguage> parserInfo = parser.As<Napi::External<TSLanguage> >();
            const TSLanguage *language = (const TSLanguage *) parserInfo.Data();
            if (language) {
              uint16_t version = ts_language_version(language);
              if (
                version < TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION ||
                version > TREE_SITTER_LANGUAGE_VERSION
              ) {
                std::string message =
                  "Incompatible language version. Compatible range: " +
                  std::to_string(TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION) + " - " +
                  std::to_string(TREE_SITTER_LANGUAGE_VERSION) + ". Got: " +
                  std::to_string(ts_language_version(language));
                Napi::RangeError::New(env, message.c_str()).ThrowAsJavaScriptException();
                return nullptr;
              }
              return language;
            }
          }
        }
      }
    }
  }
  Napi::TypeError::New(env, "Invalid language object").ThrowAsJavaScriptException();

  return nullptr;
}
talbergs commented 3 years ago

Duplicate of https://github.com/tree-sitter/node-tree-sitter/pull/81 ?

ahlinc commented 3 years ago

@talbergs no, it's not duplicate, the #81 merge just fixed SegmentationFault for current Nan implementation. Napi is coming soon, it requires more work than this old PR.

gpetrov commented 2 years ago

@maxbrunsfeld @cellog With all those Node and Electron versions a good NAPI integration is getting more and more essential.

Also please consider using the Node::Buffer instead of ArrayBuffer due to the many depreciations, as advised in https://github.com/electron/electron/issues/29893#issuecomment-868768506

I tried to patch things up in https://github.com/tree-sitter/node-tree-sitter/pull/95 but still a full conversion and NAPI support with node buffer would be much better!

maxbrunsfeld commented 2 years ago

Hi, would anyone be willing to take on the task of merging master into this branch? I think the time has come to land this PR.

For context - when I first authored this PR, NAPI was a bit newer, and I was worried about the risk of switching to it given that Atom relies on this node module. But at this point in time, the Node.js ecosystem has changed a lot, and it's more important than ever to switch over to NAPI. Also, Atom's pace of development has slowed down significantly, to the point where I don't think it's worth holding this up on that application's behalf.

As far as I know, the biggest chunk of work to be done is updating the new Query class (which was added after I created this PR) to use NAPI. There are various other conflicts as well.

If someone who uses this module could take ownership of this effort, that would be very helpful, and I would definitely merge it if somebody can get the conflicts resolved and tests passing again. I can give advice that pertains to the Tree-sitter APIs and C++ questions, but it has been quite a while since I've used Node.js for anything, so I have less ability to advise about Node-specific questions.

verhovsky commented 1 year ago

@maxbrunsfeld I believe @michaelBelousov did that here: https://github.com/michaelBelousov/node-tree-sitter/tree/napi-with-query-api

See https://github.com/tree-sitter/node-tree-sitter/issues/34#issuecomment-1061265133

MichaelBelousov commented 1 year ago

I don't use it a lot currently but I would be happy to try figure out where I left things and determine if I can make sure tests pass and then try to merge.

MichaelBelousov commented 1 year ago

I can't build master on my ubuntu box even after downgrading node to 14.x and npm to 7.x, I get the same thing as https://github.com/tree-sitter/node-tree-sitter/issues/72#issuecomment-797000374 May need to fix that first.

verhovsky commented 1 year ago

@MichaelBelousov I just made node-tree-sitter build on Node 14 and Ubuntu for #127, try my master branch.

https://github.com/verhovsky/node-tree-sitter/tree/master

It depends on the tree-sitter repository and in my fork I updated it to the latest master commit.

MichaelBelousov commented 1 year ago

I tried #127 and it didn't work... but I just took a look and it's because I didn't init submodules when cloning. Just did, so it worked now.

verhovsky commented 1 year ago

I did the same thing today.

cellog commented 1 year ago

I'm really grateful you're putting energy into this - it would shave minutes off of the i18n code parser we use for extracting translations if we could move to node instead of wasm!

MichaelBelousov commented 1 year ago

after some minor tweaks, tests pass but there's a segfault at exit during when trying to napi_delete_reference a null reference. I will dig into that when I have time in the next week. If someone wants to dig into that sooner themselves, let me know. Feel free to ping for a progress report if it's taking a while.

MichaelBelousov commented 1 year ago

It's the module_exports static ObjectReference in src/node.cc, because it's static, the destructor gets called after napi has invalidated all references, so calling its destructor is invalid, not sure why I didn't notice that in the debugger yesterday. I just needed to add a SuppressDestruct call. I could also move module_exports to the env's instance data which is safer but I think that can be done in this PR after this merges.

I have pruned my personal changes and rebased to try to make the history a bit cleaner. I have created a draft PR #129 while I fix conflicts

maxbrunsfeld commented 1 year ago

Nice work @MichaelBelousov. Thanks for the update.

MichaelBelousov commented 1 year ago

conflicts are fixed. Continuing discussion on the PR

MichaelBelousov commented 1 year ago

just a note, currently this PR uses a lot of static Napi::ObjectReference's In order for this to work in a multi-threaded settings (e.g. node's worker_threads), Napi::Env.SetInstanceData should be used instead, with some added class like BindingsEnv to hold all the things that are currently static in that way.

undefined-moe commented 1 year ago

The package currently doesn't work in worker_threads, is this pull request going to fix this?