Closed maxbrunsfeld closed 8 months ago
How usable is this PR? I'm still suffering from #53 and I was wondering if this could maybe solve the issue.
I don't think #53 is due to bugs in the native code; I think it's because some JavaScript code is running multiple times, even though it is designed to only be run once.
That said, this PR should be usable. All the tests pass, I just haven't verified it yet in any real application, like Atom.
@maxbrunsfeld do you plan to merge this PR? It would super useful to switch to NAPI
I'm looking forward to the NAPI integration as well, so that we can use the powerful tree-sitter in multi threads and web workers.
@maxbrunsfeld I spent some time experimenting with how to upgrade the languages to use N-API. Turns out to be smaller than I expected. Here's the Javascript update:
bindings.cc
:
#include "tree_sitter/parser.h"
#include <napi.h>
using namespace Napi;
extern "C" TSLanguage * tree_sitter_javascript();
void NoOpFinalizer(Napi::Env _env, TSLanguage *_ignore) {
/* do nothing, no need to free */
}
class TreeSitterJavascriptBinding : public Napi::Addon<TreeSitterJavascriptBinding> {
public:
TreeSitterJavascriptBinding(Napi::Env env, Napi::Object exports) {
DefineAddon(exports, {
InstanceValue("name", Napi::String::New(env, "javascript"), napi_enumerable),
InstanceValue("__parser", Napi::External<TSLanguage>::New(env, tree_sitter_javascript(), NoOpFinalizer), napi_default)
});
}
};
NODE_API_ADDON(TreeSitterJavascriptBinding)
binding.gyp
:
{
"targets": [
{
"target_name": "tree_sitter_javascript_binding",
"include_dirs": [
"<!@(node -p \"require('node-addon-api').include\")",
"src"
],
"sources": [
"src/parser.c",
"src/scanner.c",
"src/binding.cc"
],
"cflags_c": [
"-std=c99",
],
"defines": [ "NAPI_DISABLE_CPP_EXCEPTIONS" ]
}
]
}
index.js
:
module.exports = require("bindings")("tree_sitter_javascript_binding");
try {
module.exports.nodeTypeInfo = require("./src/node-types.json");
} catch (_) {}
This creates a non-enumerable property __parser
which is the TSLanguage *
. Updating the node-tree-sitter will involve retrieving the __parser
property off of the language passed in, and verifying it's there, then extracting its value with the Data()
method. The rest of the changes are pretty much the same. I'm going to try running the N-API conversion.js
script on master and update to use my fixed up Javascript language. if it works, I'll see if I can figure out how to mod the tree-sitter-cli to spit out this changed setup. Of course, this is a breaking change, but as long as everything is bumped a major version, should be good to go.
I made this little test script:
const Javascript = require(".");
console.log(Javascript.name);
console.log(Javascript.__parser);
output:
last note for now. This is what I think the UnwrapLanguage
function becomes with this change:
const TSLanguage *UnwrapLanguage(Napi::Env env, const Napi::Value &value) {
if (value.IsObject()) {
Napi::Object arg = value.As<Napi::Object>();
Napi::Value name = arg.Get("name");
if (!env.IsExceptionPending()) {
if (name.IsString()) {
Napi::Value parser = arg.Get("__parser");
if (!env.IsExceptionPending()) {
if (parser.IsExternal()) {
Napi::External<TSLanguage> parserInfo = parser.As<Napi::External<TSLanguage> >();
const TSLanguage *language = (const TSLanguage *) parserInfo.Data();
if (language) {
uint16_t version = ts_language_version(language);
if (
version < TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION ||
version > TREE_SITTER_LANGUAGE_VERSION
) {
std::string message =
"Incompatible language version. Compatible range: " +
std::to_string(TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION) + " - " +
std::to_string(TREE_SITTER_LANGUAGE_VERSION) + ". Got: " +
std::to_string(ts_language_version(language));
Napi::RangeError::New(env, message.c_str()).ThrowAsJavaScriptException();
return nullptr;
}
return language;
}
}
}
}
}
}
Napi::TypeError::New(env, "Invalid language object").ThrowAsJavaScriptException();
return nullptr;
}
Duplicate of https://github.com/tree-sitter/node-tree-sitter/pull/81 ?
@talbergs no, it's not duplicate, the #81 merge just fixed SegmentationFault for current Nan
implementation.
Napi is coming soon, it requires more work than this old PR.
@maxbrunsfeld @cellog With all those Node and Electron versions a good NAPI integration is getting more and more essential.
Also please consider using the Node::Buffer instead of ArrayBuffer due to the many depreciations, as advised in https://github.com/electron/electron/issues/29893#issuecomment-868768506
I tried to patch things up in https://github.com/tree-sitter/node-tree-sitter/pull/95 but still a full conversion and NAPI support with node buffer would be much better!
Hi, would anyone be willing to take on the task of merging master
into this branch? I think the time has come to land this PR.
For context - when I first authored this PR, NAPI was a bit newer, and I was worried about the risk of switching to it given that Atom relies on this node module. But at this point in time, the Node.js ecosystem has changed a lot, and it's more important than ever to switch over to NAPI. Also, Atom's pace of development has slowed down significantly, to the point where I don't think it's worth holding this up on that application's behalf.
As far as I know, the biggest chunk of work to be done is updating the new Query
class (which was added after I created this PR) to use NAPI. There are various other conflicts as well.
If someone who uses this module could take ownership of this effort, that would be very helpful, and I would definitely merge it if somebody can get the conflicts resolved and tests passing again. I can give advice that pertains to the Tree-sitter APIs and C++ questions, but it has been quite a while since I've used Node.js for anything, so I have less ability to advise about Node-specific questions.
@maxbrunsfeld I believe @michaelBelousov did that here: https://github.com/michaelBelousov/node-tree-sitter/tree/napi-with-query-api
See https://github.com/tree-sitter/node-tree-sitter/issues/34#issuecomment-1061265133
I don't use it a lot currently but I would be happy to try figure out where I left things and determine if I can make sure tests pass and then try to merge.
I can't build master on my ubuntu box even after downgrading node to 14.x and npm to 7.x, I get the same thing as https://github.com/tree-sitter/node-tree-sitter/issues/72#issuecomment-797000374 May need to fix that first.
@MichaelBelousov I just made node-tree-sitter build on Node 14 and Ubuntu for #127, try my master branch.
https://github.com/verhovsky/node-tree-sitter/tree/master
It depends on the tree-sitter repository and in my fork I updated it to the latest master commit.
I tried #127 and it didn't work... but I just took a look and it's because I didn't init submodules when cloning. Just did, so it worked now.
I did the same thing today.
I'm really grateful you're putting energy into this - it would shave minutes off of the i18n code parser we use for extracting translations if we could move to node instead of wasm!
after some minor tweaks, tests pass but there's a segfault at exit during when trying to napi_delete_reference
a null reference. I will dig into that when I have time in the next week. If someone wants to dig into that sooner themselves, let me know. Feel free to ping for a progress report if it's taking a while.
It's the module_exports
static ObjectReference
in src/node.cc, because it's static, the destructor gets called after napi has invalidated all references, so calling its destructor is invalid, not sure why I didn't notice that in the debugger yesterday. I just needed to add a SuppressDestruct
call. I could also move module_exports
to the env's instance data which is safer but I think that can be done in this PR after this merges.
I have pruned my personal changes and rebased to try to make the history a bit cleaner. I have created a draft PR #129 while I fix conflicts
Nice work @MichaelBelousov. Thanks for the update.
conflicts are fixed. Continuing discussion on the PR
just a note, currently this PR uses a lot of static Napi::ObjectReference
's In order for this to work in a multi-threaded settings (e.g. node's worker_threads), Napi::Env.SetInstanceData
should be used instead, with some added class like BindingsEnv
to hold all the things that are currently static in that way.
The package currently doesn't work in worker_threads, is this pull request going to fix this?
Summary
This PR reworks all of the native code to use Node's new ABI-stable addon API (
napi
), instead of usingNAN
, a C++ helper library that wraps the V8 and Node APIs.Benefit
ABI-stability means that this module will no longer need to be recompiled when upgrading to a new version of Node.js or electron. Even better, this is the first step toward individual language modules like
tree-sitter-javascript
having the same ABI stability.In addition,
Napi
is also just a much cleaner and more consistent API to use than Nan + V8.Limitations
Unfortunately, we can't achieve true ABI-stability all in one fell swoop. This module has two binary interfaces that currently depend on Nan / V8 directly:
TSLanguage *
pointers out ofLanguage
objects from language modules. Currently, the Node.js binding code that thetree-sitter
CLI generates for individual languages are reliant onnan
and the V8 ABI 😞 .TextBuffer
objects provided bysuperstring
- For performance/concurrency reasons, this module can interact withTextBuffer
objects directly through native code, without calling back into JavaScript. This interface also relies onnan
/V8 😭 .The module won't truly have a stable ABI until these ☝️ things are fixed. I will have to update the Tree-sitter CLI to generate binding files that use
napi
. I probably won't attempt to convertsuperstring
to NAPI, but I can update theTextBufferSnapshotWrapper
class, which is the one shared between modules, to expose its data pointer in a way that doesn't depend on Nan.