tree-sitter / node-tree-sitter

Node.js bindings for tree-sitter
https://www.npmjs.com/package/tree-sitter
MIT License
624 stars 107 forks source link

Make the module context aware #184

Closed segevfiner closed 6 months ago

segevfiner commented 8 months ago

This makes this binding context aware, while still using v8/NaN/Node APIs directly, by moving all global data into an object, and removing global external buffers usage, which leads to seg faults/access violations on unloading the module.

The language bindings will also need to be made context aware which is done here https://github.com/tree-sitter/tree-sitter/pull/2841, and will need to be applied to each language binding.

Dropped supported for some ancient EOL Node.js and Electron versions so we don't have to ifdef for those.

Contributed on behalf of Swimm

verhovsky commented 8 months ago

can you change it to use Node-API as well #52 #129, so we have to update all the repositories just once?

segevfiner commented 8 months ago

The napi PRs are outdated, and didn't even build for me, so I did this on top of master, Either this or the napi PRs should be merged first, then I can go ahead and either redo this on top of napi or redo the napi work on top of this.

The napi PRs also still use NaN/v8 APIs which means they won't work as true napi until those are removed.

verhovsky commented 8 months ago

You could make a new branch that starts from segevfiner:context-aware and open a separate PR that would have N-API changes on top of these changes and we can merge them in two steps.

segevfiner commented 8 months ago

Well yeah. But I'm not sure from which napi branch or PR to start from, or if I should start from scratch, which is quite a bit of work. As the current napi branches/PRs are not up-to-date...?

And also what to do about the remaining issues in those branches, that is, v8 API usage.

verhovsky commented 8 months ago

129 is the most recent work and I think it doesn't use NAN. instance_of() in src/util.h and src/util.cc is a function that isn't actually used I think and the v8 stuff seems to be for backwards compatibility with old parsers, which makes sense.

If we merge this PR will the old compiled language parsers still work or is it completely backwards incompatible until we make that template change and re-compile?

segevfiner commented 8 months ago

The napi PR tries to maintain backwards compat by still using v8 API via the util file for grabbing the language. Which makes it backwards compatible but also not a true napi module, which means it won't really reap the benefit of ABI compatibility as long as that's there, it must only use napi for it to truly be a napi module. But I'm not sure if napi is capable of grabbing an internal pointer from a JS object made by v8, I think you must use a napi external, which means changing the language modules.

To try to keep them supporting both, the PR tries to add a napi external as an _language prop in the language modules, which I'm not sure where the code that changes the templates for that is. But that also makes them not a true napi module...

In contrast this PR is standalone and backwards compatible, it can be merged regardless of napi, or it can be redone on top of napi now or later, but to truly use it thread safe you do still need go apply the other change to the language modules, or they will fail to import in threads.

So what strategy you think we should take?

verhovsky commented 8 months ago

I checked that this is indeed backwards compatible (as in, still works with the old compiled languages) and doesn't make the bindings slower using mostly the instructions from the wiki article like this:

mkdir /tmp/tree-sitter-test
cd /tmp/tree-sitter-test
npm init -y
# add `"type": "module",` to package.json
npm install tree-sitter tree-sitter-cli tree-sitter-python
npx tree-sitter build-wasm node_modules/tree-sitter-python
git clone https://github.com/python/cpython.git
#cd cpython && checkout f508800 && cd ..
find cpython/Lib/ -type f -name "*.py" -exec cat {} + > sample_python_code.py

Then created main.js

import Parser from "tree-sitter";
import Python from "tree-sitter-python";
import fs from "fs";

const parser = new Parser();
parser.setLanguage(Python);

const file = fs.readFileSync("./sample_python_code.py", "utf8");

const start = process.hrtime.bigint();
parser.parse(file);
const end = process.hrtime.bigint();
console.log(`node-tree-sitter parse time: ${(end - start) / 1000000n}ms`);

ran it, then changed it to use this PR like this

cd /tmp
git clone https://github.com/segevfiner/node-tree-sitter.git
cd node-tree-sitter
git checkout 779e0d68a8590f8cf00dc413f715234cd5f5b7a8
npm install
npm run build

Then back in the original tree-sitter-test/ directory make this edit to package.json

14c14
<     "tree-sitter": "^0.20.6",
---
>     "tree-sitter": "file://tmp/node-tree-sitter",

then

rm -rf node_modules
npm install

and run main.js again and got the same performance (4 seconds)

I also tried doing

14c14
<     "tree-sitter": "^0.20.6",
---
>     "tree-sitter": "git://github.com/segevfiner/node-tree-sitter.git#779e0d68a8590f8cf00dc413f715234cd5f5b7a8",

but I get this build error when running npm install

/tmp/tree-sitter-vs-node-tree-sitter$ npm install
npm ERR! code 1
npm ERR! git dep preparation failed
npm ERR! command /opt/homebrew/Cellar/node/21.6.0/bin/node /opt/homebrew/lib/node_modules/npm/bin/npm-cli.js install --force --cache=/Users/space/.npm --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
npm ERR! > tree-sitter@0.20.6 install
npm ERR! > prebuild-install || node-gyp rebuild
npm ERR! npm WARN using --force Recommended protections disabled.
npm ERR! npm WARN deprecated @npmcli/move-file@2.0.1: This functionality has been moved to @npmcli/fs
npm ERR! npm WARN deprecated source-map-url@0.4.1: See https://github.com/lydell/source-map-url#deprecated
npm ERR! npm WARN deprecated request-promise-native@1.0.9: request-promise-native has been deprecated because it extends the now deprecated request package, see https://github.com/request/request/issues/3142
npm ERR! npm WARN deprecated urix@0.1.0: Please see https://github.com/lydell/urix#deprecated
npm ERR! npm WARN deprecated har-validator@5.1.5: this library is no longer supported
npm ERR! npm WARN deprecated abab@2.0.6: Use your platform's native atob() and btoa() methods instead
npm ERR! npm WARN deprecated domexception@1.0.1: Use your platform's native DOMException instead
npm ERR! npm WARN deprecated resolve-url@0.2.1: https://github.com/lydell/resolve-url#deprecated
npm ERR! npm WARN deprecated source-map-resolve@0.5.3: See https://github.com/lydell/source-map-resolve#deprecated
npm ERR! npm WARN deprecated left-pad@1.3.0: use String.prototype.padStart()
npm ERR! npm WARN deprecated w3c-hr-time@1.0.2: Use your platform's native performance.now() and performance.timeOrigin.
npm ERR! npm WARN deprecated sane@4.1.0: some dependency vulnerabilities fixed, support for node < 10 dropped, and newer ECMAScript syntax/features added
npm ERR! npm WARN deprecated fsevents@1.2.13: The v1 package contains DANGEROUS / INSECURE binaries. Upgrade to safe fsevents v2
npm ERR! npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm ERR! npm WARN deprecated request@2.88.2: request has been deprecated, see https://github.com/request/request/issues/3142
npm ERR! npm WARN deprecated tar@2.2.2: This version of tar is no longer supported, and will not receive security updates. Please upgrade asap.
npm ERR! npm WARN deprecated tar@2.2.2: This version of tar is no longer supported, and will not receive security updates. Please upgrade asap.
npm ERR! prebuild-install warn install No prebuilt binaries found (target=21.6.0 runtime=node arch=arm64 libc= platform=darwin)
npm ERR! gyp info it worked if it ends with ok
npm ERR! gyp info using node-gyp@10.0.1
npm ERR! gyp info using node@21.6.0 | darwin | arm64
npm ERR! gyp info find Python using Python version 3.11.7 found at "/opt/homebrew/opt/python@3.11/bin/python3.11"
npm ERR! gyp info spawn /opt/homebrew/opt/python@3.11/bin/python3.11
npm ERR! gyp info spawn args [
npm ERR! gyp info spawn args '/Users/space/.npm/_cacache/tmp/git-cloneASf9UUJOmdtt/node_modules/node-gyp/gyp/gyp_main.py',
npm ERR! gyp info spawn args 'binding.gyp',
npm ERR! gyp info spawn args '-f',
npm ERR! gyp info spawn args 'make',
npm ERR! gyp info spawn args '-I',
npm ERR! gyp info spawn args '/Users/space/.npm/_cacache/tmp/git-cloneASf9UUJOmdtt/build/config.gypi',
npm ERR! gyp info spawn args '-I',
npm ERR! gyp info spawn args '/Users/space/.npm/_cacache/tmp/git-cloneASf9UUJOmdtt/node_modules/node-gyp/addon.gypi',
npm ERR! gyp info spawn args '-I',
npm ERR! gyp info spawn args '/Users/space/Library/Caches/node-gyp/21.6.0/include/node/common.gypi',
npm ERR! gyp info spawn args '-Dlibrary=shared_library',
npm ERR! gyp info spawn args '-Dvisibility=default',
npm ERR! gyp info spawn args '-Dnode_root_dir=/Users/space/Library/Caches/node-gyp/21.6.0',
npm ERR! gyp info spawn args '-Dnode_gyp_dir=/Users/space/.npm/_cacache/tmp/git-cloneASf9UUJOmdtt/node_modules/node-gyp',
npm ERR! gyp info spawn args '-Dnode_lib_file=/Users/space/Library/Caches/node-gyp/21.6.0/<(target_arch)/node.lib',
npm ERR! gyp info spawn args '-Dmodule_root_dir=/Users/space/.npm/_cacache/tmp/git-cloneASf9UUJOmdtt',
npm ERR! gyp info spawn args '-Dnode_engine=v8',
npm ERR! gyp info spawn args '--depth=.',
npm ERR! gyp info spawn args '--no-parallel',
npm ERR! gyp info spawn args '--generator-output',
npm ERR! gyp info spawn args 'build',
npm ERR! gyp info spawn args '-Goutput_dir=.'
npm ERR! gyp info spawn args ]
npm ERR! gyp info spawn make
npm ERR! gyp info spawn args [ 'BUILDTYPE=Release', '-C', 'build' ]
npm ERR! make: *** No rule to make target `Release/obj.target/tree_sitter/vendor/tree-sitter/lib/src/lib.o', needed by `Release/tree_sitter.a'.  Stop.
npm ERR! gyp ERR! build error 
npm ERR! gyp ERR! stack Error: `make` failed with exit code: 2
npm ERR! gyp ERR! stack at ChildProcess.<anonymous> (/Users/space/.npm/_cacache/tmp/git-cloneASf9UUJOmdtt/node_modules/node-gyp/lib/build.js:209:23)
npm ERR! gyp ERR! System Darwin 23.2.0
npm ERR! gyp ERR! command "/opt/homebrew/Cellar/node/21.6.0/bin/node" "/Users/space/.npm/_cacache/tmp/git-cloneASf9UUJOmdtt/node_modules/.bin/node-gyp" "rebuild"
npm ERR! gyp ERR! cwd /Users/space/.npm/_cacache/tmp/git-cloneASf9UUJOmdtt
npm ERR! gyp ERR! node -v v21.6.0
npm ERR! gyp ERR! node-gyp -v v10.0.1
npm ERR! gyp ERR! not ok 
npm ERR! npm ERR! code 1
npm ERR! npm ERR! path /Users/space/.npm/_cacache/tmp/git-cloneASf9UUJOmdtt
npm ERR! npm ERR! command failed
npm ERR! npm ERR! command sh -c prebuild-install || node-gyp rebuild
npm ERR! 
npm ERR! npm ERR! A complete log of this run can be found in: /Users/space/.npm/_logs/2024-01-19T19_13_04_259Z-debug-0.log

npm ERR! A complete log of this run can be found in: /Users/space/.npm/_logs/2024-01-19T19_13_03_372Z-debug-0.log

The only difference I see is nodegyp build (working) and node-gyp rebuild (not working) but I don't know why that would cause that make error...

So anyway, since we don't have to recompile all the languages we just have to do that if we want to take advantage of these changes, I think we're good to merge this then unless @maxbrunsfeld you have any objections?

segevfiner commented 7 months ago

@verhovsky Might be an issue with Git dependencies in npm not cloning the submodule?

segevfiner commented 7 months ago

Just to be clear, I'm quite willing to help also do this for napi, help get napi merging, and add other improvements that I have to the Node.js grammars generation, etc. But there needs to be some sign of life in that the project is merging PRs first, or the work will just go to waste going stale, like what happened to the current napi PR... I haven't seen @maxbrunsfeld responding for quite a while to tree-sitter related stuff, hope he didn't tire our from the project entirely...

amaanq commented 7 months ago

Hey, I've been MIA a while as a maintainer - I'm back now though and I'd love to collaborate on that. I was planning to after #163, but we can definitely just sidestep that and move straight to using napi. That's quite a breaking issue though for every grammar, so I'd like to coordinate when/how to do that. Are you on Matrix or Discord? @verhovsky I'm sure you'd like to join in as well, feel free to

Currently working out some kinks in core and I plan to cut a release (0.20.9) very soon.

segevfiner commented 7 months ago

If you decide to migrate to napi first, if and when you do have an up-to-date napi branch or have merged it, let me know, and I can try to redo this change on top of it.

verhovsky commented 7 months ago

@segevfiner can you rebase onto master and i'll merge it

segevfiner commented 6 months ago

@verhovsky Rebased

segevfiner commented 6 months ago

We will also need tree-sitter/tree-sitter#2841 to make the actual languages context aware too.

Now, I can try and redo or merge into the napi work, but we do need a decision about how to handle the grammars in napi, since to make this a true napi module we have to not use v8/NaN APIs which means we will also need to convert the languages to napi at the same time, which is obviously a breaking change as they will only work with a napi version of the module.

maxbrunsfeld commented 6 months ago

Thanks @segevfiner, nice work.