tree-sitter / node-tree-sitter

Node.js bindings for tree-sitter
https://www.npmjs.com/package/tree-sitter
MIT License

Parsing fails with large inputs #222

Open clentfort opened 3 weeks ago

clentfort commented 3 weeks ago

I'm trying to parse a file with more than 32767 (2^15-1) chars. This causes the parser to fail with the following error:

/Users/lentfortc/projects/szde-cdn/scripts/proxy-tool/node_modules/tree-sitter/index.js:361
      ? parse.call(
        ^

Error: Invalid argument
    at Parser.parse (/Users/lentfortc/projects/szde-cdn/scripts/proxy-tool/node_modules/tree-sitter/index.js:361:13)
    at JavaScript (/Users/lentfortc/projects/szde-cdn/scripts/proxy-tool/src/index.ts:12:15)
    at Object.<anonymous> (/Users/lentfortc/projects/szde-cdn/scripts/proxy-tool/src/index.ts:12:23)
    at Module._compile (node:internal/modules/cjs/loader:1364:14)
    at Object.transformer (/Users/lentfortc/projects/szde-cdn/scripts/proxy-tool/node_modules/tsx/dist/register-C1urN2EO.cjs:2:1122)
    at Module.load (node:internal/modules/cjs/loader:1203:32)
    at Module._load (node:internal/modules/cjs/loader:1019:12)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/translators:203:29)
    at ModuleJob.run (node:internal/modules/esm/module_job:195:25)
    at async ModuleLoader.import (node:internal/modules/esm/loader:337:24)

If I go down to 32767 chars the input is parsed without problems.

Node version: 18 and 20
node-tree-sitter version: 0.21.1

I tried this with two languages, tree-sitter-hcl and tree-sitter-javascript, each with one input of length 2^15-1 and one of length 2^15. Both grammars parsed the string of length 2^15-1 but not the string of length 2^15.
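The boundary check described above can be sketched roughly as follows. This is not the code from the reproduction project; `StringParser` and `probeBoundary` are hypothetical names, and the structural interface only assumes that the parser exposes a `parse(input)` method that throws on failure, as the stack trace suggests.

```typescript
// Hypothetical structural type: anything with a parse(input) method fits,
// so the sketch compiles without loading the native tree-sitter module.
interface StringParser {
  parse(input: string): unknown;
}

// Parse one input just below the reported limit and one at the limit,
// and record for each length whether parsing succeeded or what it threw.
function probeBoundary(parser: StringParser): Record<number, string> {
  const results: Record<number, string> = {};
  for (const length of [2 ** 15 - 1, 2 ** 15]) {
    const input = "/".repeat(length);
    try {
      parser.parse(input);
      results[length] = "ok";
    } catch (err) {
      results[length] = err instanceof Error ? err.message : String(err);
    }
  }
  return results;
}
```

Passing a parser configured with either grammar to `probeBoundary` should, per the report above, yield "ok" for 32767 and the "Invalid argument" message for 32768.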

A project with a minimal reproduction can be found at https://github.com/clentfort/node-tree-sitter-max-input. It only covers the JavaScript grammar, since the HCL grammar requires extra steps to work with node-tree-sitter.

clentfort commented 3 weeks ago

After looking into the types I discovered that I can pass in a bufferSize as an option to parse. Setting it to input.length + 3 for any input length seems to solve this problem. The largest input I tried was input = '/'.repeat(Math.pow(2, 28)).
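A minimal sketch of this workaround, assuming the `parse(input, oldTree, options)` shape from the package's type definitions. `BufferedParser` and `parseWithSizedBuffer` are hypothetical names introduced here, and the `+ 3` margin is simply the value that worked in my tests, not a documented requirement.

```typescript
// Hypothetical structural types mirroring the part of the parse() signature
// used here; a real node-tree-sitter Parser instance should be compatible.
interface BufferOptions {
  bufferSize?: number;
}

interface BufferedParser<T> {
  parse(input: string, oldTree?: null, options?: BufferOptions): T;
}

// Parse the input with a buffer sized to the input itself, so that inputs of
// 2^15 characters or more do not hit the limit described above.
function parseWithSizedBuffer<T>(parser: BufferedParser<T>, input: string): T {
  // No previous tree is passed; only the bufferSize option is set.
  return parser.parse(input, undefined, { bufferSize: input.length + 3 });
}
```

With a configured parser this would be called as `parseWithSizedBuffer(parser, sourceCode)` in place of `parser.parse(sourceCode)`.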

clentfort commented 3 weeks ago

I just noticed that https://github.com/tree-sitter/node-tree-sitter/pull/214 was merged. It might solve this problem, but I can't confirm that because it has not been released on npm yet.