tree-sitter / node-tree-sitter

Node.js bindings for tree-sitter
https://www.npmjs.com/package/tree-sitter
MIT License
663 stars 118 forks source link

Bug: `Parser.parse(string)` fails if the string length is `>=` than the buffer size #199

Closed kestred closed 3 months ago

kestred commented 6 months ago

In the README, it suggests you can pass a string to parse:

const sourcePath = '...';
const sourceCode =  readFileSync(sourcePath, 'utf-8');
const tree = parser.parse(source);

However if you pass a string longer than the default buffer size, parsing will fail with Error: Invalid Argument

const defaultBufferSize = 32 * 1024;
const sourceCode = ''.padStart(defaultBufferSize - 1, ' ');
const tree = parser.parse(source);  // --> OK

const defaultBufferSize = 32 * 1024;
const sourceCode = ''.padStart(defaultBufferSize, ' ');
const tree = parser.parse(source);  // --> ERROR
tree-sitter@0.21.1/node_modules/tree-sitter/index.js:361
    ? parse.call(
            ^

Error: Invalid argument
    at Parser.parse (tree-sitter@0.21.1/node_modules/tree-sitter/index.js:361:13)

Possible Solutions

Some ideas to improve the behavior:

Rannytheory commented 6 months ago

parser.parse lets you pass in a buffer size.

const defaultBufferSize = 32 * 1024;
const sourceCode = ''.padStart(defaultBufferSize, ' ');
const tree = parser.parse(source, undefined, { bufferSize: defaultBufferSize + 1 });  // --> Not an error anymore

Might be worth putting a note somewhere about this in the readme though.

Edit: thanks @connor4312

rien commented 5 months ago

We are encountering this bug as well (https://github.com/dodona-edu/dolos/issues/1544), it took a while to find the cause because of the error message. This seems to be a new problem (with the new NAPI?)

Maybe it is a good idea to improve the error message here by adding a manual check if the buffer size is enough? Or event prevent this from happening by increasing the buffer size if needed.

connor4312 commented 5 months ago

Hit this as well -- seems like the bufferSize needs to be input.length + 1. (Yes, length in UTF-16 code units, not bytes.) Is there any downside to always setting that to be the buffer size?

maxbrunsfeld commented 5 months ago

The buffer size should not affect the behavior - it should just control how much copying takes place. This is a really bad bug, it must have been introduced with the NAPI conversion.

segevfiner commented 4 months ago

Trying to debug