tree-sitter / tree-sitter-cli

CLI tool for creating and testing tree-sitter parsers
MIT License

tree-sitter crashes because of U+FEFF in test file #49

Open th-we opened 5 years ago

th-we commented 5 years ago

When running tree-sitter test on this repo, the command hangs and consumes more and more memory until it crashes.

In the test case, there is a "ZERO WIDTH NO-BREAK SPACE" (U+FEFF) character before the test code. When removing it, the test passes.

I also tried with tree-sitter-cli compiled from the git master branch instead of from the version published on npmjs.com, but results were identical.
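
Because U+FEFF is invisible in most editors, a small script can help confirm whether a test file contains one. The following is only a minimal sketch in Node.js: the helper names (findBomPositions, stripBom, cleanFile) are hypothetical and not part of tree-sitter or tree-sitter-cli. It only finds and removes the character; it does not address the underlying hang.

```javascript
// Sketch: locate and strip U+FEFF characters in the text of a test file.
// All function names here are hypothetical helpers, not tree-sitter APIs.
const fs = require('fs');

const BOM = '\uFEFF';

// Return the 1-based line and column of every U+FEFF in `text`.
function findBomPositions(text) {
  const positions = [];
  text.split('\n').forEach((line, lineIndex) => {
    for (let col = 0; col < line.length; col++) {
      if (line[col] === BOM) {
        positions.push({ line: lineIndex + 1, column: col + 1 });
      }
    }
  });
  return positions;
}

// Return `text` with every U+FEFF removed.
function stripBom(text) {
  return text.split(BOM).join('');
}

// Rewrite the file in place only if it contains U+FEFF.
// Returns the positions that were found (empty array if none).
function cleanFile(path) {
  // Reading with 'utf8' keeps a leading U+FEFF in the resulting string.
  const text = fs.readFileSync(path, 'utf8');
  const positions = findBomPositions(text);
  if (positions.length > 0) {
    fs.writeFileSync(path, stripBom(text));
  }
  return positions;
}
```

Calling cleanFile on a test file (the path is whatever your repo uses) returns the list of positions it removed, so an empty result means the file had no U+FEFF to begin with.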

th-we commented 5 years ago

Actually, it turns out that while the U+FEFF is a hard-to-spot problem, the trigger doesn't have to be anything this obscure. I added another test case, which also shows the problem:

============================================
Crashing test
============================================

{}

---

(data)

maxbrunsfeld commented 5 years ago

I haven't had a chance yet to clone this locally and reproduce, but I think you're probably hitting https://github.com/tree-sitter/tree-sitter/issues/98. I just now remembered to change the title of that issue, since it was originally pretty obscure.

I noticed that there's what looks to be an empty string token in your grammar: https://github.com/th-we/tree-sitter-crash-demo/blob/master/grammar.js#L7. Is that intentional? The usual way to write that construct with Tree-sitter would be:

data: $ => $._quoted_value,
_quoted_value: $ => seq('"', optional($.value), '"'),
value: $ => /a[^"]*/

That way, you wouldn't get nodes of zero size.

th-we commented 5 years ago

While it's true that with optional() the problem does not occur, the tree would change, i.e. there would be no value node for the empty value (""). I used to include the quotes in the value nodes, but those nodes should be the injection point for another grammar.

I see the following workarounds:

There might be more tree-sitterish ways to solve this.

And by the way: Many thanks for this work of yours. It gives me basic linting for an obscure language with a number of pitfalls, some of which I can catch using tree-sitter now.