Open th-we opened 5 years ago
Actually, it turns out that while the U+FEFF is a hard-to-spot problem, the trigger doesn't have to be anything this obscure. I added another test case, which also shows the problem:
============================================
Crashing test
============================================
{}
---
(data)
I haven't had a chance yet to clone this locally and reproduce, but I think you're probably hitting https://github.com/tree-sitter/tree-sitter/issues/98. I just now remembered to change the title of that issue, since it was originally pretty obscure.
I noticed that there's what looks to be an empty string token in your grammar: https://github.com/th-we/tree-sitter-crash-demo/blob/master/grammar.js#L7. Is that intentional? The usual way to write that construct with Tree-sitter would be:
data: $ => $._quoted_value,
_quoted_value: $ => seq('"', optional($.value), '"'),
value: $ => /a[^"]*/
That way, you wouldn't get nodes of zero size.
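To check a token pattern for this before handing it to Tree-sitter, something like the following could help. It is only a sketch: `canMatchEmpty` is a hypothetical helper, not part of Tree-sitter, and it relies on JavaScript's own RegExp semantics, which are assumed to agree with Tree-sitter's for simple patterns like these. The pattern `/[^"]*/` is an assumed example of a token that permits empty matches, not necessarily the one in the linked grammar.

```javascript
// Hypothetical helper (not part of Tree-sitter): reports whether a token
// pattern can match the empty string, using JavaScript RegExp semantics.
function canMatchEmpty(pattern) {
  // Anchor the pattern at both ends so that only a full match of '' counts.
  const anchored = new RegExp('^(?:' + pattern.source + ')$');
  return anchored.test('');
}

// Assumed example of a pattern that allows zero-width matches.
console.log(canMatchEmpty(/[^"]*/));  // true
// The pattern from the suggestion above needs at least one character.
console.log(canMatchEmpty(/a[^"]*/)); // false
```

A pattern for which this returns `true` is a candidate for being wrapped in `optional()` around a pattern that cannot match the empty string.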
While it's true that with optional() the problem does not occur, the tree would change, i.e. there would be no value node for the empty value (""). I used to include the quotes in the value nodes, but those nodes should be the injection point for another grammar.
I see the following workarounds:
<script>
tags in the JavaScript grammar.
empty_value node, i.e. like
data: $ => choice($._quoted_value, $.empty_value),
_quoted_value: $ => seq('"', optional($.value), '"'),
value: $ => /a[^"]*/,
empty_value: $ => '""'
It's inconsistent to include the quotes in one case and exclude them in another, but the grammar cson file can at least make sure that the quotes are always highlighted the same way.
There might be more tree-sitterish ways to solve this.
And by the way: Many thanks for this work of yours. It gives me basic linting for an obscure language with a number of pitfalls, some of which I can catch using tree-sitter now.
When running tree-sitter test on this repo, the command will hang and eat more and more memory until it crashes. In the test case, there is a "ZERO WIDTH NO-BREAK SPACE" (U+FEFF) character before the test code. When removing it, the test passes.
I also tried with tree-sitter-cli compiled from the git master branch instead of the version published on npmjs.com, but the results were identical.
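Since the U+FEFF character is invisible in most editors, a small script can help locate it in a test file. The following is a minimal sketch; `findByteOrderMarks` and `stripByteOrderMarks` are hypothetical helper names, and the sample string only imitates the layout of the test case.

```javascript
// Hypothetical helper: lists every U+FEFF in `text` with its 1-based line
// and column. Columns are counted in UTF-16 code units.
function findByteOrderMarks(text) {
  const hits = [];
  text.split('\n').forEach((line, lineIndex) => {
    for (let i = 0; i < line.length; i++) {
      if (line.charCodeAt(i) === 0xFEFF) {
        hits.push({ line: lineIndex + 1, column: i + 1 });
      }
    }
  });
  return hits;
}

// Hypothetical helper: returns `text` with every U+FEFF removed.
function stripByteOrderMarks(text) {
  return text.replace(/\uFEFF/g, '');
}

// Sample input imitating the test case, with U+FEFF before the test code.
const sample = '====\nCrashing test\n====\n\uFEFF{}\n---\n(data)\n';
console.log(findByteOrderMarks(sample));                      // one hit: line 4, column 1
console.log(findByteOrderMarks(stripByteOrderMarks(sample))); // no hits
```

To run this against a real file, the text could be loaded with Node's `fs.readFileSync(path, 'utf8')`, which leaves a leading U+FEFF in the returned string.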