tree-sitter / tree-sitter-javascript

Javascript grammar for tree-sitter
MIT License
363 stars 114 forks source link

Parser Hangs forever Parsing JS Code #322

Closed jpulec closed 5 months ago

jpulec commented 5 months ago

Problem

There appears to be some sequences of characters that can cause the parse function to hang indefinitely. Not sure whether or not this should also be reported in tree-sitter-javascript, but this code was previously parsing fine and now causes this issue in both node-tree-sitter and web-tree-sitter.

This behavior was not seen on web-tree-sitter 0.20.8 and appears to have been introduced starting with the 0.21.x versions.

Steps to reproduce

Create a file with the following contents and attempt to parse it, or just drop the following text string into the Treesitter playground.

{ try { } catch { (, e

Expected behavior

This should finish parsing as an ERROR.

Tree-sitter version (tree-sitter --version)

tree-sitter 0.22.6

Operating system/version

Arch Linux 6.8.9-arch1-2 x86_64

amaanq commented 5 months ago

I just pushed a fix that should solve this - pretty interesting case regardless and I believe it fixes our spurious js infinite loop hang bugs (seed 13295583532933362804 which hung before is fine now) - cc @maxbrunsfeld

The problem is the comment rule was wrapped in a choice, despite the inner choice being a token(choice(...)), but at runtime, the library is allowing aux_sym_comment_token1 (which this symbol only exists because of this erroneous choice wrapping btw) to be a candidate for missing token recoveries, despite it being part of an extra. So I think it's just wrong in general to allow this to happen, since we'd get an infinite loop of a missing non-terminal extra's terminal components being allowed to be recovered from, and it being allowed anywhere won't really affect/help out with reductions is my best guess.

I know this is a really in-depth/irrelevant explanation to your issue, but it's more me jotting my thoughts down and for Max to get some context on :sweat_smile: I'd like to avoid the potential for this to happen upstream in some way, either by unwrapping choices of a single item (easy, but the problem can arise w/o this) or by disregarding terminals that are a part of an non-terminal extra when it comes to recovering via missing symbols

wentinghome commented 4 months ago

I just pushed a fix that should solve this - pretty interesting case regardless and I believe it fixes our spurious js infinite loop hang bugs (seed 13295583532933362804 which hung before is fine now) - cc @maxbrunsfeld

The problem is the comment rule was wrapped in a choice, despite the inner choice being a token(choice(...)), but at runtime, the library is allowing aux_sym_comment_token1 (which this symbol only exists because of this erroneous choice wrapping btw) to be a candidate for missing token recoveries, despite it being part of an extra. So I think it's just wrong in general to allow this to happen, since we'd get an infinite loop of a missing non-terminal extra's terminal components being allowed to be recovered from, and it being allowed anywhere won't really affect/help out with reductions is my best guess.

I know this is a really in-depth/irrelevant explanation to your issue, but it's more me jotting my thoughts down and for Max to get some context on 😅 I'd like to avoid the potential for this to happen upstream in some way, either by unwrapping choices of a single item (easy, but the problem can arise w/o this) or by disregarding terminals that are a part of an non-terminal extra when it comes to recovering via missing symbols

@amaanq I run in to the same issue when using typescript to parse incomplete code. I wonder if there can be a fix for typescript, thank you!! Amazing work!