ikatyang / tree-sitter-markdown

Markdown grammar for tree-sitter
https://ikatyang.github.io/tree-sitter-markdown
MIT License

Hard crash parsing certain markdown files #59

Open paul-gauthier opened 1 year ago

paul-gauthier commented 1 year ago

I have been using tree_sitter_languages to parse markdown. Some of my md files are causing a hard crash of the parser.parse() call:

Assertion failed: (i == length), function deserialize, file scanner.cc, line 79.
Abort trap: 6

I have isolated a sample which can trigger the crash. I binary searched the file to find a single offending line. Then, I gradually replaced all the characters with X until the crash went away.
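For anyone who wants to repeat this kind of reduction on their own files, here is a rough sketch of the two steps described above. It is not the exact script used for this report: the helper names (`crashes`, `bisect_lines`, `mask_chars`) are made up for illustration, and it assumes a POSIX system where an abort shows up as a negative return code. The parse runs in a child process because the assertion kills the whole interpreter.

```python
import subprocess
import sys

# Script run in a child process: reads Markdown from stdin and parses it.
# Assumes tree_sitter_languages is installed, as in the report.
CHILD = (
    "import sys, tree_sitter_languages\n"
    "parser = tree_sitter_languages.get_parser('markdown')\n"
    "parser.parse(sys.stdin.buffer.read())\n"
)


def crashes(text):
    """Return True if parsing `text` kills the child with a signal (POSIX only)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=text.encode("utf8"),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode < 0  # e.g. -6 for SIGABRT


def bisect_lines(lines, is_bad):
    """Halve `lines` repeatedly, keeping whichever half still triggers is_bad."""
    while len(lines) > 1:
        mid = len(lines) // 2
        first, second = lines[:mid], lines[mid:]
        if is_bad("\n".join(first)):
            lines = first
        elif is_bad("\n".join(second)):
            lines = second
        else:
            break  # the failure needs lines from both halves
    return lines


def mask_chars(text, is_bad, filler="X"):
    """Replace characters with `filler`, undoing any change that stops the failure."""
    chars = list(text)
    for i, ch in enumerate(chars):
        if ch == filler or ch.isspace():
            continue
        chars[i] = filler
        if not is_bad("".join(chars)):
            chars[i] = ch  # this character matters; restore it
    return "".join(chars)
```

Typical use would be `bisect_lines(open(path).read().splitlines(), crashes)` followed by `mask_chars(line, crashes)` on the surviving line, where `path` is whichever file crashes for you. Both helpers take the predicate as an argument, so they can be tried with any other check as well.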

Some notes:

I have filed this issue with both of these projects, as I am not sure which is most likely to be able to resolve it:


code = '''
XXXXXX_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX X XXXX X XXXXX *X XXXXXXXX XXXXXXX X XXXX X XXXXX XXXXXXX_XXXXX XXXXXXXXXXX X XXXX X XXXXX XXXXXXX_XXXXXXXXX XXXXX X XXXX X XXXXX XX_XXXXX XXXX X XXXX X XXXXX XXXXXX XXXXX X XXXX X XXXXX XXXXXXXXXX XXXXXXXXX X XXXX X XXXXX XXXXXXXX_XX_XXXXXXX XXXX X XXXX X XXXXX XX_XXXXXXXXX XXXX X XXXX X XXXXX XXX_XXXXXXXXX XXXXXXXXXXXXXXX X XXXX X XXXXX XXXXXX_XXXXXXXX XXXXXXXXXXXXXXXX X XXXX X XXXXX XXXXXX XXXXXXXXX X XXXX X XXXXX XXXXX_XXXXXX XXXXXXXXXX X XXXX X XXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXX X XXXXX XXXXXXX_XXXXXXX_XXXXXXXX_XXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXX_XXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXX X XXXX X XXXXX XXXX_XXXXX_XXX_XXXX_XXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXXXX XXXXXXXXXXXXXXXX X XXXX X XXXXX XXXXXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXX_XXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXX X XXXX X XXXXX XXXX_XXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXX_XXXXXXX_XXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXX_XXXXXX_XXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXXXX_XXXXX_XXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXXXX_XXXXX_XXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXXXX_XXXXXX XXXXXXXXXXXXX X XXXX X XXXXX XXXX_XXXXXXXX XXXXXXXXXXX X XXXX X XXXXX XXXXXXXXXX XXXXXXXXX X XXXX X XXXXX XXXXXXXXXX XXXXXXXXX X XXXX X XXXXX XXXXX_XXXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXX_XXXXXXXXXXX XXXXXXXXXXXXXXX X XXXX X XXXXX XXXXXXXXXXXX XXXXXXXXXXXXXXXX X XXXX X XXXXX XXXXXXX_XXXXX_XXX_XXXXX XXX X XXXX X XXXXX XXXXXX_XXXXXX_XXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXX_XXXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXX_XXXX_XX_XXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXXXX_XXXXXXX XXX X XXXXXXXXXXXX XXXX X XXXX X XXXXX XXXXXXXXX XXX X XXXXXXXXXXXX XXXX X XXXX X XXXXX XXXXXXXX XXX X XXX XXXXXX_XXXXXXXX XXXX X XXXXXX XXX_XXXX XXXXXXXXXXXX XXXXX X XXXX X XXXX
'''

import tree_sitter_languages

print(tree_sitter_languages.__version__) # 1.7.0

parser = tree_sitter_languages.get_parser('markdown')
parser.parse(bytes(code, "utf8"))

aguynamedben commented 1 year ago

The Emacs folks debugging this in https://github.com/emacs-tree-sitter/elisp-tree-sitter/issues/253 have 3-4 people reporting that the crash seems to happen in Markdown files that have long/wide tables. Here's a screenshot of the file that crashes for me.

[image: screenshot of the Markdown file that crashes]

There are some other example Markdown files in that issue that might be helpful for debugging this. I believe the root cause is in this library.
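If it helps to probe the wide-table theory without sharing private files, a small generator like the one below can produce synthetic tables of any width. This is only a sketch: `make_wide_table` is a hypothetical helper, and I have not confirmed that tables of this particular shape trigger the assertion.

```python
def make_wide_table(columns, rows=2, cell="XXXX"):
    """Build a Markdown pipe table with `columns` columns and `rows` body rows."""
    header = "| " + " | ".join(f"col{i}" for i in range(columns)) + " |"
    divider = "| " + " | ".join("---" for _ in range(columns)) + " |"
    body = [
        "| " + " | ".join(cell for _ in range(columns)) + " |"
        for _ in range(rows)
    ]
    return "\n".join([header, divider, *body]) + "\n"
```

Feeding the output for increasing column counts to the parser (ideally in a separate process, since the failure aborts the interpreter) would show whether width alone is enough to reproduce the crash.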

(btw, thank you for providing this grammar, it works great most of the time and I love it!) 🙏