stsewd / tree-sitter-rst

reStructuredText grammar for tree-sitter
https://stsewd.dev/tree-sitter-rst/
MIT License

parsing a document with an incomplete definition list #48

Open · keewis opened this issue 10 months ago

keewis commented 10 months ago

I'm trying to use tree-sitter-rst to parse numpydoc docstrings, which are based on rst but not a strict subset (see Carreau/velin#36).

While parsing, I noticed that this:

See Also
--------
item : description

Notes
-----
Some text.

results in everything after the incomplete definition list item being consumed as part of a definition list:

(document (section (title)) (ERROR (classifier)))

where the definition list item consumes everything afterwards and dumps it into the classifier.

Instead, I would have expected an error node, but one that only contains the actual term and classifier, while everything after it is parsed as usual (in other words, I'd like tree-sitter to prefer inserting a missing token over consuming more tokens in this case).

Do you think there is anything that can be changed in this library to get this to work (in other words, is this a bug, either in tree-sitter-rst or in upstream tree-sitter)? Or would you rather recommend a derived grammar that is specific to numpydoc (if that's possible)?

stsewd commented 10 months ago

Hi, docutils parses item : description as a paragraph. Does numpydoc also expect it to be a paragraph? If so, this is probably the same issue as https://github.com/stsewd/tree-sitter-rst/issues/20.
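For context on why docutils sees a paragraph here: in reStructuredText, a definition list item needs its definition on the immediately following line(s), indented relative to the term, and anything after ` : ` on the term line is a classifier. A lone `item : description` line has no indented definition, so it is just a paragraph, while a grammar that commits to "definition list item" early treats `description` as a classifier. The sketch below illustrates that rule in simplified form; `classify_block` and `split_term` are hypothetical helpers, not docutils' or tree-sitter-rst's actual code, and they ignore many edge cases.

```python
# Hypothetical helpers illustrating the rST definition-list rule in simplified
# form. This is NOT docutils' implementation and ignores many edge cases.

def classify_block(lines):
    """Return 'definition_list_item' or 'paragraph' for a block of non-blank lines."""
    if len(lines) < 2:
        # A lone line such as "item : description" has no indented definition.
        return "paragraph"
    term, first_def = lines[0], lines[1]
    term_indent = len(term) - len(term.lstrip())
    def_indent = len(first_def) - len(first_def.lstrip())
    # The definition must directly follow the term and be indented further.
    if def_indent > term_indent:
        return "definition_list_item"
    return "paragraph"


def split_term(term_line):
    """Split a term line into the term and its classifiers (separated by ' : ')."""
    term, *classifiers = term_line.strip().split(" : ")
    return term, classifiers


print(classify_block(["item : description"]))
print(classify_block(["item : description", "    the definition"]))
print(split_term("item : description"))
```

With these simplified rules, the single line from the issue is classified as a paragraph, and only a following indented line would turn it into a definition list item whose classifier is `description`.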

keewis commented 10 months ago

~numpydoc makes use of docutils, so I guess it expects the same behavior (not sure though, I'm no expert on that code base).~ Edit: It appears that numpydoc splits the document (docstring) into sections and parses the content of each section separately. So neither docutils nor any other parsing library is involved, just a bunch of regular expressions. This means that it also does not try to classify content as paragraphs or definition lists.
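To make the contrast concrete, here is a minimal sketch of regex-based section splitting of the kind described above: a section header is a title line followed by an underline of dashes, and every line up to the next header is collected as raw, unclassified text. This is an illustrative assumption, not numpydoc's actual code; the function name `split_sections` and the underline pattern are hypothetical.

```python
import re

# Hypothetical sketch of regex-based docstring section splitting, in the spirit
# of what numpydoc does; NOT numpydoc's actual implementation.
# A header is a non-blank title line followed by a line of 3+ dashes.
UNDERLINE = re.compile(r"^\s*-{3,}\s*$")


def split_sections(docstring):
    """Map each section title to its raw (unparsed) body text."""
    lines = docstring.splitlines()
    sections = {}
    current = None
    i = 0
    while i < len(lines):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.strip() and UNDERLINE.match(nxt):
            # Start a new section: skip the title line and its underline.
            current = line.strip()
            sections[current] = []
            i += 2
            continue
        if current is not None:
            sections[current].append(line)
        i += 1
    # Drop leading/trailing blank lines from each section body.
    return {title: "\n".join(body).strip("\n") for title, body in sections.items()}


doc = "See Also\n--------\nitem : description\n\nNotes\n-----\nSome text.\n"
print(split_sections(doc))
```

In this sketch each section body is handed on as plain text, so the unfinished `item : description` entry under "See Also" cannot absorb the following "Notes" section; a grammar that parses the whole document at once has to get that isolation from its error recovery instead.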

~So yes, this can very well be a duplicate of #20.~ This might still be a duplicate of #20, but I also think that tree-sitter-rst can be a bit stricter than docutils (which appears to me to be very forgiving).