stsewd / tree-sitter-rst

reStructuredText grammar for tree-sitter
https://stsewd.dev/tree-sitter-rst/
MIT License

bullet list/enumerated with `:` in paragraph #20

Open Carreau opened 2 years ago

Carreau commented 2 years ago

Apologies, I know I've been opening many issues recently, and again I want to thank you for writing this.

It appears that Sphinx/docutils parses this:

Bullet list:
    * stuff : other
    * stuff : other
    * stuff : other

as a bullet list, but tree-sitter-rst is unhappy and returns an error node.

It does the same with enumerated lists, and it is sensitive to the spaces before/after the `:`.

As found in the wild in

https://github.com/scipy/scipy/blob/9a657fb04b90c6efc4a8115ab2c1a360ea380f45/scipy/sparse/linalg/isolve/iterative.py#L452-L455

Rendered here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.gmres.html

Again, I'm unsure how critical this is; I just want to report a difference from other rst parsers. It might be that the fix needs to be in the scipy source.

On a related note, I tried without the leading bullet points as well, and this also errors, though it might be invalid rst syntax; I'm unsure.

Not sure what:
    stuff : other
    stuff : other
    stuff : other

stsewd commented 2 years ago

> Apologies, I know I've been opening many issues recently, and again I want to thank you for writing this.

Please keep them coming!

I think this is the same problem as https://github.com/stsewd/tree-sitter-rst/issues/16: the parser gets confused by the `:`. I'll probably need to move the parsing of `:` to the external scanner, i.e. the C code.

stsewd commented 2 years ago

So, I'm trying to fix this, and it's turning out to be a little complicated, since definition lists don't have a character that indicates the start of the list (like the bullet in a normal list), but I'm experimenting with some solutions.
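
To illustrate the difficulty: a bullet item can be recognized from its own first characters, while a definition-list term can only be identified by looking ahead at the next line. The following is a minimal Python sketch of that lookahead, not the scanner's actual logic. The names `indent_of`, `is_bullet_item` and `is_definition_term` are hypothetical, only the ASCII bullets `*`, `+` and `-` are considered, and other constructs a real parser must rule out are ignored.

```python
def indent_of(line: str) -> int:
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def is_bullet_item(line: str) -> bool:
    """A bullet item is recognizable from the line itself (simplified)."""
    stripped = line.lstrip(" ")
    return len(stripped) >= 2 and stripped[0] in "*+-" and stripped[1] == " "


def is_definition_term(lines: list, i: int) -> bool:
    """A term needs lookahead: the next line must be non-blank and more indented."""
    line = lines[i]
    if not line.strip() or is_bullet_item(line):
        return False
    if i + 1 >= len(lines):
        return False
    nxt = lines[i + 1]
    return bool(nxt.strip()) and indent_of(nxt) > indent_of(line)


sample = [
    "Bullet list:",
    "    * stuff : other",
    "    * stuff : other",
]
print([is_definition_term(sample, i) for i in range(len(sample))])
# → [True, False, False]
```

In this sketch the first line of the reported sample counts as a term only because of the indented line that follows it, which is the kind of decision that is awkward to express without a start marker.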

But if this is blocking you, you can comment out this line: https://github.com/stsewd/tree-sitter-rst/blob/5d1fb393538d604adb26d24d0990a141ff2bbc63/grammar.js#L272. Both of your issues should then be fixed, but of course you would lose the parsing of classifiers, which would instead become part of the term node.
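
For context on what would be lost: in reStructuredText, a definition-list term can carry classifiers, separated from the term by a colon with whitespace on both sides. A rough Python sketch of that split follows. `split_term` is a hypothetical helper, the regular expression is a simplification chosen for illustration, and inline markup and escaped colons are not handled.

```python
import re

# Simplified delimiter: at least one space on each side of the colon.
CLASSIFIER_DELIMITER = re.compile(r" +: +")


def split_term(line: str):
    """Split a term line into (term, list of classifiers)."""
    parts = CLASSIFIER_DELIMITER.split(line.strip())
    return parts[0], parts[1:]


print(split_term("stuff : other"))  # → ('stuff', ['other'])
print(split_term("stuff: other"))   # → ('stuff: other', [])
```

The second call shows why spacing around the `:` matters: without a space before the colon there is no classifier, and the whole line stays in the term.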

Carreau commented 2 years ago

I'm not particularly blocked by this. I would actually be happy to try to update scipy.sparse.linalg.gmres to use a different syntax that is not ambiguous, and I can try to convince the maintainers to do so at some point.

Carreau commented 2 years ago

Not sure if this is the same thing, but:

set_state :
    Context manager that sets the backend state.
get_state : <-- error if trailing space here.
    Gets a state to be set by this context manager.

stsewd commented 2 years ago

Yeah, it's the same problem.

cpkio commented 2 years ago

Another addition to the same problem, I suppose:

test (27:00 minute video) *review* test

test *review* test

Italics in the first line will not be detected. From the playground:

document [0, 0] - [4, 0]
  paragraph [0, 0] - [0, 39]
  paragraph [2, 0] - [2, 18]
    emphasis [2, 5] - [2, 13]

stsewd commented 10 months ago

@cpkio, it looks like your example was fixed by https://github.com/stsewd/tree-sitter-rst/pull/41. I added a regression test in https://github.com/stsewd/tree-sitter-rst/commit/3ba9eb9b5a47aadb1f2356a3cab0dd3d2bd00b4b.

The original problem reported in the issue still persists, sadly.