Hi stsewd, thanks for your awesome rst parser!

I found that this parser does not work well when parsing documentation written in CJK languages. For example, :strong:`text`。 (with a trailing Chinese full stop 。, the equivalent of . in English) is valid inline markup (rst2pseudoxml accepts it), but tree-sitter-rst does not recognize it correctly.

How to reproduce

$ echo ':strong:`text`。' > example.rst
$ rst2pseudoxml example.rst
<document source="example.rst">
    <paragraph>
        <strong>
            text
        。
$ tree-sitter p example.rst
(document [0, 0] - [1, 0]
  (ERROR [0, 0] - [0, 8]
    (role [0, 0] - [0, 8]))
  (paragraph [0, 8] - [0, 17]))
example.rst        0.03 ms         607 bytes/ms (ERROR [0, 0] - [0, 8])

How to fix

According to the inline markup recognition rules:

Inline markup start-strings must start a text block or be immediately preceded by

whitespace,

one of the ASCII characters - : / ' " < ( [ {

or a similar non-ASCII punctuation character. [18]

Inline markup end-strings must end a text block or be immediately followed by

whitespace,

one of the ASCII characters - . , : ; ! ? \ / ' " ) ] } >

or a similar non-ASCII punctuation character. [19]

I have made a PR (#10) for this, but it is not a good fix. Docutils provides some regexes for matching these non-ASCII punctuation characters. As far as I currently understand, matching them in src/tree_sitter_rst/chars.c::is_{start,end}_char should fix this issue.
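To make the idea concrete, here is a minimal sketch of what such a check might look like. The function names (is_start_char_sketch, is_end_char_sketch), the range table, and the choice to share one table between the start and end checks are all my assumptions for illustration; they are not the actual code in chars.c. The table is a small hand-picked subset of CJK and fullwidth punctuation, not the full set that Docutils matches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical range type; not part of tree-sitter-rst. */
typedef struct {
  int32_t first;
  int32_t last;
} CodepointRange;

/* Illustrative subset only (an assumption, not the Docutils tables). */
static const CodepointRange NON_ASCII_PUNCT[] = {
  {0x3001, 0x3003}, /* 、 。 〃 */
  {0x3008, 0x3011}, /* 〈 〉 《 》 「 」 『 』 【 】 */
  {0xFF01, 0xFF03}, /* ！ ＂ ＃ */
  {0xFF0C, 0xFF0C}, /* ， */
  {0xFF0E, 0xFF0E}, /* ． */
  {0xFF1A, 0xFF1B}, /* ： ； */
  {0xFF1F, 0xFF1F}, /* ？ */
};

static bool is_non_ascii_punct(int32_t c) {
  size_t n = sizeof(NON_ASCII_PUNCT) / sizeof(NON_ASCII_PUNCT[0]);
  for (size_t i = 0; i < n; i++) {
    if (c >= NON_ASCII_PUNCT[i].first && c <= NON_ASCII_PUNCT[i].last) {
      return true;
    }
  }
  return false;
}

/* True if c is a non-NUL ASCII character contained in set. */
static bool in_ascii_set(int32_t c, const char *set) {
  return c > 0 && c < 0x80 && strchr(set, (int)c) != NULL;
}

static bool is_space_sketch(int32_t c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Characters allowed immediately before an inline markup start-string. */
bool is_start_char_sketch(int32_t c) {
  return is_space_sketch(c) || in_ascii_set(c, "-:/'\"<([{") ||
         is_non_ascii_punct(c);
}

/* Characters allowed immediately after an inline markup end-string. */
bool is_end_char_sketch(int32_t c) {
  return is_space_sketch(c) || in_ascii_set(c, "-.,:;!?\\/'\")]}>") ||
         is_non_ascii_punct(c);
}
```

With this sketch, is_end_char_sketch(0x3002) accepts the 。 from the example above. A real fix would probably need separate opener and closer tables, since the rules treat the characters before a start-string and after an end-string differently.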

stsewd / tree-sitter-rst

Scanner should recognize non-ASCII punctuation chars #53
