stsewd / tree-sitter-rst

reStructuredText grammar for tree-sitter
https://stsewd.dev/tree-sitter-rst/
MIT License

Recognize non-ASCII punctuation chars #54

Closed SilverRainZ closed 5 months ago

SilverRainZ commented 5 months ago

The punctuation_chars.h header file is auto-generated by gen_punctuation_chars.py. I also added a test case, "Unicode Punctuation Chars":

before:

  inline_markup:
    ✗ Unicode Punctuation Chars

1 failure:

correct / expected / unexpected

  1. Unicode Punctuation Chars:

    (document
      (paragraph)
      (paragraph)
      (paragraph)
      (paragraph))
      (paragraph
        (emphasis))
      (paragraph
        (emphasis)
        (strong))
      (paragraph
        (emphasis))
      (paragraph
        (emphasis)))

after:

  inline_markup:
    ✓ Unicode Punctuation Chars

Any comments are welcome.

Close #53.

SilverRainZ commented 5 months ago

@stsewd Can you please review it?

stsewd commented 5 months ago

@SilverRainZ thank you for opening this PR! I'll try to take a look at it this weekend or the next one (sorry, busy weeks). I just noticed that the Windows CI is failing with this change.

SilverRainZ commented 5 months ago

The Windows CI failed with a weird error message:

 scanner.c
D:\a\tree-sitter-rst\tree-sitter-rst\src\tree_sitter_rst\punctuation_chars.h(107,41): error C2059: syntax error: '}' [D:\a\tree-sitter-rst\tree-sitter-rst\build\tree_sitter_rst_binding.vcxproj]
  (compiling source file '../src/scanner.c')

I checked L107 and found nothing unusual there, so I have no idea what is wrong for now.


// ...
  L'\u201f',
};
const int32_t start_chars_range[][2] = {}; // <-- L107

const int32_t delim_chars[] = {
// ...
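
A likely cause, offered as a hedged note rather than a confirmed diagnosis: an empty initializer list `{}` is not valid in standard C before C23. GCC and Clang accept it as an extension, while MSVC's C compiler rejects it, which would match error C2059 at the `}` on L107. Below is a minimal sketch of one possible workaround, assuming the generator can emit a sentinel row plus an explicit length when a table has no entries. The names `example_chars_range`, `example_chars_range_len`, `example_nonempty_range`, and `in_ranges` are hypothetical and do not come from the PR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a generated table that has no real entries.
   Instead of `= {}` (not accepted by MSVC in C mode), emit a single
   sentinel row and track the logical length separately. */
static const int32_t example_chars_range[][2] = {
  {0, 0}, /* sentinel row, never inspected because the length is 0 */
};
static const size_t example_chars_range_len = 0;

/* Hypothetical non-empty table, included only to exercise the lookup. */
static const int32_t example_nonempty_range[][2] = {
  {0x2010, 0x2015},
};
static const size_t example_nonempty_range_len = 1;

/* Return true if `c` falls inside any inclusive [low, high] pair. */
static bool in_ranges(int32_t c, const int32_t ranges[][2], size_t len) {
  for (size_t i = 0; i < len; i++) {
    if (c >= ranges[i][0] && c <= ranges[i][1]) {
      return true;
    }
  }
  return false;
}
```

With an explicit length, the lookup never reads the sentinel row, so an "empty" table behaves as empty on every compiler. Another option would be for the generator to skip empty tables entirely and omit the corresponding lookup.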

SilverRainZ commented 5 months ago

It seems that the WASM binary should be updated after any change, but I think that should be done by the maintainer.

By the way, npm run wasm is currently broken due to https://github.com/tree-sitter/tree-sitter/issues/3202.