punctuation_chars.h compilation errors on Windows (MSVC) + Clang

stsewd / tree-sitter-rst

reStructuredText grammar for tree-sitter

https://stsewd.dev/tree-sitter-rst/

MIT License

50 stars 7 forks source link

punctuation_chars.h compilation errors on Windows (MSVC) + Clang #55

Closed jmacmahon closed 5 months ago

jmacmahon commented 5 months ago

Hello,

I am setting up tree-sitter on a fresh installation today, and experiencing the following errors:

nvim-treesitter[rst]: Error during compilation
In file included from src/scanner.c:3:^M
In file included from src/tree_sitter_rst/scanner.c:6:^M
In file included from src/tree_sitter_rst/chars.c:2:^M
src/tree_sitter_rst/punctuation_chars.h:246:3: error: character too large for enclosing character literal type^M
  246 |   L'\U00010100',^M
      |   ^^M
... snip ...
src/tree_sitter_rst/punctuation_chars.h:308:5: error: character too large for enclosing character literal type^M
  308 |   { L'\U000110be', L'\U000110c1' },^M
      |     ^^M
fatal error: too many errors emitted, stopping now [-ferror-limit=]^M
20 errors generated.^M

I'm not too familiar with tree-sitter internals or C, but from some googling I think this L'\U00010100' is somehow not correct? Possibly this is related to the fix for #53.

Sorry I can't be of more help in diagnosing this - I'm happy to run any tests you might suggest.

I will try and see if the problem is reproducible on Linux + Clang.

jmacmahon commented 5 months ago

Update: everything works fine on Linux + Clang

jmacmahon commented 5 months ago

OK, after some digging I have found the following:

The L'...' syntax in C is a way of writing values of type wchar_t.
On Linux, sizeof(wchar_t) == 4, i.e. 32 bits, so literals greater than \U0000FFFF work fine.
On Windows, sizeof(wchar_t) == 2, i.e. 16 bits, so literals greater than \U0000FFFF cause compile errors.
In punctuation_chars.h, these character literals are assigned to int32_t, which ought to be big enough to hold the high-valued characters, but since they are written using the L'...' syntax, they must first go through wchar_t.

It seems to me that the best solution would be to write these characters as integer literals instead of wchar_t literals, if that makes sense?

stsewd commented 5 months ago

Hi @jmacmahon thanks for debugging the problem!

It seems to me that the best solution would be to write these characters as integer literals instead of wchar_t literals, if that makes sense?

That makes sense, feel free to send a PR if you want!

/cc @SilverRainZ in case you have some opinions about this.

jmacmahon commented 5 months ago

Hi,

I was working on a PR but got sidetracked trying to work out why the tests still passed on Windows despite the error - I assumed that there was no CI for Windows hence the error not being caught, but I see that in fact there is CI for Windows, but the tests nevertheless passed.

I was able to reproduce the unexpectedly-passing tests on my local Windows machine, but I couldn't work out why they were passing. In fact, I was able to completely delete the punctuation_chars.h file from my repo and run npm run test and have all the tests still pass.

So, I can do a PR with my suggested change, but actually I have no way of knowing if it would fix the problem since I can't get the tests to work right.

stsewd commented 5 months ago

@jmacmahon you may be missing running npm run build before running the tests.

jmacmahon commented 5 months ago

Unfortunately the tests still unexpectedly pass even if I do npm run build before npm run test - although you are right that deleting the entire file does cause these to fail as expected.

(As an aside, I found that npm run build modified src/parser.c file with some number differences, but I think this is unrelated?)

jmacmahon commented 5 months ago

Well, why the tests unexpectedly passed on Windows is beyond me, but I have submitted a PR anyway to fix the original problem.