Closed cpkio closed 3 years ago
Hi, thanks for the report! Sorry for the late reply. I'm able to replicate this, I'll try to find a fix.
I think this commit is not enough. Probably you should add || scanner->lookahead == CHAR_NBSP
to src\tree_sitter_rst\parser.c
at line 17, which fixes the issue for me.
I think this commit is not enough. Probably you should add
|| scanner->lookahead == CHAR_NBSP
tosrc\tree_sitter_rst\parser.c
at line 17, which fixes the issue for me.
Yeah, that probably needs to be updated. Can you provide the text that's failing to parse? So I can add it to the test cases.
Try this. As you can see, I use 0x2014
after 0xA0
.
- *Etiam* — sodales lacus at arcu dapibus pellentesque. Morbi consectetur dictum
ipsum, consectetur tempus ante venenatis et. Praesent maximus aliquam ex,
quis luctus ipsum pellentesque vel.
- Donec **sed arcu dolor** — In ex ligula, finibus id justo nec, luctus consectetur
sapien. Phasellus convallis rutrum nisl ac aliquet. Nulla mollis nec velit
placerat venenatis. Nunc sit amet arcu vel turpis tristique pellentesque at
nec nisl. Curabitur non facilisis ligula, nec semper mi.
- Donec vehicula lacus in ``ligula vestibulum`` — a fringilla ligula
elementum. Donec volutpat auctor nunc vitae fermentum. Suspendisse a felis
nec ligula porttitor tincidunt. Nam nec dapibus mauris.
@cpkio it makes sense to me to add that check there. But I wasn't able to reproduce with your example. Actually, github replaces the NBSP chars with spaces, so I'm not sure if I'm testing the same file.
Here is the file where a manually replaced the spaces with NBSP chars before the em-dash.
Could you share it as a file so github doesn't replace them?
This is what I get with the file you sent.
If you're using rst parser built from current master
(no src\tree_sitter_rst\parser.c
fix as I've suggested), that's strange. I've checked once again — made isolated Neovim, cloned my repo, installed rst parser with :TSInstall rst
and checked the playground. Still 0xA0
breaks the formatting. I'm on Neovim 0.5.1 release on Windows x64.
(Pay attention there are no rogue rst.*
in \start\nvim-treesitter\parser
folder, TS plugin seems to load any rst.*
it finds first).
I get the same CST when running outside neovim as well.
$ npm run parse -- nbsp-after-inline-markup.rst
(document [0, 0] - [12, 0]
(paragraph [0, 0] - [2, 35]
(emphasis [0, 0] - [0, 7]))
(paragraph [4, 0] - [7, 56]
(strong [4, 7] - [4, 25]))
(paragraph [9, 0] - [11, 55]
(literal [9, 24] - [9, 45])))
I may try booting into a virtual Windows machine, maybe a problem with the \r\n
line termination. But I do have set fileformat=dos
on my neovim when opening that file.
You could also try adding a test that fails around here
and run it with npm run build && npm run test
So, just for curiosity run the CI on windows https://github.com/stsewd/tree-sitter-rst/pull/15, looks like tests fail on windows for some reason.
OK, looks like the bug was related to \r\n line termination on Windows. Fixed this on https://github.com/stsewd/tree-sitter-rst/pull/15. Hopefully there aren't \r
chars without a \n
in the wild... This should improve your experience of the parser on windows :+1:
Thanks. Now fresh master
clone works fine if compiled by me.
Still, if the grammar is installed from Neovim with :TSInstall rst
, it is not working. But it is not your problem, obviously the nvim-treesitter
build script is wrong, though it is using the same toolchain as my manual builds. I'm building with gcc --std=c99 -o rst.so -shared src/parser.c src/scanner.c -Os -Isrc/
command.
@cpkio the lockfile needs to be updated in order to use the new version of the parser https://github.com/nvim-treesitter/nvim-treesitter/pull/1895
This resolves the issue.
Example
= This text should be parsed
0xA0
— correctly.= This text should
0xA0
be parsed correctly too.There a few spacing characters in Unicode, non-breaking space is one most commonly used to avoid breaking word a from dash or two words together.