Closed leonerd closed 1 year ago
Here's a gdb output:
Program received signal SIGSEGV, Segmentation fault.
ts_decode_utf8 (string=0x9 <error: Cannot access memory at address 0x9>, length=length@entry=4294967287,
code_point=code_point@entry=0x55b3b20a9ec0)
at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/././unicode.h:32
32 /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/././unicode.h: No such file or directory.
(gdb) bt
#0 ts_decode_utf8 (string=0x9 <error: Cannot access memory at address 0x9>, length=length@entry=4294967287,
code_point=code_point@entry=0x55b3b20a9ec0)
at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/././unicode.h:32
#1 0x000055b3b1304577 in ts_lexer__get_lookahead (self=self@entry=0x55b3b20a9ec0)
at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/./lexer.c:89
#2 0x000055b3b1305547 in ts_lexer__get_column (_self=0x55b3b20a9ec0)
at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/./lexer.c:260
#3 0x00007f96cd81fca0 in tree_sitter_perl_external_scanner_scan () from /home/leo/.config/nvim/parser/perl.so
#4 0x000055b3b131efc2 in ts_parser__lex (parse_state=1, version=0, self=0x55b3b20a9ec0)
at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/./parser.c:427
#5 ts_parser__advance (allow_node_reuse=<optimized out>, version=<optimized out>, self=0x55b3b20a9ec0)
at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/./parser.c:1441
#6 ts_parser_parse (self=0x55b3b20a9ec0, old_tree=old_tree@entry=0x0, input=...)
at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/./parser.c:1933
#7 0x000055b3b11acf1a in parser_parse (L=0x7f96cf0b4380)
at /home/runner/work/neovim/neovim/src/nvim/lua/treesitter.c:421
#8 0x000055b3b1356126 in lj_BC_FUNCC ()
#9 0x000055b3b1342526 in lua_pcall (L=L@entry=0x7f96cf0b4380, nargs=nargs@entry=2, nresults=nresults@entry=1,
errfunc=errfunc@entry=-4) at lj_api.c:1116
#10 0x000055b3b119a589 in nlua_pcall (lstate=lstate@entry=0x7f96cf0b4380, nargs=nargs@entry=2, nresults=nresults@entry=1)
at /home/runner/work/neovim/neovim/src/nvim/lua/executor.c:153
#11 0x000055b3b119fd01 in nlua_call_ref (ref=<optimized out>, name=<optimized out>, args=..., retval=<optimized out>,
err=0x7fffac99b910) at /home/runner/work/neovim/neovim/src/nvim/lua/executor.c:1559
#12 0x000055b3b10e29c3 in decor_provider_invoke (ns_id=1, name=name@entry=0x55b3b13bf185 "buf", ref=<optimized out>,
args=..., default_true=default_true@entry=true, perr=perr@entry=0x55b3b14f5958 <provider_err.lto_priv>)
at /home/runner/work/neovim/neovim/src/nvim/decoration_provider.c:36
#13 0x000055b3b10e33ad in decor_providers_invoke_buf (buf=0x55b3b1e09550, providers=0x7fffac99ba40,
err=0x55b3b14f5958 <provider_err.lto_priv>) at /home/runner/work/neovim/neovim/src/nvim/decoration_provider.c:184
#14 0x000055b3b10f1498 in update_screen () at /home/runner/work/neovim/neovim/src/nvim/drawscreen.c:557
#15 0x000055b3b10f4db5 in ins_redraw (ready=ready@entry=true) at /home/runner/work/neovim/neovim/src/nvim/edit.c:1360
#16 0x000055b3b10f7a23 in insert_check (state=0x7fffac99bbf0) at /home/runner/work/neovim/neovim/src/nvim/edit.c:474
#17 0x000055b3b128577e in state_enter (s=0x7fffac99bbf0) at /home/runner/work/neovim/neovim/src/nvim/state.c:41
#18 0x000055b3b10f9081 in insert_enter (s=s@entry=0x7fffac99bbf0) at /home/runner/work/neovim/neovim/src/nvim/edit.c:337
#19 0x000055b3b10f9265 in edit (cmdchar=105, startln=<optimized out>, count=1)
at /home/runner/work/neovim/neovim/src/nvim/edit.c:1267
#20 0x000055b3b11efbc2 in invoke_edit (cap=cap@entry=0x7fffac99bd90, repl=repl@entry=0, cmd=105, startln=startln@entry=0)
at /home/runner/work/neovim/neovim/src/nvim/normal.c:6273
#21 0x000055b3b11efe8e in nv_edit (cap=0x7fffac99bd90) at /home/runner/work/neovim/neovim/src/nvim/normal.c:6250
#22 0x000055b3b11ecc17 in normal_execute (state=0x7fffac99bd10, key=<optimized out>)
at /home/runner/work/neovim/neovim/src/nvim/normal.c:1202
#23 0x000055b3b12857ae in state_enter (s=0x7fffac99bd10) at /home/runner/work/neovim/neovim/src/nvim/state.c:99
#24 0x000055b3b11e541c in normal_enter (cmdwin=<optimized out>, noexmode=<optimized out>)
at /home/runner/work/neovim/neovim/src/nvim/normal.c:500
#25 0x000055b3b107d01f in main (argc=<optimized out>, argv=<optimized out>)
at /home/runner/work/neovim/neovim/src/nvim/main.c:625
which, er, now that I stare at it, points the finger sharply in our direction. Oops. I'll go open a bug there instead.
You might be interested in our fuzzing action 😇
Staring in more detail: while the segv comes from a backtrace that has tree-sitter-perl's scanner in it, it doesn't immediately look like that's to blame. Rather, judging from the string/length arguments at the topmost frame (the one that actually crashed), it seems like some pointer value somewhere wasn't wired up properly in the glue, possibly somewhere around where the code injections happen. https://github.com/tree-sitter-perl/tree-sitter-perl/issues/107#issuecomment-1673897835
Further observations: If I comment out just the (fenced_code_block) portion of queries/markdown/injections.scm, this problem goes away. I no longer get code injections in the fenced blocks, but at least it doesn't crash. So that's a potential workaround for now as well.
Nope, that's like curing the gangrene by chopping off the arm.
I notice that the original content already had (#not-match? @language "elm"). I wonder why that is. Maybe elm also crashes..? I shall add another (#not-match? @language "perl") to disable this while still allowing other languages, and observe which other languages do/don't crash. Maybe there is a pattern.
Languages I have that embed just fine: pod, c, markdown, vim, query, lua
Of those, I know that pod does have a scanner.c and its scanner makes calls to lexer->get_column() just like the crashing perl one, but that one seems to behave just fine here. I seem unable to provoke it into crashing in the way that the perl one does so easily.
Again, I recommend fuzzing your scanner. Fixing your crashes should be your first priority.
@clason It's not directly the scanner. See again https://github.com/tree-sitter-perl/tree-sitter-perl/issues/107#issuecomment-1673897835
In more detail: We call lexer->get_column(lexer) on a valid pointer, and a few callstack layers down, the ts internals are attempting to call
ts_decode_utf8 (string=0x9 <error: Cannot access memory at address 0x9>, length=length@entry=4294967287, ...)
That pointer + length value look very suspect, as if someone was attempting a "start/length" calculation by doing pointer arithmetic on a NULL (i.e. zero) pointer rather than a valid one. That length value is (uint32_t)-9, by the way... which aligns with the 9 that the pointer is. Thinking further, the three backticks of the code fence, the four letters "perl" and the CRLF line ending together are 9 bytes. I wonder if something somewhere hasn't set up the string offset for the injection quite right.
And so it would seem: I can get a different number (0x1f == 31), if I add more content to the file. The extra text I added exactly accounted for that larger value.
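The arithmetic in the observation above can be checked with a small C sketch. The helper names here are hypothetical and this is not tree-sitter's actual code; it only assumes that a reader derives its read address and remaining length from a chunk base, a chunk size and a byte offset, and shows what falls out when the base and size are both zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bytes left in a chunk of `chunk_size` bytes when the
 * reader is `offset` bytes in. uint32_t subtraction wraps modulo 2^32, so an
 * empty chunk (size 0) with a nonzero offset yields a huge "length". */
static uint32_t remaining_in_chunk(uint32_t chunk_size, uint32_t offset) {
    return chunk_size - offset;
}

/* Hypothetical helper: address of the read position, using uintptr_t so that
 * a null base (0) can be modelled without undefined pointer arithmetic. */
static uintptr_t read_address(uintptr_t chunk_base, uint32_t offset) {
    return chunk_base + offset;
}

/* Bytes in an opening fence line: three backticks, the info string, CRLF. */
static uint32_t fence_line_bytes(const char *info_string) {
    assert(info_string != NULL);
    return 3u + (uint32_t)strlen(info_string) + 2u;
}
```

With a zero base, a zero chunk size and an offset of fence_line_bytes("perl"), these helpers give address 0x9 and length 4294967287, which are the string and length values shown in frame #0 of the backtrace.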
It is directly the scanner; fuzzing it revealed that to be so.
One issue is you're advancing past eof on L750; there are others I didn't dig into.
You might be interested in our fuzzing action 😇
Could you point me in the direction of how to use this fuzzer? Happy to fix any issues that are my fault (which sounds like this is)
One issue is you're advancing past eof on L750
Ahah; exciting. I had imagined tree-sitter would realise it was going past the end and not let me do that. It appears not :/
Nope, the scanner is fully your own responsibility ;)
Well, no luck yet.
I've fixed the L750 issue - https://github.com/tree-sitter-perl/tree-sitter-perl/commit/b9aac568bd482843a3dededb205760cf11ac1e8f
I've also attempted to have it abort() on any attempt to advance past EOF - https://github.com/tree-sitter-perl/tree-sitter-perl/commit/3c778f7f42427e05a739d442e2e11a9ca16ec736
Retesting it shows identical SEGV behaviour as before. I had expected to see some noise on stderr and a SIGABRT in these cases instead.
The fuzzer is vigoux/tree-sitter-fuzz-action. You can download entrypoint.sh and run it locally.
https://github.com/neovim/tree-sitter-vim/blob/master/.github/workflows/fuzz.yml
I fixed the issue there with tree-sitter-perl/tree-sitter-perl#109; I did not realize that lexer->get_column segfaults on EOF.
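A minimal sketch of the kind of guard discussed in this thread: check eof before advancing and before asking for the column. The Lexer struct below is a stand-in that mirrors only the lookahead, advance, get_column and eof members of tree-sitter's TSLexer; the extra fields, the mock_* functions, safe_advance and safe_get_column are hypothetical names invented for illustration. This is not necessarily what tree-sitter-perl#109 does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the subset of TSLexer used here, plus mock-only state. */
typedef struct Lexer Lexer;
struct Lexer {
    int32_t lookahead;
    void (*advance)(Lexer *, bool skip);
    uint32_t (*get_column)(Lexer *);
    bool (*eof)(const Lexer *);
    const char *text;   /* mock-only: input buffer */
    uint32_t length;    /* mock-only: input size in bytes */
    uint32_t position;  /* mock-only: current byte offset */
};

static bool mock_eof(const Lexer *l) { return l->position >= l->length; }

/* Deliberately unchecked, like a lexer that trusts its caller. */
static void mock_advance(Lexer *l, bool skip) {
    (void)skip;
    l->position++;
    l->lookahead = mock_eof(l) ? 0 : (unsigned char)l->text[l->position];
}

/* Counts bytes back to the previous newline or the start of input. */
static uint32_t mock_get_column(Lexer *l) {
    uint32_t column = 0;
    for (uint32_t i = l->position; i > 0 && l->text[i - 1] != '\n'; i--) column++;
    return column;
}

static Lexer mock_lexer(const char *text) {
    assert(text != NULL);
    Lexer l = {0};
    l.text = text;
    l.length = (uint32_t)strlen(text);
    l.lookahead = l.length ? (unsigned char)text[0] : 0;
    l.advance = mock_advance;
    l.get_column = mock_get_column;
    l.eof = mock_eof;
    return l;
}

/* Guard: never advance once the lexer reports end of input. */
static bool safe_advance(Lexer *l) {
    if (l->eof(l)) return false;
    l->advance(l, false);
    return true;
}

/* Guard: at end of input return a caller-chosen fallback instead of calling
 * get_column at all. */
static uint32_t safe_get_column(Lexer *l, uint32_t fallback) {
    return l->eof(l) ? fallback : l->get_column(l);
}
```

In a real scanner the same two checks would wrap lexer->advance(lexer, false) and lexer->get_column(lexer); what fallback column makes sense at EOF depends on the grammar.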
Sorry for the delay in response - yeah the fuzzer is in an action, it's also in tree-sitter core in /script here, and in ts-questions here but that's meant to fuzz every grammar
If you still have issues I can always take a look
But the get_column segfault is interesting, I believe @ahlinc (paging) fixed that here: https://github.com/tree-sitter/tree-sitter/pull/2223
Also gonna throw in that having the perl parser upstream is welcome - it seems well maintained and good enough to be official :) (popular enough language, good grammar, etc)
Describe the bug
I'm not entirely sure where the bug lies, I'm currently debugging it. It seems to be a tricky three-way conflation of three different projects; any of which could be the source. But I'll start here as the toplevel container.
I have tree-sitter-markdown installed. This works fine for most .md files, including some code embeddings, such as embedded C code by using code fences.
I have tree-sitter-perl installed (the one from https://github.com/tree-sitter-perl/tree-sitter-perl/; the subject of #5222). This is nicely stable and works fine on .pm files.
But the combination of the two causes an instant SEGV internally within a child process of nvim, causing the main container process to shut down with no apparent message to the terminal (I ran strace -f, and nothing was printed to stdout or stderr). There doesn't even need to be any actual code; simply starting with a code fence is enough to trigger it.
If I can work out how to extract it, I'll attach a gdb backtrace of the erroring process.
To Reproduce
nvim new.md
In insert mode, type the opening code fence (three backticks followed by perl) and press Enter
At this point nvim exits back to the shell, having printed nothing.
Expected behavior
nvim does not crash.
Output of :checkhealth nvim-treesitter
Output of nvim --version
Additional context
No response