nvim-treesitter / nvim-treesitter

Nvim Treesitter configurations and abstraction layer
Apache License 2.0

Markdown + embedded perl blocks cause SEGV #5223

Closed leonerd closed 1 year ago

leonerd commented 1 year ago

Describe the bug

I'm not entirely sure where the bug lies; I'm currently debugging it. It seems to be a tricky interaction between three different projects, any of which could be the source. But I'll start here, as the top-level container.

I have tree-sitter-markdown installed. This works fine for most .md files, including some with embedded code, such as C code in code fences:

```c
int x = 1234;
```

I have tree-sitter-perl installed (the one from https://github.com/tree-sitter-perl/tree-sitter-perl/; the subject of #5222). This is nicely stable and works fine on .pm files.

But the combination of the two causes an instant SEGV internally within a child process of nvim, causing the main container process to shut down with no apparent message to the terminal (I ran strace -f, and nothing was printed to stdout or stderr). There doesn't even need to be any actual code; simply starting a code fence is enough to trigger it:

    ```perl

If I can work out how to extract it, I'll attach a gdb backtrace of the erroring process.

To Reproduce

  1. Install markdown and perl tree-sitter parsers.
  2. Open a new .md file (nvim new.md)
  3. Begin a perl code fence - triple-backticks, followed by perl
  4. Press Enter to begin a new line inside the code fence.

At this point nvim exits back to the shell, having printed nothing.

Expected behavior

nvim does not crash.

Output of :checkhealth nvim-treesitter

- WARNING tree-sitter executable not found (parser generator, only needed for :TSInstallFromGrammar, not required for :TSInstall)
- OK node found v18.13.0 (only needed for :TSInstallFromGrammar)
- OK git executable found.
- OK cc executable found. Selected from { vim.NIL, "cc", "gcc", "clang", "cl", "zig" }
  Version: cc (Debian 13.1.0-9) 13.1.0
- OK Neovim was compiled with tree-sitter runtime ABI version 14 (required >=13). Parsers must be compatible with runtime ABI.

OS Info:
{
  machine = "x86_64",
  release = "6.3.0-1-amd64",
  sysname = "Linux",
  version = "#1 SMP PREEMPT_DYNAMIC Debian 6.3.7-1 (2023-06-12)"
}

Parser/Features         H L F I J
  - lua                 ✓ ✓ ✓ ✓ ✓
  - pod                 ✓ . . . .
  - c                   ✓ ✓ ✓ ✓ ✓
  - help                ✓ . . . ✓
  - perl                ✓ . ✓ . ✓
  - markdown            ✓ . ✓ . ✓
  - vim                 ✓ ✓ ✓ . ✓
  - query               ✓ ✓ ✓ ✓ ✓

  Legend: H[ighlight], L[ocals], F[olds], I[ndents], In[j]ections
         +) multiple parsers found, only one will be used
         x) errors found in the query, try to run :TSUpdate {lang}

Output of nvim --version

NVIM v0.9.0-dev-820+ga0a112515
Build type: RelWithDebInfo
LuaJIT 2.1.0-beta3
Compilation: /usr/bin/gcc-10  -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1 -O2 -g  -Og -g -Wall -Wextra -pedantic -Wno-unused-parameter -Wstrict-prototypes -std=gnu99 -Wshadow -Wconversion -Wdouble-promotion -Wmissing-noreturn -Wmissing-format-attribute -Wmissing-prototypes -Wimplicit-fallthrough -Wvla -fno-common -fdiagnostics-color=always -fstack-protector-strong -DNVIM_UNIBI_HAS_VAR_FROM -DNVIM_MSGPACK_HAS_FLOAT32 -DNVIM_TS_HAS_SET_MATCH_LIMIT -DNVIM_TS_HAS_SET_ALLOCATOR -DINCLUDE_GENERATED_DECLARATIONS -D_GNU_SOURCE -DMIN_LOG_LEVEL=3 -I/home/runner/work/neovim/neovim/.deps/usr/include/luajit-2.1 -I/usr/include -I/home/runner/work/neovim/neovim/.deps/usr/include -I/home/runner/work/neovim/neovim/build/src/nvim/auto -I/home/runner/work/neovim/neovim/build/include -I/home/runner/work/neovim/neovim/build/cmake.config -I/home/runner/work/neovim/neovim/src
Compiled by runner@fv-az844-837

Additional context

No response

leonerd commented 1 year ago

Here's a gdb output:

Program received signal SIGSEGV, Segmentation fault.
ts_decode_utf8 (string=0x9 <error: Cannot access memory at address 0x9>, length=length@entry=4294967287, 
    code_point=code_point@entry=0x55b3b20a9ec0)
    at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/././unicode.h:32
32      /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/././unicode.h: No such file or directory.
(gdb) bt
#0  ts_decode_utf8 (string=0x9 <error: Cannot access memory at address 0x9>, length=length@entry=4294967287, 
    code_point=code_point@entry=0x55b3b20a9ec0)
    at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/././unicode.h:32
#1  0x000055b3b1304577 in ts_lexer__get_lookahead (self=self@entry=0x55b3b20a9ec0)
    at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/./lexer.c:89
#2  0x000055b3b1305547 in ts_lexer__get_column (_self=0x55b3b20a9ec0)
    at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/./lexer.c:260
#3  0x00007f96cd81fca0 in tree_sitter_perl_external_scanner_scan () from /home/leo/.config/nvim/parser/perl.so
#4  0x000055b3b131efc2 in ts_parser__lex (parse_state=1, version=0, self=0x55b3b20a9ec0)
    at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/./parser.c:427
#5  ts_parser__advance (allow_node_reuse=<optimized out>, version=<optimized out>, self=0x55b3b20a9ec0)
    at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/./parser.c:1441
#6  ts_parser_parse (self=0x55b3b20a9ec0, old_tree=old_tree@entry=0x0, input=...)
    at /home/runner/work/neovim/neovim/.deps/build/src/tree-sitter/lib/src/./parser.c:1933
#7  0x000055b3b11acf1a in parser_parse (L=0x7f96cf0b4380)
    at /home/runner/work/neovim/neovim/src/nvim/lua/treesitter.c:421
#8  0x000055b3b1356126 in lj_BC_FUNCC ()
#9  0x000055b3b1342526 in lua_pcall (L=L@entry=0x7f96cf0b4380, nargs=nargs@entry=2, nresults=nresults@entry=1, 
    errfunc=errfunc@entry=-4) at lj_api.c:1116
#10 0x000055b3b119a589 in nlua_pcall (lstate=lstate@entry=0x7f96cf0b4380, nargs=nargs@entry=2, nresults=nresults@entry=1)
    at /home/runner/work/neovim/neovim/src/nvim/lua/executor.c:153
#11 0x000055b3b119fd01 in nlua_call_ref (ref=<optimized out>, name=<optimized out>, args=..., retval=<optimized out>, 
    err=0x7fffac99b910) at /home/runner/work/neovim/neovim/src/nvim/lua/executor.c:1559
#12 0x000055b3b10e29c3 in decor_provider_invoke (ns_id=1, name=name@entry=0x55b3b13bf185 "buf", ref=<optimized out>, 
    args=..., default_true=default_true@entry=true, perr=perr@entry=0x55b3b14f5958 <provider_err.lto_priv>)
    at /home/runner/work/neovim/neovim/src/nvim/decoration_provider.c:36
#13 0x000055b3b10e33ad in decor_providers_invoke_buf (buf=0x55b3b1e09550, providers=0x7fffac99ba40, 
    err=0x55b3b14f5958 <provider_err.lto_priv>) at /home/runner/work/neovim/neovim/src/nvim/decoration_provider.c:184
#14 0x000055b3b10f1498 in update_screen () at /home/runner/work/neovim/neovim/src/nvim/drawscreen.c:557
#15 0x000055b3b10f4db5 in ins_redraw (ready=ready@entry=true) at /home/runner/work/neovim/neovim/src/nvim/edit.c:1360
#16 0x000055b3b10f7a23 in insert_check (state=0x7fffac99bbf0) at /home/runner/work/neovim/neovim/src/nvim/edit.c:474
#17 0x000055b3b128577e in state_enter (s=0x7fffac99bbf0) at /home/runner/work/neovim/neovim/src/nvim/state.c:41
#18 0x000055b3b10f9081 in insert_enter (s=s@entry=0x7fffac99bbf0) at /home/runner/work/neovim/neovim/src/nvim/edit.c:337
#19 0x000055b3b10f9265 in edit (cmdchar=105, startln=<optimized out>, count=1)
    at /home/runner/work/neovim/neovim/src/nvim/edit.c:1267
#20 0x000055b3b11efbc2 in invoke_edit (cap=cap@entry=0x7fffac99bd90, repl=repl@entry=0, cmd=105, startln=startln@entry=0)
    at /home/runner/work/neovim/neovim/src/nvim/normal.c:6273
#21 0x000055b3b11efe8e in nv_edit (cap=0x7fffac99bd90) at /home/runner/work/neovim/neovim/src/nvim/normal.c:6250
#22 0x000055b3b11ecc17 in normal_execute (state=0x7fffac99bd10, key=<optimized out>)
    at /home/runner/work/neovim/neovim/src/nvim/normal.c:1202
#23 0x000055b3b12857ae in state_enter (s=0x7fffac99bd10) at /home/runner/work/neovim/neovim/src/nvim/state.c:99
#24 0x000055b3b11e541c in normal_enter (cmdwin=<optimized out>, noexmode=<optimized out>)
    at /home/runner/work/neovim/neovim/src/nvim/normal.c:500
#25 0x000055b3b107d01f in main (argc=<optimized out>, argv=<optimized out>)
    at /home/runner/work/neovim/neovim/src/nvim/main.c:625

which, er, now I stare at it points the finger sharply in our direction. Oops. I'll go open a bug there instead.

clason commented 1 year ago

You might be interested in our fuzzing action 😇

leonerd commented 1 year ago

Staring in more detail, while the segv comes from a backtrace that has tree-sitter-perl's scanner in it, it doesn't immediately look like that's to blame. Rather, judging from the string/length arguments in the topmost frame (the one that actually crashed), it seems like some pointer value somewhere wasn't wired up properly in the glue, possibly somewhere around where the code injections happen. See https://github.com/tree-sitter-perl/tree-sitter-perl/issues/107#issuecomment-1673897835

leonerd commented 1 year ago

Further observations: If I comment out just the (fenced_code_block) portion of queries/markdown/injections.scm this problem goes away. I no longer get code injections in the fenced blocks but at least it doesn't crash. So that's a potential workaround for now as well.

clason commented 1 year ago

Nope, that's like curing the gangrene by chopping off the arm.

leonerd commented 1 year ago

I notice that the original content already had

  (#not-match? @language "elm")

I wonder why that is. Maybe elm also crashes..? I shall add another (#not-match? @language "perl") to disable this while still allowing other languages, and observe which other languages do/don't crash. Maybe there is a pattern.

leonerd commented 1 year ago

Languages I have that embed just fine: pod, c, markdown, vim, query, lua

Of those, I know that pod does have a scanner.c and its scanner makes calls to lexer->get_column() just like the crashing perl one, but that one seems to behave just fine here. I seem unable to provoke it into crashing in the way that the perl one does so easily.

clason commented 1 year ago

Again, I recommend fuzzing your scanner. Fixing your crashes should be your first priority.

leonerd commented 1 year ago

@clason It's not directly the scanner. See again https://github.com/tree-sitter-perl/tree-sitter-perl/issues/107#issuecomment-1673897835

In more detail: We call lexer->get_column(lexer) on a valid pointer, and a few callstack layers down, the ts internals are attempting to call

ts_decode_utf8 (string=0x9 <error: Cannot access memory at address 0x9>, length=length@entry=4294967287, ...)

That pointer + length value look very suspect; as if someone was attempting a "start/length" calculation by doing pointer arithmetic on a NULL (i.e. zero) pointer rather than a valid one. That length value is (uint32_t)-9, by the way... which aligns with the 9 that the pointer is. Thinking further, the three backticks of the code fence, the four letters "perl" and the CRLF linefeed together are 9 bytes. I wonder if something somewhere hasn't set up the string offset for the injection quite right.
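
The arithmetic behind that observation can be checked in isolation. The sketch below is a hypothetical model, not tree-sitter's actual code: `ChunkView`, `DecodeArgs` and `decode_args` are invented names, and addresses are modelled as `uintptr_t` integers so that nothing dereferences or offsets a real NULL pointer. It assumes the pointer and remaining length are derived from a chunk base, a chunk size and a byte offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a lexer's view of its current input chunk. */
typedef struct {
    uintptr_t chunk_base; /* address of the chunk; 0 models a NULL chunk */
    uint32_t chunk_size;  /* number of bytes in the chunk */
    uint32_t offset;      /* bytes already consumed within the chunk */
} ChunkView;

/* The (string, length) pair that would be handed to a UTF-8 decoder. */
typedef struct {
    uintptr_t string;
    uint32_t length;
} DecodeArgs;

/* Derive the decoder arguments with no bounds check at all.
 * If offset exceeds chunk_size, the unsigned subtraction wraps around. */
DecodeArgs decode_args(ChunkView view) {
    DecodeArgs args;
    args.string = view.chunk_base + view.offset;
    args.length = view.chunk_size - view.offset;
    return args;
}
```

With a base of 0, a size of 0 and an offset of 9, `decode_args` yields string = 0x9 and length = 4294967287, the same pair as in frame #0 of the backtrace.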

leonerd commented 1 year ago

And so it would seem: I can get a different number (0x1f == 31), if I add more content to the file. The extra text I added exactly accounted for that larger value.

amaanq commented 1 year ago

It is directly the scanner; fuzzing revealed it to be so.

One issue is you're advancing past eof on L750; there are others I didn't dig into.


rabbiveesh commented 1 year ago

> You might be interested in our fuzzing action 😇

Could you point me in the direction of how to use this fuzzer? Happy to fix any issues that are my fault (which sounds like this is)

leonerd commented 1 year ago

> One issue is you're advancing past eof on L750

Ahah; exciting. I had imagined tree-sitter would realise it was going past the end and not let me do that. It appears not :/

clason commented 1 year ago

Nope, the scanner is fully your own responsibility ;)
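
A minimal sketch of what taking that responsibility can look like: check for end of input before every advance. `MockLexer` and its helpers are hypothetical stand-ins written for this example; only the shape of the `advance` and `eof` callbacks is meant to resemble what an external scanner receives, and the real `TSLexer` type is not used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lexer handed to an external scanner. */
typedef struct MockLexer MockLexer;
struct MockLexer {
    int32_t lookahead;
    void (*advance)(MockLexer *, bool skip);
    bool (*eof)(const MockLexer *);
    const char *input;
    size_t length;
    size_t position;
};

static bool mock_eof(const MockLexer *lexer) {
    return lexer->position >= lexer->length;
}

/* Moves forward unconditionally; it does not stop the caller at end of input. */
static void mock_advance(MockLexer *lexer, bool skip) {
    (void)skip;
    lexer->position++;
    lexer->lookahead = mock_eof(lexer) ? 0 : (unsigned char)lexer->input[lexer->position];
}

MockLexer mock_lexer_new(const char *input, size_t length) {
    MockLexer lexer = {0};
    lexer.advance = mock_advance;
    lexer.eof = mock_eof;
    lexer.input = input;
    lexer.length = length;
    lexer.lookahead = length > 0 ? (unsigned char)input[0] : 0;
    return lexer;
}

/* Advance only while input remains; report whether a character was consumed. */
bool advance_if_not_eof(MockLexer *lexer) {
    assert(lexer != NULL);
    if (lexer->eof(lexer)) {
        return false;
    }
    lexer->advance(lexer, false);
    return true;
}
```

Routing every advance in a scanner through one helper like `advance_if_not_eof` keeps the EOF check in a single place, rather than relying on each call site to remember it.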

leonerd commented 1 year ago

Well, no luck yet.

I've fixed the L750 issue - https://github.com/tree-sitter-perl/tree-sitter-perl/commit/b9aac568bd482843a3dededb205760cf11ac1e8f

I've also attempted to have it abort() on any attempt to advance past EOF - https://github.com/tree-sitter-perl/tree-sitter-perl/commit/3c778f7f42427e05a739d442e2e11a9ca16ec736

Retesting it shows identical SEGV behaviour as before. I had expected to see some noise on stderr and a SIGABRT in these cases instead.

ObserverOfTime commented 1 year ago

The fuzzer is vigoux/tree-sitter-fuzz-action. You can download entrypoint.sh and run it locally.

clason commented 1 year ago

> > You might be interested in our fuzzing action 😇
>
> Could you point me in the direction of how to use this fuzzer? Happy to fix any issues that are my fault (which sounds like this is)

https://github.com/neovim/tree-sitter-vim/blob/master/.github/workflows/fuzz.yml

rabbiveesh commented 1 year ago

I fixed the issue there with tree-sitter-perl/tree-sitter-perl#109; I did not realize that lexer->get_column segfaults on EOF.
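
One possible shape for such a guard, which is not necessarily what the PR above does, is to query the column only when the lexer is not at EOF. `ColumnLexer` and its helpers are hypothetical mocks invented for this sketch; the mock counts calls so it is visible that `get_column` is never reached at end of input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock exposing only the two callbacks this sketch needs. */
typedef struct ColumnLexer ColumnLexer;
struct ColumnLexer {
    uint32_t (*get_column)(ColumnLexer *);
    bool (*eof)(const ColumnLexer *);
    uint32_t column;
    bool at_eof;
    unsigned get_column_calls; /* how often get_column was invoked */
};

static uint32_t mock_get_column(ColumnLexer *lexer) {
    lexer->get_column_calls++;
    return lexer->column;
}

static bool mock_column_eof(const ColumnLexer *lexer) {
    return lexer->at_eof;
}

ColumnLexer column_lexer_new(uint32_t column, bool at_eof) {
    ColumnLexer lexer = { mock_get_column, mock_column_eof, column, at_eof, 0 };
    return lexer;
}

/* Store the column in *column_out and return true, or return false at EOF
 * without calling get_column at all. */
bool column_if_not_eof(ColumnLexer *lexer, uint32_t *column_out) {
    assert(lexer != NULL && column_out != NULL);
    if (lexer->eof(lexer)) {
        return false;
    }
    *column_out = lexer->get_column(lexer);
    return true;
}
```

Returning a flag and writing through an out-parameter forces the caller to decide what "no column" means at EOF instead of silently using a stale value.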

amaanq commented 1 year ago

Sorry for the delay in response - yeah the fuzzer is in an action, it's also in tree-sitter core in /script here, and in ts-questions here but that's meant to fuzz every grammar

If you still have issues I can always take a look

But the get_column segfault is interesting, I believe @ahlinc (paging) fixed that here: https://github.com/tree-sitter/tree-sitter/pull/2223

Also gonna throw in that having the perl parser upstream is welcome - it seems well maintained and good enough to be official :) (popular enough language, good grammar, etc)