tree-sitter / tree-sitter-julia

Julia grammar for Tree-sitter
MIT License
93 stars 31 forks source link

Crash when adding `$` at end of string #128

Closed gdkrmr closed 2 weeks ago

gdkrmr commented 7 months ago

I have a .jl File that reliably crashes my emacs with julia-ts-mode and helix. I strongly suspect that tree sitter is to blame here, if I use julia-mode (the one without tree sitter), it works fine. The file is [1], to reproduce the crash, go to line 27 and insert $ at the end of "euler", trying to write "euler$i", the crash happens when $ gets inserted.

change

    @eval export $(Symbol("euler" * string(i)))

to

    @eval export $(Symbol("euler$" * string(i)))

I wasn't able to reproduce this in any other file, so I am not sure what exactly causes the crash.

[1] https://github.com/gdkrmr/Euler.jl/blob/crashes_treesitter/src/Euler.jl

savq commented 7 months ago

Yep. I was also able to reproduce the issue in neovim.

I wasn't able to reproduce this in any other file, so I am not sure what exactly causes the crash.

The file is quite big, so I'd assume that treesitter takes too much memory when updating and the whole process gets killed.

ArbitRandomUser commented 4 months ago

when the file is truncated to only 758 lines ( removed all the "empty" functions in the file) i see that the parsing fails fig

savq commented 4 months ago

Yes. parsing fails because "euler$" is not a valid string literal in Julia. The problem here is that tree-sitter's error recovery isn't activated.

maxbrunsfeld commented 4 months ago

The crash is happening because tree-sitter-julia's external scanner is not checking the size of the data that it's writing during serialization:

https://github.com/tree-sitter/tree-sitter-julia/blob/acd5ca12cc278df7960629c2429a096c7ac4bbea/src/scanner.c#L59-L64

It needs to check that the size doesn't exceed TREE_SITTER_SERIALIZATION_BUFFER_SIZE.

maxbrunsfeld commented 4 months ago

I opened a Tree-sitter PR to make this kind of error more obvious, and to protect against it when possible (when the grammar is loaded via wasm).

https://github.com/tree-sitter/tree-sitter/pull/3318

savq commented 4 months ago

I see that the check is missing, but that doesn't seem to solve the underlying issue.

When the scanner sees the $, it'll return a STRING_CONTENT token and return to the parser. Then the parser will fail to match string_interpolation. So far so good.

What happens after that is less clear. If I understand correctly, the parser doesn't return control to the scanner to match _string_end, and now all the subsequent strings will fail because the stack was not cleared properly.

Is that what's causing the overflow?

If so, I didn't consider this type of delimiter mismatch when writing the scanner, so I'm open to suggestions on how to improve this.

ArbitRandomUser commented 3 months ago

can something be done as a temporary fix so that neovim doesnt crash ?

ArbitRandomUser commented 3 months ago

its not just the $ that triggers this problem , a " in the buffer before a triple quoted string also triggers this.

for example the following ,

a = " 
function f()
    1
end
"""
hello world
"""

and on large files it crashes

ArbitRandomUser commented 3 months ago

is simply memcopying only upto TREE_SITTER_SERIALIZATION_BUFFER_SIZE if size exceeds it a workable fix for it not crashing ? the parsing failing still happens , but atleast a temporary fix to prevent editor crashes .

this is what i tried ...

diff --git a/src/scanner.c b/src/scanner.c
index f81d6f4..84164a9 100644
--- a/src/scanner.c
+++ b/src/scanner.c
@@ -59,8 +59,14 @@ void tree_sitter_julia_external_scanner_destroy(void *payload) {
 unsigned tree_sitter_julia_external_scanner_serialize(void *payload, char *buffer) {
     Stack *stack = payload;
     unsigned size = stack->size;
-    memcpy(buffer, stack->contents, size);
-    return size;
+    if (size > TREE_SITTER_SERIALIZATION_BUFFER_SIZE){
+        memcpy(buffer, stack->contents,TREE_SITTER_SERIALIZATION_BUFFER_SIZE);
+        return TREE_SITTER_SERIALIZATION_BUFFER_SIZE;
+    }
+    else{
+        memcpy(buffer, stack->contents, size);
+        return size;
+    }
 }

 void tree_sitter_julia_external_scanner_deserialize(void *payload, const char *buffer, unsigned size) {
savq commented 2 weeks ago

I thought I'd already pushed a fix for this bug, but apparently I didn't 😬

Anyways, the fix is what @ArbitRandomUser suggested. Highlighting will fail, but the parser won't crash.