tree-sitter-grammars / tree-sitter-hcl

HCL grammar for tree-sitter
https://tree-sitter-grammars.github.io/tree-sitter-hcl/
Apache License 2.0
92 stars 20 forks source link

Emacs crashes due to SIGABRT when using new scanner #38

Closed erik-overdahl closed 10 months ago

erik-overdahl commented 11 months ago

When building the grammar from HEAD (currently b553906) and attempting to use the grammar to highlight an HCL file in Emacs 29.1, the grammar causes Emacs to crash with a munmap_chunk(): invalid pointer error.

I am unsure of whether this bug resides in the grammar or within Emacs. It only occurs when Emacs is configured with the flag --with-pgtk, which tends to be finicky. However, it does not occur with the v1.1.0 release from this repo, and so may be a problem with the rewritten scanner. I have also submitted a report to the GNU Emacs bug tracker.

I've written up a full reproduction of the crash here: https://github.com/erik-overdahl/emacs-29-pgtk-ts-crash-bugreport. Please let me know if you need more information.

MichaHoffmann commented 11 months ago

Hey thanks for the report! It's weird that this doesn't happen during CI since that example.hcl is also parsed there i think! I'll give it a look tomorrow!

MichaHoffmann commented 11 months ago

Thanks, your repro script worked fine!

Library installed to ~/.emacs.d/tree-sitter/libtree-sitter-hcl.so
Using grammar at main with pgtk. This should FAIL.
Cloning repository
Compiling library
Library installed to ~/.emacs.d/tree-sitter/libtree-sitter-hcl.so
munmap_chunk(): invalid pointer
Fatal error 6: Aborted
Backtrace:
emacs[0x513632]
emacs[0x4f211f]
emacs[0x511c6a]
emacs[0x511cc2]
/lib64/libc.so.6(+0x3dbb0)[0x7f4d5a1babb0]
/lib64/libc.so.6(+0x8e884)[0x7f4d5a20b884]
/lib64/libc.so.6(gsignal+0x1e)[0x7f4d5a1baafe]
/lib64/libc.so.6(abort+0xdf)[0x7f4d5a1a387f]
/lib64/libc.so.6(+0x2760f)[0x7f4d5a1a460f]
/lib64/libc.so.6(+0x987b5)[0x7f4d5a2157b5]
/lib64/libc.so.6(+0x98a4c)[0x7f4d5a215a4c]
/lib64/libc.so.6(__libc_free+0xca)[0x7f4d5a21a25a]
/root/.emacs.d/tree-sitter/libtree-sitter-hcl.so(+0x41076)[0x7f4d590db076]
/root/.emacs.d/tree-sitter/libtree-sitter-hcl.so(tree_sitter_hcl_external_scanner_deserialize+0x31)[0x7f4d590dc422]
/lib64/libtree-sitter.so.0(ts_parser_parse+0x630)[0x7f4d5a688250]
emacs[0x5ea49d]
emacs[0x578fd6]
emacs[0x5b9151]
emacs[0x57abab]
emacs[0x57b850]
emacs[0x57a478]
emacs[0x57a9f9]
emacs[0x57a198]
emacs[0x57c1f0]
emacs[0x57a198]
emacs[0x57a9f9]
emacs[0x5682c0]
emacs[0x57a198]
emacs[0x57a9f9]
emacs[0x57bc88]
emacs[0x57a198]
emacs[0x59e2a9]
emacs[0x5a5743]
emacs[0x5a6651]
emacs[0x579036]
emacs[0x5b9151]
emacs[0x57abab]
emacs[0x57b13b]
emacs[0x576bc5]
emacs[0x5a5ff6]
emacs[0x579036]
MichaHoffmann commented 11 months ago

When i provide it the library that gets generated by tree-sitter when you do tree-sitter parse it just works

docker run -it -v /var/home/mhoffm/git/emacs-29-pgtk-ts-crash-bugreport/tmp:/root/.emacs.d emacs-29-bugreport-pgtk:latest emacs --batch -l minimal-reproduce.el
 fedora  ~  git  emacs-29-pgtk-ts-crash-bugreport   main 
erik-overdahl commented 11 months ago

An Emacs maintainer has also found that the grammar works fine if it is built manually: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=66549#11

I imagine that means the problem is in the way Emacs builds the grammar? Here is the relevant code from Emacs' treesit-install-language-grammar function:

          ;; We need to go into the source directory because some
          ;; header files use relative path (#include "../xxx").
          ;; cd "${sourcedir}"
          (setq default-directory source-dir)
          (message "Compiling library")
          ;; cc -fPIC -c -I. parser.c
          (treesit--call-process-signal
           cc nil t nil "-fPIC" "-c" "-I." "parser.c")
          ;; cc -fPIC -c -I. scanner.c
          (when (file-exists-p "scanner.c")
            (treesit--call-process-signal
             cc nil t nil "-fPIC" "-c" "-I." "scanner.c"))
          ;; c++ -fPIC -I. -c scanner.cc
          (when (file-exists-p "scanner.cc")
            (treesit--call-process-signal
             c++ nil t nil "-fPIC" "-c" "-I." "scanner.cc"))
          ;; cc/c++ -fPIC -shared *.o -o "libtree-sitter-${lang}.${soext}"
          (apply #'treesit--call-process-signal
                 (if (file-exists-p "scanner.cc") c++ cc)
                 nil t nil
                 `("-fPIC" "-shared"
                   ,@(directory-files
                      default-directory nil
                      (rx bos (+ anychar) ".o" eos))
                   "-o" ,lib-name))
          ;; Copy out.

So it looks like the shared object is built with

git clone --depth=1 https://github.com/MichaHoffmann/tree-sitter-hcl.git
cd tree-sitter-hcl/src
cc -fPIC -I. -c parser.c
cc -fPIC -I. -c scanner.c
cc -fPIC -shared *.o -o libtree-sitter-hcl.so

If I run this and then mount the resulting object into the container at /root/.emacs.d/tree-sitter/libtree-sitter-hcl.so, I still get the crash.

I don't see any shared object produced when running tree-sitter parse. I am misunderstanding? How did you produce it?

MichaHoffmann commented 11 months ago

It's in $HOME/.cache/tree-sitter for me but I also compiled with the emacs commands and it worked on my machine. I'll try tomorrow again, maybe I made a mistake. I also patched emacs to compile the parser with -g and during serialization and deserialization the state of the scanner looks pretty wrong. If I compile outside of the container in my checkout of this repo it works and the state looks fine though.

Cc @amaanq do you know if any other parser had similar issues?

edit: did you also change the minimal-reproduce.el so that it not compiles and overwrites your mounted file?

erik-overdahl commented 11 months ago

edit: did you also change the minimal-reproduce.el so that it not compiles and overwrites your mounted file?

I did, yes; that's the most recent commit in my repo.

I can confirm that the .so object created by tree-sitter parse at ~/.cache/tree-sitter/lib/hcl.so works just fine :) This is an acceptable workaround for me, but I will keep the bug open with the Emacs maintainers since most people will install these with the Emacs function treesit-install-language-grammar, and the result of that function still produces this crash.

The shared object produced by tree-sitter parse is around 20% larger than the one produced by Emacs, but looking into the binaries with something like diff <(readelf -h -S ~/.cache/tree-sitter/lib/hcl.so) <(readelf -h -S ~/.emacs.d/tree-sitter/libtree-sitter-hcl.so) seems to show that most of the difference is debug info, e.g.:

77,98c77,82
<   [26] .debug_aranges    PROGBITS         0000000000000000  0003f1b8
<        0000000000000070  0000000000000000           0     0     1
<   [27] .debug_info       PROGBITS         0000000000000000  0003f228
<        000000000000355b  0000000000000000           0     0     1
<   [28] .debug_abbrev     PROGBITS         0000000000000000  00042783
<        00000000000006ea  0000000000000000           0     0     1
<   [29] .debug_line       PROGBITS         0000000000000000  00042e6d
<        0000000000017bfc  0000000000000000           0     0     1
<   [30] .debug_str        PROGBITS         0000000000000000  0005aa69
<        00000000000013e9  0000000000000001  MS       0     0     1
<   [31] .debug_line_str   PROGBITS         0000000000000000  0005be52
<        000000000000024c  0000000000000001  MS       0     0     1
<   [32] .debug_loclists   PROGBITS         0000000000000000  0005c09e
<        000000000000bc65  0000000000000000           0     0     1
<   [33] .debug_rnglists   PROGBITS         0000000000000000  00067d03
<        00000000000002f2  0000000000000000           0     0     1
<   [34] .symtab           SYMTAB           0000000000000000  00067ff8
<        00000000000006f0  0000000000000018          35    52     8
<   [35] .strtab           STRTAB           0000000000000000  000686e8
<        00000000000005f9  0000000000000000           0     0     1
<   [36] .shstrtab         STRTAB           0000000000000000  00068ce1
<        000000000000018d  0000000000000000           0     0     1
---
>   [26] .symtab           SYMTAB           0000000000000000  000551b8
>        0000000000000828  0000000000000018          27    65     8
>   [27] .strtab           STRTAB           0000000000000000  000559e0
>        000000000000069b  0000000000000000           0     0     1
>   [28] .shstrtab         STRTAB           0000000000000000  0005607b
>        000000000000011d  0000000000000000           0     0     1

Anyway, thanks for looking at this bug. I'm now more convinced that the problem is on Emacs's side. I'll leave it up to you whether or not to keep this issue open.

amaanq commented 11 months ago

It's in $HOME/.cache/tree-sitter for me but I also compiled with the emacs commands and it worked on my machine. I'll try tomorrow again, maybe I made a mistake. I also patched emacs to compile the parser with -g and during serialization and deserialization the state of the scanner looks pretty wrong. If I compile outside of the container in my checkout of this repo it works and the state looks fine though.

Cc @amaanq do you know if any other parser had similar issues?

edit: did you also change the minimal-reproduce.el so that it not compiles and overwrites your mounted file?

No, ~it'd be nice to know how the parser is compiled in emacs~ but if it's not reproducible w/ the tree-sitter compiled binary then I'd say it's an emacs issue, still would be nice to figure out why though

Edit - didn't read enough

mhoffm-aiven commented 11 months ago

It's in $HOME/.cache/tree-sitter for me but I also compiled with the emacs commands and it worked on my machine. I'll try tomorrow again, maybe I made a mistake. I also patched emacs to compile the parser with -g and during serialization and deserialization the state of the scanner looks pretty wrong. If I compile outside of the container in my checkout of this repo it works and the state looks fine though. Cc @amaanq do you know if any other parser had similar issues? edit: did you also change the minimal-reproduce.el so that it not compiles and overwrites your mounted file?

No, ~it'd be nice to know how the parser is compiled in emacs~ but if it's not reproducible w/ the tree-sitter compiled binary then I'd say it's an emacs issue, still would be nice to figure out why though

Edit - didn't read enough

Thank you @amaanq !

mhoffm-aiven commented 11 months ago

@erik-overdahl I'll keep this open. It feels like the library should not fail with invalid memory access, regardless of compilation flags or compiler there. Also the way emacs compiles it does not even look offensive at all so i would really like to know whats happening before i'll close the issue. I'm happy that we found a workaround for you though!

erik-overdahl commented 11 months ago

Unfortunately, I have discovered that using the object file produced by tree-sitter parse does not actually work in Emacs. It does not crash Emacs outright, but the parse tree produced contains only a small handful of ERROR tokens, while the same file fully parses the same file when used via tree-sitter parse. The new scanner is unusable in Emacs.

erik-overdahl commented 11 months ago

The Emacs maintainers have decided that this isn't a problem on their side https://debbugs.gnu.org/cgi/bugreport.cgi?bug=66549#35

Unless you want to keep digging further, the easiest "fix" is probably to put a note in the readme that HEAD doesn't work with Emacs 29.1 builtin treesit mode.

If you look at the last few messages in the Emacs bugreport thread, you'll see that the crash only happens when compiled at -O0 (which Emacs's treesit-install-language-grammar function does), but that I never was able to produce the correct parse tree in Emacs.

MichaHoffmann commented 11 months ago

Hey sorry for not responding but I had no time last days. Let's dog a bit more over the weekend, it's definitely something I want to understand

MichaHoffmann commented 11 months ago

@amaanq do you have an idea? I have not been able to reproduce this outside of emacs, even fuzzing for a while. Because i have no idea how to fix forward I start leaning towards rolling back the move to C, would that be acceptable from your side?

amaanq commented 11 months ago

I have an idea but let me get to a computer first in a couple hours and I'll send a patch as to what I think a fix could be, but if you'd like to implement my idea now replace the serialized byte lengths that are casted to char/uint8_t with a memcpy of all 4 bytes instead, or just write each byte properly with shifts

MichaHoffmann commented 10 months ago

Sorry @amaanq ; are you able to send a PR please?

amaanq commented 10 months ago

Sorry, yes let me do that

39

It'd be nice @erik-overdahl if you can try those changes out and see if the crashing is fixed

MichaHoffmann commented 10 months ago

Hey @erik-overdahl ; latest main solves it for me with your repro instructions! Can you check it out please?

erik-overdahl commented 10 months ago

The commit fixes the crash that occurred when the grammar was compiled with optimization level 0 (as Emacs does by default). The parse tree of the example file is now closer to correct but still contains a large number of error tokens. So the originally reported issue here is fixed, but the grammar is not completely usable in Emacs.

On Mon, Nov 13, 2023 at 09:22 Michael Hoffmann @.***> wrote:

Hey @erik-overdahl https://github.com/erik-overdahl ; latest main solves it for me with your repro instructions! Can you check it out please?

— Reply to this email directly, view it on GitHub https://github.com/MichaHoffmann/tree-sitter-hcl/issues/38#issuecomment-1808370024, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMTYYJY6UE4O5Q35RPZWMKTYEI3LVAVCNFSM6AAAAAA6APM6BWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMBYGM3TAMBSGQ . You are receiving this because you were mentioned.Message ID: @.***>

MichaHoffmann commented 10 months ago

Can you help me reproduce that? it should not contain errors. Is there a way to dump the parse tree from emacs? [ sorry no emacs user here ]

MichaHoffmann commented 10 months ago

But not crashing is a good first step !

MichaHoffmann commented 10 months ago

i managed with

(with-temp-buffer
  (insert-file-contents "example.hcl")
  ;; `treesit-parser-create' creates a parser for the buffer that is
  ;; then invoked lazily. Using `treesit-parse-string' to force parse.
  (append-to-file (treesit-node-string (treesit-parse-string (buffer-string) 'hcl)) nil "/dev/stdout"))
MichaHoffmann commented 10 months ago

yeah strings look pretty messed up; let me try to fix it!

MichaHoffmann commented 10 months ago

@erik-overdahl i have a version that parses successfully! ill push in a minute

MichaHoffmann commented 10 months ago

@erik-overdahl can you try locally from mhoffm-fix-emacs-parse-errors ?

MichaHoffmann commented 10 months ago

Since I was able to reproduce initially but not on latest main, ill close this. @erik-overdahl if you still have the issue please reopen! If you can confirm that this is also solved for you it would be cool to know too!