Closed erik-overdahl closed 10 months ago
Hey thanks for the report! It's weird that this doesn't happen during CI since that example.hcl is also parsed there i think! I'll give it a look tomorrow!
Thanks, your repro script worked fine!
Library installed to ~/.emacs.d/tree-sitter/libtree-sitter-hcl.so
Using grammar at main with pgtk. This should FAIL.
Cloning repository
Compiling library
Library installed to ~/.emacs.d/tree-sitter/libtree-sitter-hcl.so
munmap_chunk(): invalid pointer
Fatal error 6: Aborted
Backtrace:
emacs[0x513632]
emacs[0x4f211f]
emacs[0x511c6a]
emacs[0x511cc2]
/lib64/libc.so.6(+0x3dbb0)[0x7f4d5a1babb0]
/lib64/libc.so.6(+0x8e884)[0x7f4d5a20b884]
/lib64/libc.so.6(gsignal+0x1e)[0x7f4d5a1baafe]
/lib64/libc.so.6(abort+0xdf)[0x7f4d5a1a387f]
/lib64/libc.so.6(+0x2760f)[0x7f4d5a1a460f]
/lib64/libc.so.6(+0x987b5)[0x7f4d5a2157b5]
/lib64/libc.so.6(+0x98a4c)[0x7f4d5a215a4c]
/lib64/libc.so.6(__libc_free+0xca)[0x7f4d5a21a25a]
/root/.emacs.d/tree-sitter/libtree-sitter-hcl.so(+0x41076)[0x7f4d590db076]
/root/.emacs.d/tree-sitter/libtree-sitter-hcl.so(tree_sitter_hcl_external_scanner_deserialize+0x31)[0x7f4d590dc422]
/lib64/libtree-sitter.so.0(ts_parser_parse+0x630)[0x7f4d5a688250]
emacs[0x5ea49d]
emacs[0x578fd6]
emacs[0x5b9151]
emacs[0x57abab]
emacs[0x57b850]
emacs[0x57a478]
emacs[0x57a9f9]
emacs[0x57a198]
emacs[0x57c1f0]
emacs[0x57a198]
emacs[0x57a9f9]
emacs[0x5682c0]
emacs[0x57a198]
emacs[0x57a9f9]
emacs[0x57bc88]
emacs[0x57a198]
emacs[0x59e2a9]
emacs[0x5a5743]
emacs[0x5a6651]
emacs[0x579036]
emacs[0x5b9151]
emacs[0x57abab]
emacs[0x57b13b]
emacs[0x576bc5]
emacs[0x5a5ff6]
emacs[0x579036]
When i provide it the library that gets generated by tree-sitter when you do tree-sitter parse
it just works
docker run -it -v /var/home/mhoffm/git/emacs-29-pgtk-ts-crash-bugreport/tmp:/root/.emacs.d emacs-29-bugreport-pgtk:latest emacs --batch -l minimal-reproduce.el
fedora ~ git emacs-29-pgtk-ts-crash-bugreport main
An Emacs maintainer has also found that the grammar works fine if it is built manually: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=66549#11
I imagine that means the problem is in the way Emacs builds the grammar? Here is the relevant code from Emacs' treesit-install-language-grammar
function:
;; We need to go into the source directory because some
;; header files use relative path (#include "../xxx").
;; cd "${sourcedir}"
(setq default-directory source-dir)
(message "Compiling library")
;; cc -fPIC -c -I. parser.c
(treesit--call-process-signal
cc nil t nil "-fPIC" "-c" "-I." "parser.c")
;; cc -fPIC -c -I. scanner.c
(when (file-exists-p "scanner.c")
(treesit--call-process-signal
cc nil t nil "-fPIC" "-c" "-I." "scanner.c"))
;; c++ -fPIC -I. -c scanner.cc
(when (file-exists-p "scanner.cc")
(treesit--call-process-signal
c++ nil t nil "-fPIC" "-c" "-I." "scanner.cc"))
;; cc/c++ -fPIC -shared *.o -o "libtree-sitter-${lang}.${soext}"
(apply #'treesit--call-process-signal
(if (file-exists-p "scanner.cc") c++ cc)
nil t nil
`("-fPIC" "-shared"
,@(directory-files
default-directory nil
(rx bos (+ anychar) ".o" eos))
"-o" ,lib-name))
;; Copy out.
So it looks like the shared object is built with
git clone --depth=1 https://github.com/MichaHoffmann/tree-sitter-hcl.git
cd tree-sitter-hcl/src
cc -fPIC -I. -c parser.c
cc -fPIC -I. -c scanner.c
cc -fPIC -shared *.o -o libtree-sitter-hcl.so
If I run this and then mount the resulting object into the container at /root/.emacs.d/tree-sitter/libtree-sitter-hcl.so
, I still get the crash.
I don't see any shared object produced when running tree-sitter parse
. I am misunderstanding? How did you produce it?
It's in $HOME/.cache/tree-sitter for me but I also compiled with the emacs commands and it worked on my machine. I'll try tomorrow again, maybe I made a mistake. I also patched emacs to compile the parser with -g and during serialization and deserialization the state of the scanner looks pretty wrong. If I compile outside of the container in my checkout of this repo it works and the state looks fine though.
Cc @amaanq do you know if any other parser had similar issues?
edit: did you also change the minimal-reproduce.el so that it not compiles and overwrites your mounted file?
edit: did you also change the minimal-reproduce.el so that it not compiles and overwrites your mounted file?
I did, yes; that's the most recent commit in my repo.
I can confirm that the .so object created by tree-sitter parse
at ~/.cache/tree-sitter/lib/hcl.so
works just fine :) This is an acceptable workaround for me, but I will keep the bug open with the Emacs maintainers since most people will install these with the Emacs function treesit-install-language-grammar
, and the result of that function still produces this crash.
The shared object produced by tree-sitter parse
is around 20% larger than the one produced by Emacs, but looking into the binaries with something like diff <(readelf -h -S ~/.cache/tree-sitter/lib/hcl.so) <(readelf -h -S ~/.emacs.d/tree-sitter/libtree-sitter-hcl.so)
seems to show that most of the difference is debug info, e.g.:
77,98c77,82
< [26] .debug_aranges PROGBITS 0000000000000000 0003f1b8
< 0000000000000070 0000000000000000 0 0 1
< [27] .debug_info PROGBITS 0000000000000000 0003f228
< 000000000000355b 0000000000000000 0 0 1
< [28] .debug_abbrev PROGBITS 0000000000000000 00042783
< 00000000000006ea 0000000000000000 0 0 1
< [29] .debug_line PROGBITS 0000000000000000 00042e6d
< 0000000000017bfc 0000000000000000 0 0 1
< [30] .debug_str PROGBITS 0000000000000000 0005aa69
< 00000000000013e9 0000000000000001 MS 0 0 1
< [31] .debug_line_str PROGBITS 0000000000000000 0005be52
< 000000000000024c 0000000000000001 MS 0 0 1
< [32] .debug_loclists PROGBITS 0000000000000000 0005c09e
< 000000000000bc65 0000000000000000 0 0 1
< [33] .debug_rnglists PROGBITS 0000000000000000 00067d03
< 00000000000002f2 0000000000000000 0 0 1
< [34] .symtab SYMTAB 0000000000000000 00067ff8
< 00000000000006f0 0000000000000018 35 52 8
< [35] .strtab STRTAB 0000000000000000 000686e8
< 00000000000005f9 0000000000000000 0 0 1
< [36] .shstrtab STRTAB 0000000000000000 00068ce1
< 000000000000018d 0000000000000000 0 0 1
---
> [26] .symtab SYMTAB 0000000000000000 000551b8
> 0000000000000828 0000000000000018 27 65 8
> [27] .strtab STRTAB 0000000000000000 000559e0
> 000000000000069b 0000000000000000 0 0 1
> [28] .shstrtab STRTAB 0000000000000000 0005607b
> 000000000000011d 0000000000000000 0 0 1
Anyway, thanks for looking at this bug. I'm now more convinced that the problem is on Emacs's side. I'll leave it up to you whether or not to keep this issue open.
It's in $HOME/.cache/tree-sitter for me but I also compiled with the emacs commands and it worked on my machine. I'll try tomorrow again, maybe I made a mistake. I also patched emacs to compile the parser with -g and during serialization and deserialization the state of the scanner looks pretty wrong. If I compile outside of the container in my checkout of this repo it works and the state looks fine though.
Cc @amaanq do you know if any other parser had similar issues?
edit: did you also change the minimal-reproduce.el so that it not compiles and overwrites your mounted file?
No, ~it'd be nice to know how the parser is compiled in emacs~ but if it's not reproducible w/ the tree-sitter compiled binary then I'd say it's an emacs issue, still would be nice to figure out why though
Edit - didn't read enough
It's in $HOME/.cache/tree-sitter for me but I also compiled with the emacs commands and it worked on my machine. I'll try tomorrow again, maybe I made a mistake. I also patched emacs to compile the parser with -g and during serialization and deserialization the state of the scanner looks pretty wrong. If I compile outside of the container in my checkout of this repo it works and the state looks fine though. Cc @amaanq do you know if any other parser had similar issues? edit: did you also change the minimal-reproduce.el so that it not compiles and overwrites your mounted file?
No, ~it'd be nice to know how the parser is compiled in emacs~ but if it's not reproducible w/ the tree-sitter compiled binary then I'd say it's an emacs issue, still would be nice to figure out why though
Edit - didn't read enough
Thank you @amaanq !
@erik-overdahl I'll keep this open. It feels like the library should not fail with invalid memory access, regardless of compilation flags or compiler there. Also the way emacs compiles it does not even look offensive at all so i would really like to know whats happening before i'll close the issue. I'm happy that we found a workaround for you though!
Unfortunately, I have discovered that using the object file produced by tree-sitter parse
does not actually work in Emacs. It does not crash Emacs outright, but the parse tree produced contains only a small handful of ERROR tokens, while the same file fully parses the same file when used via tree-sitter parse
. The new scanner is unusable in Emacs.
The Emacs maintainers have decided that this isn't a problem on their side https://debbugs.gnu.org/cgi/bugreport.cgi?bug=66549#35
Unless you want to keep digging further, the easiest "fix" is probably to put a note in the readme that HEAD doesn't work with Emacs 29.1 builtin treesit mode.
If you look at the last few messages in the Emacs bugreport thread, you'll see that the crash only happens when compiled at -O0 (which Emacs's treesit-install-language-grammar
function does), but that I never was able to produce the correct parse tree in Emacs.
Hey sorry for not responding but I had no time last days. Let's dog a bit more over the weekend, it's definitely something I want to understand
@amaanq do you have an idea? I have not been able to reproduce this outside of emacs, even fuzzing for a while. Because i have no idea how to fix forward I start leaning towards rolling back the move to C, would that be acceptable from your side?
I have an idea but let me get to a computer first in a couple hours and I'll send a patch as to what I think a fix could be, but if you'd like to implement my idea now replace the serialized byte lengths that are casted to char/uint8_t with a memcpy of all 4 bytes instead, or just write each byte properly with shifts
Sorry @amaanq ; are you able to send a PR please?
Sorry, yes let me do that
It'd be nice @erik-overdahl if you can try those changes out and see if the crashing is fixed
Hey @erik-overdahl ; latest main solves it for me with your repro instructions! Can you check it out please?
The commit fixes the crash that occurred when the grammar was compiled with optimization level 0 (as Emacs does by default). The parse tree of the example file is now closer to correct but still contains a large number of error tokens. So the originally reported issue here is fixed, but the grammar is not completely usable in Emacs.
On Mon, Nov 13, 2023 at 09:22 Michael Hoffmann @.***> wrote:
Hey @erik-overdahl https://github.com/erik-overdahl ; latest main solves it for me with your repro instructions! Can you check it out please?
— Reply to this email directly, view it on GitHub https://github.com/MichaHoffmann/tree-sitter-hcl/issues/38#issuecomment-1808370024, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMTYYJY6UE4O5Q35RPZWMKTYEI3LVAVCNFSM6AAAAAA6APM6BWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMBYGM3TAMBSGQ . You are receiving this because you were mentioned.Message ID: @.***>
Can you help me reproduce that? it should not contain errors. Is there a way to dump the parse tree from emacs? [ sorry no emacs user here ]
But not crashing is a good first step !
i managed with
(with-temp-buffer
(insert-file-contents "example.hcl")
;; `treesit-parser-create' creates a parser for the buffer that is
;; then invoked lazily. Using `treesit-parse-string' to force parse.
(append-to-file (treesit-node-string (treesit-parse-string (buffer-string) 'hcl)) nil "/dev/stdout"))
yeah strings look pretty messed up; let me try to fix it!
@erik-overdahl i have a version that parses successfully! ill push in a minute
@erik-overdahl can you try locally from mhoffm-fix-emacs-parse-errors
?
Since I was able to reproduce initially but not on latest main, ill close this. @erik-overdahl if you still have the issue please reopen! If you can confirm that this is also solved for you it would be cool to know too!
When building the grammar from HEAD (currently b553906) and attempting to use the grammar to highlight an HCL file in Emacs 29.1, the grammar causes Emacs to crash with a
munmap_chunk(): invalid pointer
error.I am unsure of whether this bug resides in the grammar or within Emacs. It only occurs when Emacs is configured with the flag
--with-pgtk
, which tends to be finicky. However, it does not occur with the v1.1.0 release from this repo, and so may be a problem with the rewritten scanner. I have also submitted a report to the GNU Emacs bug tracker.I've written up a full reproduction of the crash here: https://github.com/erik-overdahl/emacs-29-pgtk-ts-crash-bugreport. Please let me know if you need more information.