termux / termux-packages

A package build system for Termux.
https://termux.dev

Emacs: Fatal error 11: Segmentation fault #2599

Closed Grimler91 closed 4 years ago

Grimler91 commented 6 years ago

This happens on arm, android 7. At least 3 other people experienced the same according to a google+ post. The crashes aren't very frequent and I haven't found an easy way to reproduce them.

I have experienced this since around 4cba233fab39224e15d5e0c73965a5ebf1b828f2 but building the previous version (25c69809a1a811768e2ebe5d4205b71cb8991f82) doesn't change anything.

I've built a debug version and investigated with gdb and valgrind, though. The debug deb for emacs and all dependencies are available from https://grimler.se/dists/testing/debug.

gdb shows:

Program received signal SIGSEGV, Segmentation fault.
0xb69dd240 in sigsetjmp () from /system/lib/libc.so
(gdb) bt
#0  0xb69dd240 in sigsetjmp () from /system/lib/libc.so
#1  0xdd0e51c8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Valgrind, meanwhile, reports lots and lots of "Conditional jump or move depends on uninitialised value(s)" and "Use of uninitialised value of size 4" errors. A snippet from running valgrind --leak-check=full --track-origins=yes -v might look like:

==8250== Conditional jump or move depends on uninitialised value(s)
==8250==    at 0x1AF8E2: mark_maybe_object (alloc.c:4705)
==8250==    by 0x1AFD69: mark_memory (alloc.c:4895)
==8250==    by 0x1AFD69: mark_stack (alloc.c:5038)
==8250==    by 0x1AFD69: garbage_collect_1 (alloc.c:5760)
==8250==    by 0x1AFD69: Fgarbage_collect (alloc.c:5983)
==8250==    by 0x1BF401: eval_sub (eval.c:2169)
==8250==    by 0x1BF69D: Fprogn (eval.c:431)
==8250==    by 0x1BF8AF: funcall_lambda (eval.c:2922)
==8250==    by 0x1BFBBF: Ffuncall (eval.c:2760)
==8250==    by 0x1BFC55: funcall_nil (eval.c:2338)
==8250==    by 0x1BEC59: run_hook_with_args (eval.c:2515)
==8250==    by 0x1BEC9D: Frun_hook_with_args (eval.c:2380)
==8250==    by 0x1BFA79: Ffuncall (eval.c:2679)
==8250==    by 0x1E2CC3: exec_byte_code (bytecode.c:880)
==8250==    by 0x1BF785: funcall_lambda (eval.c:2863)
==8250==  Uninitialised value was created by a stack allocation
==8250==    at 0x1AFA38: Fgarbage_collect (alloc.c:5929)
==8250==
==8250== Use of uninitialised value of size 4
==8250==    at 0x1AF8E4: mark_maybe_object (alloc.c:4705)
==8250==    by 0x1AFD69: mark_memory (alloc.c:4895)
==8250==    by 0x1AFD69: mark_stack (alloc.c:5038)
==8250==    by 0x1AFD69: garbage_collect_1 (alloc.c:5760)
==8250==    by 0x1AFD69: Fgarbage_collect (alloc.c:5983)
==8250==    by 0x1BF401: eval_sub (eval.c:2169)
==8250==    by 0x1BF69D: Fprogn (eval.c:431)
==8250==    by 0x1BF8AF: funcall_lambda (eval.c:2922)
==8250==    by 0x1BFBBF: Ffuncall (eval.c:2760)
==8250==    by 0x1BFC55: funcall_nil (eval.c:2338)
==8250==    by 0x1BEC59: run_hook_with_args (eval.c:2515)
==8250==    by 0x1BEC9D: Frun_hook_with_args (eval.c:2380)
==8250==    by 0x1BFA79: Ffuncall (eval.c:2679)
==8250==    by 0x1E2CC3: exec_byte_code (bytecode.c:880)
==8250==    by 0x1BF785: funcall_lambda (eval.c:2863)
==8250==  Uninitialised value was created by a stack allocation
==8250==    at 0x1AFA38: Fgarbage_collect (alloc.c:5929)

The exact lines and functions vary, though; another log had:

==7058== Conditional jump or move depends on uninitialised value(s)
==7058==    at 0x1BF5D2: mem_find (alloc.c:4212)
==7058==    by 0x1C18A7: mark_maybe_pointer (alloc.c:4889)
==7058==    by 0x1BF40F: mark_memory (alloc.c:4985)
==7058==    by 0x214E4F: mark_one_thread (thread.c:616)
==7058==    by 0x2146E9: mark_threads_callback (thread.c:649)
==7058==    by 0x1BFE15: garbage_collect_1 (alloc.c:6001)
==7058==    by 0x1D2CAD: eval_sub (eval.c:2231)
==7058==    by 0x1D2E83: Fprogn (eval.c:455)
==7058==    by 0x1D591B: funcall_lambda (eval.c:3045)
==7058==    by 0x1D50B1: Ffuncall (eval.c:2784)
==7058==    by 0x1D52FB: funcall_nil (eval.c:2400)
==7058==    by 0x1D52D9: run_hook_with_args (eval.c:2576)
==7058==  Uninitialised value was created by a stack allocation
==7058==    at 0x1D2798: eval_sub (eval.c:2113)
==7058==
==7058== Use of uninitialised value of size 4
==7058==    at 0x1BF5C4: mem_find (alloc.c:4210)
==7058==    by 0x1C18A7: mark_maybe_pointer (alloc.c:4889)
==7058==    by 0x1BF40F: mark_memory (alloc.c:4985)
==7058==    by 0x214E4F: mark_one_thread (thread.c:616)
==7058==    by 0x2146E9: mark_threads_callback (thread.c:649)
==7058==    by 0x1BFE15: garbage_collect_1 (alloc.c:6001)
==7058==    by 0x1D2CAD: eval_sub (eval.c:2231)
==7058==    by 0x1D2E83: Fprogn (eval.c:455)
==7058==    by 0x1D591B: funcall_lambda (eval.c:3045)
==7058==    by 0x1D50B1: Ffuncall (eval.c:2784)
==7058==    by 0x1D52FB: funcall_nil (eval.c:2400)
==7058==    by 0x1D52D9: run_hook_with_args (eval.c:2576)
==7058==  Uninitialised value was created by a stack allocation
==7058==    at 0x1D2798: eval_sub (eval.c:2113)

Two example full valgrind logs are available at https://grimler.se/emacs_segfault2 and https://grimler.se/emacs_segfault3.
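
Since the exact frames vary from log to log, it can help to tally which function each report starts in. The following is a minimal sketch, not something used in this thread: `summarize_valgrind` is a hypothetical helper name, and it assumes the log was written to a file, for example by adding valgrind's `--log-file` option to the command above. It counts only the innermost frame of each "Conditional jump" or "Use of uninitialised value" report and skips the "created by a stack allocation" origin frames.

```shell
# Hypothetical helper: count the innermost frame of every uninitialised-value
# report in a valgrind log, most frequent first.
summarize_valgrind() {
    # $1: path to a valgrind log (e.g. one written via --log-file=FILE)
    awk '
        # A report header: remember that the next "at" frame is the one we want.
        /Conditional jump or move|Use of uninitialised value/ { want = 1; next }
        # First frame after a header: strip the "==PID== at 0xADDR: " prefix.
        want && / at 0x/ {
            sub(/^.* at 0x[0-9A-Fa-f]+: /, "")
            count[$0]++
            want = 0
        }
        END { for (f in count) printf "%d %s\n", count[f], f }
    ' "$1" | sort -rn
}

# Example (the log file name is an assumption):
# summarize_valgrind emacs-valgrind.log
```

If one or two functions dominate the tally, those are probably the first places to look; a flat distribution would suggest the reports are noise from the conservative stack scan.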

Running emacs under valgrind on my aarch64 device gives similar errors, but I haven't experienced a segfault there. I therefore assume that we have a problem in our emacs on all arches.

I'm not sure how to debug this further.

mindbound commented 6 years ago

Same here, arm, Android 6.

jeromezero commented 6 years ago

I have experienced the same issue. Nvidia Shield tablet. Android 7 now, consistent segfaults since 6. I don't think they happened at all when I was back on Lollipop ...

Factavi commented 6 years ago

I am getting a lot of emacs segfaults, especially with touch-scrolling. (Android 7)

Grimler91 commented 6 years ago

@Factavi Hm, I mostly use a hardware keyboard so haven't noticed if it happens often when touch-scrolling.

For me it seems to happen more or less randomly.

abbaxi commented 6 years ago

This happens frequently to me. I'm using the ARM architecture, bluetooth keyboard, most current version of emacs in termux's repository, as of 15:00 MST, OCT 6th.

Emacs is primarily what I use termux for, this is an ongoing, massive source of frustration. Let me know if I can be more helpful, -abbaxi

kanreki commented 5 years ago

This happens for me too: HP Chromebook 13 G1

Normally I live inside Emacs all the time (including shell window). But this is a deal-breaker. I've resorted to learning Vim, and you know how horrifying that is.

Been happening since I first installed Termux, approx. May, 2017. I can't say how often, because this has caused me to mostly abandon Emacs.

kanreki commented 5 years ago

Sorry, the HP Chromebook 13 G1 uses an Intel Core m7, so I guess I should not have piled on to this ARM bug report.

jeromezero commented 5 years ago

I've been running an NVIDIA Shield tablet, mostly as a portable emacs machine. This has been happening since Marshmallow. I'm on Nougat now, latest official NVIDIA stock ROM. FWIW, I've been able to run a stable emacs in termux through Arch Linux (https://github.com/sdrausty/termux-archlinux) running in a PRoot ... It's not elegant, but it's stable ....

sjuswede commented 5 years ago

Happens to me frequently as well, on a Samsung Tab A (2016). Really annoying, as I pretty much got the tablet for Termux and Emacs.

If there is any way I can help, please let me know!

oscarfv commented 5 years ago

I think that Emacs on termux was built with CANNOT_DUMP, which is mostly untested. The development version of Emacs (branch master on the git repo) has a new portable dumper. It would be interesting to try it and see if the segfault goes away.

ferfebles commented 5 years ago

Same problem here with a Samsung Tab A (2016). If I start emacs with "emacs -q" it seems to be stable. But if I load the init.el file, it gives a segmentation fault after a few seconds of use.

Trying to reduce the init.el file... But I'm losing so much of Emacs that I would prefer to use nano.

oscarfv commented 5 years ago

For me it segfaults with -Q too. With or without init file, in my experience the time it takes to segfault varies: from a few seconds to 20 minutes or more.

ferfebles commented 5 years ago

You're right @oscarfv, it segfaults with 'emacs -q' too. It took a bit longer, but it gave me the error after scrolling down and up the '.emacs' file about 20 times.

I tested emacs with the new Android 7 branch (in alpha testing) but it gave me the same error. I don't have the knowledge to try to build emacs without CANNOT_DUMP. If someone finds any solution please post it here.

oscarfv commented 5 years ago

Tried emacs 27 with the new portable dumper. Same crash.

oscarfv commented 5 years ago

A fast method for triggering the crash here is to visit some file with a few hundred lines and keep the cursor-down key pressed. Usually it will crash on reaching the bottom edge of the window after scrolling a few screenfuls of text.

Grimler91 commented 4 years ago

So, I've tried the following since last year (without solving the problem):

Since gdb cannot tell where the crash comes from, even if emacs and all dependencies are built with debug symbols, I am guessing the problem arises in a /system/lib library. This does not mean that the bug is not solvable, just that it is harder to pinpoint the actual problem and solve it.

I also do not think I have encountered the segfault when using emacs --daemon + emacsclient, but I am not using the daemon a lot

snogglethorpe commented 4 years ago

What environment do you use for Termux development?

I'd kind of like to do more debugging, but as I mentioned earlier, I'm a little afraid my Chromebook may melt.... I'm curious if other people are using standard devices for compiling large packages.

[I have a Samsung Chromebook Pro, with 4GB RAM and maybe 5GB of free flash storage.]

oscarfv commented 4 years ago

@snogglethorpe : If Emacs qualifies as a large package for you, in April I compiled Emacs 27 on a cheap, generic quad-core tablet without problems.

Grimler91 commented 4 years ago

@snogglethorpe your device will (most likely) shut down if the CPU gets too hot, so it's not really anything you need to be afraid of.

snogglethorpe commented 4 years ago

Ok, I've built Emacs master on my CB, and it seems to work properly (with a little futzing around to update the Termux patches), dumping and all, and I can run "emacs -fg-daemon" in gdb...

Unfortunately, it's so far stubbornly refused to crash...! ^^;

[Which doesn't mean much, sometimes the distro emacs runs for many many hours before crashing...]

snogglethorpe commented 4 years ago

Ok, now it's crashing as expected, in the same maddeningly hard to debug way the emacs package does.

BTW, one thing is much better about using this build than the current Termux emacs package: it's dumped, so restarting emacs after a crash is much faster, essentially instantaneous, whereas the non-dumped emacs in the Termux package takes 3-4 seconds to start.

Would it be possible to use dumping for the Termux emacs package?

ghost commented 4 years ago

> Would it be possible to use dumping for the Termux emacs package?

No. See https://github.com/termux/termux-packages/blob/master/packages/emacs/build.sh#L62.

oscarfv commented 4 years ago

@snogglethorpe : dumping Emacs 27 works thanks to the new portable dumper. Termux packages the latest stable Emacs release, i.e. 26.

Grimler91 commented 4 years ago

@oscarfv omg, finally a dumped emacs in termux, thanks for the tip!

And I haven't been able to make it crash in my initial testing (scrolling ~10k lines). I'll leave this issue open until emacs-27 is packaged, and will close it if I haven't been able to reproduce a crash by then!

snogglethorpe commented 4 years ago

I've been using dumping in my local build (built in Termux) of the emacs trunk (so version 27.x), and it works fine—and starts up much faster than an un-dumped emacs.

However, it does still exhibit the random hard-to-debug crashiness of the current Termux emacs package (which doesn't use dumping). As with that version of emacs, it's pretty random: sometimes it goes for ages without crashing, sometimes it crashes almost immediately after startup.

oscarfv commented 4 years ago

@Grimler91 : even if it works for you the bug should remain open because, first, Emacs 27 is not the current release and, second, the crash also happens with Emacs 27.

When I tried Emacs 27 my experience matched what @snogglethorpe describes.

kanreki commented 4 years ago

I tried it on my old tablet, but could not reproduce the result you describe.

On Sun, Feb 9, 2020, 7:01 AM zettelmuseum notifications@github.com wrote:

I can reproduce "Fatal error 11:segmentation fault" reliably on my (old) tablet. This is vanilla emacs installed in termux via apt, empty .emacs, no packages. Steps:

1. Open the latest org-manual.org (https://raw.githubusercontent.com/bzg/org-mode/master/doc/org-manual.org).

2. Immediately after the file has loaded, and as quickly as possible, press shift-tab shift-tab (the status line should display "CONTENTS...done"), then press DOWN (I'm using Hacker's Keyboard) and KEEP IT PRESSED.

...will crash 9 out of 10 times. Can anyone reproduce this also?

zettelmuseum commented 4 years ago

Here is a way to reliably reproduce this crash. Vanilla emacs from termux apt, no configs, no packages, no .emacs.

Step 1 (create our test file, may take a few seconds)

for i in {1..200}; do echo "* $i"; echo "** $i"; for j in {1..1000}; do echo "bla"; done;  done > test123.org
emacs --file test123.org

Now, it is important to do Step 2 very quickly, especially holding the down key right after the return key. If it doesn't crash the first time, just repeat Step 2 a few more times. It always crashes for me after a few tries. EDIT: it may take up to 10 tries. Also, zoom your termux display to 20 rows.

Step 2

C-x C-v RET
*immediately* after RET press the down key 
and HOLD IT until crash or end of file. 
(I'm using Hacker's keyboard)

Can anyone also reproduce it this way?
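
The Step 1 loop can be wrapped in a small function so the file size is easy to vary when experimenting. This is a sketch only: `make_test_org` and its parameters are hypothetical names, and the loop is rewritten in POSIX sh rather than using the bash brace expansion from the post. With the values from the post it should produce the same 200 * (2 + 1000) = 200400 lines.

```shell
# Hypothetical wrapper around the Step 1 loop.
# $1: output file, $2: number of top-level headings, $3: body lines per heading
make_test_org() {
    out=$1; headings=$2; body=$3
    i=1
    while [ "$i" -le "$headings" ]; do
        echo "* $i"      # top-level org heading
        echo "** $i"     # second-level org heading
        j=1
        while [ "$j" -le "$body" ]; do
            echo "bla"   # filler body line
            j=$((j + 1))
        done
        i=$((i + 1))
    done > "$out"
}

# Same parameters as in the post.
make_test_org test123.org 200 1000
wc -l < test123.org   # 200400
```

Smaller values for the third argument might show whether the crash depends on the amount of folded text, but that is untested here.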

zettelmuseum commented 4 years ago

@kanreki can you try again with new method?

hindux commented 4 years ago

@zettelmuseum, no, it does not crash

zettelmuseum commented 4 years ago

@krishna-arch you may need to repeat Step 2 up to 10 times, and do it quickly. It never took more than 10 tries here.

Grimler91 commented 4 years ago

> @zettelmuseum, no, it does not crash

The crash seems to happen only on arm, and mostly (only?) on Samsung devices. Maybe @krishna-arch has another type of device.

@zettelmuseum I can reproduce the crash with your testfile. Can't tell if it crashes faster compared to just scrolling in a "normal" file, but it crashes nonetheless.

zettelmuseum commented 4 years ago

Also, it seems to crash faster if you zoom the termux display (max. 20 rows).

zettelmuseum commented 4 years ago

@Grimler91 by the way, not a Samsung device here, just a generic arm phablet

Grimler91 commented 4 years ago

> by the way, not a Samsung device here, just a generic arm phablet

Good to know, thanks for the info!

On another note: I've noticed that compiling emacs with --enable-checking='yes,glyphs' (as suggested in the DEBUG notes) gives a make error:

[...]
Loading /data/data/com.termux/files/home/projects/emacs/lisp/emacs-lisp/syntax.el (source)...
Loading /data/data/com.termux/files/home/projects/emacs/lisp/font-lock.el (source)...
Loading /data/data/com.termux/files/home/projects/emacs/lisp/jit-lock.el (source)...

../../src/fns.c:2856: Emacs fatal error: assertion failed: !FIXNUM_OVERFLOW_P (lisp_h_make_fixnum_n)
Fatal error 6: n
make[1]: *** [Makefile:817: bootstrap-emacs.pdmp] Aborted
make[1]: Leaving directory '/data/data/com.termux/files/home/projects/emacs/build/src'
make: *** [Makefile:424: src] Error 2

Might be related to this bug, or maybe it's just a problem on the emacs-27 branch. I'll ask for advice on the emacs mailing list.

zettelmuseum commented 4 years ago

Here's another interesting observation. Using the method described above, I ran Step 2 30 times each with and without tmux. Without tmux: 8/30 runs crashed (27%). With tmux: 0/30 runs crashed (0%).

Too early to tell what this means, but I'm going to run emacs inside tmux from now on :-)

EDIT: this is using the exact same termux package, not proot or anything.

apt install tmux
tmux
emacs
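
The percentages quoted above follow directly from the counts. As a small sketch (the helper name `crash_rate` is made up for illustration), the arithmetic is:

```shell
# Hypothetical helper: print a crash rate from a crash count and a run count.
# $1: number of runs that crashed, $2: total number of runs
crash_rate() {
    awk -v c="$1" -v n="$2" \
        'BEGIN { printf "%d/%d runs crashed, %.0f%%\n", c, n, 100 * c / n }'
}

crash_rate 8 30   # without tmux: prints "8/30 runs crashed, 27%"
crash_rate 0 30   # with tmux:    prints "0/30 runs crashed, 0%"
```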

Grimler91 commented 4 years ago

I have merged a potential fix for this: https://github.com/termux/termux-packages/commit/996c56939720709525a5a909cb0aec85ec0de8fd, it should be available in a few minutes. If anyone still gets segfaults after upgrading emacs to 26.3-5, please let me know.

snogglethorpe commented 4 years ago

That change only addresses arm, but the same crash happens on x86 ....

Grimler91 commented 4 years ago

@snogglethorpe thanks, fixed in https://github.com/termux/termux-packages/commit/c6fe679782306143be93fbcf34682b0a83189b1d. 26.3-6 should be available in a couple of minutes

kanreki commented 4 years ago

So far, so good! Thanks!! I will certainly continue using this, and will report back if anything is amiss.