signalapp / Signal-Desktop

A private messenger for Windows, macOS, and Linux.
https://signal.org/download
GNU Affero General Public License v3.0

crash without stdout of starting terminal available #5057

Closed knarrff closed 1 day ago

knarrff commented 3 years ago

Bug Description

I usually start signal from the command line, like this:

signal-desktop --start-in-tray --no-sandbox &

The '&' puts it into the background, and after that I close that terminal window (by logging out, not by killing the window). This worked in the past: signal kept running.

What now happens is that signal-desktop works fine as long as the terminal it was started from is still open (with signal already in the background), but about 1-2 seconds after I log out from that terminal, signal also stops. As far as I can see, the logs contain nothing that looks like an error, but they also lack the 'regular quit' entries. The log simply stops, e.g.:

{"name":"log","hostname":"topf","pid":27193,"level":30,"time":"2021-02-23T08:24:52.858Z","msg":"SQL channel job 50 (createOrUpdateItem) succeeded in 17ms","v":0}
{"name":"log","hostname":"topf","pid":27193,"level":30,"time":"2021-02-23T08:24:52.890Z","msg":"SQL channel job 51 (updateConversations) succeeded in 16ms","v":0}
{"name":"log","hostname":"topf","pid":27193,"level":30,"time":"2021-02-23T08:24:53.353Z","msg":"Not updating notifications; notification status is noNotifications. ","v":0}
{"name":"log","hostname":"topf","pid":27193,"level":30,"time":"2021-02-23T08:24:59.614Z","msg":"Removing notification","v":0}
{"name":"log","hostname":"topf","pid":27193,"level":30,"time":"2021-02-23T08:25:00.614Z","msg":"Not updating notifications; notification status is noNotifications. ","v":0}

What I don't see are any 'Quitting' log entries that would indicate a regular exit.

Also, as long as signal is only 'in-tray', the application seems to survive for longer. However, if the main window is up at the time the terminal goes away, signal will crash within 1-2 seconds (not immediately).

Steps to Reproduce

  1. start signal desktop from the command line and put it into background
  2. close that terminal (not by window-kill, but regular terminal-logout)
  3. wait a few seconds for signal-desktop to crash.

Actual Result:

signal-desktop crashes.

Expected Result:

signal-desktop should just continue running

Screenshots

Does not apply.

Platform Info

Signal Version:

1.40.1

It also didn't work with 1.40.0, and this is a recent regression. However, I cannot tell which was the last version that worked for me.

Operating System: Debian Buster (up2date)

Linked Device Version:

Link to Debug Log

knarrff commented 3 years ago

Thinking about how to debug this I did this: start signal as indicated above, attach (in another terminal) gdb to it (pid 762), close the terminal signal was started from, watch what happens:

[Thread 0x7f7a00e9f700 (LWP 856) exited]
[Thread 0x7f7a11c58700 (LWP 830) exited]
[Thread 0x7f7a11457700 (LWP 831) exited]
[Thread 0x7f7a12459700 (LWP 829) exited]
[Thread 0x7f7a12c5a700 (LWP 828) exited]
[Thread 0x7f79c492b700 (LWP 939) exited]
[Thread 0x7f79c512c700 (LWP 938) exited]
[Thread 0x7f79c592d700 (LWP 937) exited]
[Thread 0x7f79cce5b700 (LWP 936) exited]
[Thread 0x7f79db8b7700 (LWP 934) exited]
[Thread 0x7f79dc0b8700 (LWP 933) exited]
[Thread 0x7f79dc8b9700 (LWP 932) exited]
[Thread 0x7f79dd0ba700 (LWP 931) exited]
[Thread 0x7f79e45e8700 (LWP 930) exited]
[Thread 0x7f79f2843700 (LWP 929) exited]
[Thread 0x7f7a0866d700 (LWP 854) exited]
[Thread 0x7f7a08e6e700 (LWP 853) exited]
[Thread 0x7f7a10455700 (LWP 844) exited]
[Thread 0x7f7a10c56700 (LWP 832) exited]
[Thread 0x7f7a1345b700 (LWP 825) exited]
[Thread 0x7f7a13c5c700 (LWP 824) exited]
[Thread 0x7f7a1445d700 (LWP 823) exited]
[Thread 0x7f7a14cfe700 (LWP 822) exited]
[Thread 0x7f7a154ff700 (LWP 821) exited]
[Thread 0x7f7a15d00700 (LWP 820) exited]
[Thread 0x7f7a16501700 (LWP 819) exited]
[Thread 0x7f7a16d02700 (LWP 818) exited]
[Thread 0x7f7a17503700 (LWP 817) exited]
[Thread 0x7f7a1942d700 (LWP 816) exited]
[Thread 0x7f7a18234700 (LWP 815) exited]
[Thread 0x7f7a18a35700 (LWP 813) exited]
[Thread 0x7f7a19236700 (LWP 812) exited]
[Thread 0x7f7a2085b700 (LWP 809) exited]
[Thread 0x7f7a2105c700 (LWP 808) exited]
[Thread 0x7f7a2185d700 (LWP 807) exited]
[Thread 0x7f7a2205e700 (LWP 806) exited]
[Thread 0x7f7a2285f700 (LWP 805) exited]
[Thread 0x7f7a232b4700 (LWP 804) exited]
[Thread 0x7f7a23ab5700 (LWP 803) exited]
[Thread 0x7f7a242b6700 (LWP 802) exited]
[Thread 0x7f7a24ab7700 (LWP 801) exited]
[Thread 0x7f7a252b8700 (LWP 800) exited]
[Thread 0x7f7a25ab9700 (LWP 797) exited]
[Thread 0x7f7a267aad80 (LWP 762) exited]
[Inferior 1 (process 762) exited with code 07]

Maybe the exit code 07 helps indicate what the problem is. Because the process has already exited, though, there doesn't seem to be a stack left to inspect. Also, all but the first 'Thread ... exited' lines appear immediately after/while signal quits.

Doing the same with strace shows:

read(12, "!", 2)                        = 1
futex(0x558fdedd1304, FUTEX_WAKE_PRIVATE, 1) = 1
mprotect(0x128a00204000, 241664, PROT_READ|PROT_WRITE) = 0
mprotect(0x128a00204000, 241664, PROT_READ|PROT_EXEC) = 0
madvise(0x1938fe2e1000, 40960, MADV_DONTNEED) = 0
write(29, "\1\0\0\0\0\0\0\0", 8)        = 8
write(42, "Unhandled Error: Error [ERR_STRE"..., 767) = -1 EIO (Input/output error)
write(29, "\1\0\0\0\0\0\0\0", 8)        = 8
madvise(0x1938fe2e1000, 139264, MADV_DONTNEED) = 0
ioctl(2, TCGETS, 0x7ffe83dcced0)        = -1 EIO (Input/output error)
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x5), ...}) = 0
write(2, "events.js:292\n      throw er; //"..., 1401) = -1 EIO (Input/output error)
futex(0x7f3f3179bf64, FUTEX_WAKE_PRIVATE, 2147483647) = 0
madvise(0x1938ff2e0000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2de000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2dc000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2da000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2d7000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2d4000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2d2000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2cf000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2cd000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2ca000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2c8000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2c6000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2c4000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2c1000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2be000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2bc000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2b9000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff2b7000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff26e000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff26c000, 4096, MADV_DONTNEED) = 0
madvise(0x1938ff269000, 4096, MADV_DONTNEED) = 0
madvise(0x1938fcc62000, 12288, MADV_DONTNEED) = -1 EINVAL (Invalid argument)
madvise(0x1938fc6a7000, 24576, MADV_DONTNEED) = 0
madvise(0x1938ff303000, 28672, MADV_DONTNEED) = 0
madvise(0x1938fcd25000, 32768, MADV_DONTNEED) = 0
madvise(0x1938fc5bd000, 36864, MADV_DONTNEED) = 0
madvise(0x1938fc5b1000, 20480, MADV_DONTNEED) = 0
close(25)                               = 0
close(26)                               = 0
futex(0x558fdedd1304, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x558fdedd12b0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f3f2a9579d0, FUTEX_WAIT, 3604, NULL) = 0
munmap(0x7f3f1839c000, 8392704)         = 0
exit_group(7)                           = ?

hiqua commented 3 years ago

Just use nohup: https://en.wikipedia.org/wiki/Nohup

knarrff commented 3 years ago

The shell (bash) isn't sending SIGHUP, at least:

$ shopt | grep huponexit
huponexit       off

Also, this didn't show any signal (3118 is the main signal process):

$ strace -e signal -p3118
strace: Process 3118 attached
+++ exited with 7 +++

This is what I would expect: a job in the background (like signal in this case) should not receive SIGHUP in case the parent shell exits cleanly, with huponexit off.

nohup does seem to be a workaround, but I don't consider it a solution. nohup also redirects stdout away from the terminal, which may be what actually helps.

And indeed: when I start signal-desktop using

signal-desktop --start-in-tray --no-sandbox >/dev/null &

(redirecting stdout to /dev/null) and then exit the shell cleanly, signal-desktop keeps running. This indicates that the problem isn't a signal being sent, but signal-desktop trying to write to stdout when it's not available and possibly not handling the resulting exception correctly.
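
To illustrate the suspected failure mode, here is a minimal Node.js sketch of a guard against it. It assumes the crash comes from an unhandled 'error' event on the stdout/stderr streams, which is what the failed writes with EIO in the strace output suggest. This is not Signal Desktop's actual code, and `tolerateClosedTerminal` is a hypothetical helper name.

```javascript
// Hypothetical helper, not part of Signal Desktop: ignore the errors a
// stream emits once its terminal is gone, and rethrow everything else.
const IGNORABLE_CODES = new Set(['EIO', 'EPIPE']);

function tolerateClosedTerminal(stream) {
  stream.on('error', (err) => {
    if (err && IGNORABLE_CODES.has(err.code)) {
      return; // the terminal went away; there is nobody left to read this
    }
    throw err; // any other stream error is still treated as fatal
  });
  return stream;
}

// Install early in the main process, before any logger writes to stdout.
tolerateClosedTerminal(process.stdout);
tolerateClosedTerminal(process.stderr);
```

With a listener attached, a failed write is reported to the listener instead of being thrown as an unhandled 'error' event, so the process can keep running without its terminal.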

hiqua commented 3 years ago

Ok makes sense. In theory you could git bisect to find the offending commit.

I was wondering if it could be related to Electron 11, but it seems that element-desktop works fine if I close its associated terminal.

osctobe commented 3 years ago

This is what the app shows when the stdout TTY is closed (e.g. when starting `signal-desktop &` from xterm and then closing the terminal).

Unhandled Error

Error: write EIO
    at afterWriteDispatched (internal/stream_base_commons.js:156:25)
    at writeGeneric (internal/stream_base_commons.js:147:3)
    at WriteStream.Socket._writeGeneric (net.js:785:11)
    at WriteStream.Socket._write (net.js:797:8)
    at writeOrBuffer (internal/streams/writable.js:358:12)
    at WriteStream.Writable.write (internal/streams/writable.js:303:10)
    at Object.write ([REDACTED]/node_modules/pino-multi-stream/multistream.js:68:18)
    at Pino.write ([REDACTED]/node_modules/pino/lib/proto.js:177:15)
    at Pino.LOG [as info] ([REDACTED]/node_modules/pino/lib/tools.js:57:21)
    at console.logAtLevel ([REDACTED]/ts/logging/main_process_logging.js:248:34)

busywhistling commented 3 years ago

At the moment, using nohup along with the ampersand & seems to work well. For example, below is a command (in fish shell) I use which doesn't produce errors on quitting:

nohup signal-desktop --start-in-tray --no-sandbox --enable-features=UseOzonePlatform --ozone-platform=wayland > /tmp/signal-desktop.log &; disown

I suppose you could just use the & at the end without the disown for bash shell (the ozone platform options are to run the app in wayland).

zatricky commented 2 years ago

I'm seeing a different/opposite (but probably related) issue. I too start signal from the command line, though I would normally leave that terminal open "forever". It seems signal has recently started detaching itself from the parent process shortly after startup.

This implies also that the ampersand & @knarrff was using would no longer be necessary - but this is probably not the behaviour we actually want.

As a result of this change in behaviour, the terminal can no longer be used for debugging, except for a crash during startup.

Is there a known flag/parameter for this? I see #3518 (command-line documentation) is still a long way from completion.

haarp commented 2 years ago

I'm also seeing crashes/freezing/random breakage when stdout is closed. Adding & disown to the command, then closing the terminal is sufficient to trigger this after a while.

lovesegfault commented 2 years ago

I recently encountered this too. The workaround is to redirect stdout and stderr to /dev/null before disown:

$ signal-desktop &> /dev/null & disown
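
For anyone launching the app from a script rather than a shell, a rough Node.js equivalent of the one-liner above might look like the sketch below. `launchDetached` is a hypothetical helper name; the `detached` and `stdio: 'ignore'` options and `unref()` are standard `child_process` features.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical helper: start a program detached from this process, with
// stdin/stdout/stderr attached to /dev/null instead of the terminal.
function launchDetached(command, args = []) {
  const child = spawn(command, args, {
    detached: true,  // on POSIX, the child leads its own session/process group
    stdio: 'ignore', // the child inherits no terminal file descriptors
  });
  child.unref();     // let the launcher exit without waiting for the child
  return child;
}

// Example (not run here):
// launchDetached('signal-desktop', ['--start-in-tray']);
```

Because the child never holds a file descriptor for the terminal, closing the terminal cannot make its writes fail with EIO.
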

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

haarp commented 2 years ago

Not stale, still happening in 5.50.0.

Stale bots are bad and discourage contribution.

HaraldKi commented 1 year ago

This still happens with an Ubuntu snap that shows version "6.10.1 production" with a release notes window showing March 16, 2023.

bouil commented 9 months ago

Happening now on Deb package from official repository on Ubuntu, version: 6.47.0. Same crash with flatpak version 6.47.0 available from Flathub.

edwloef commented 2 months ago

This is still happening as of version 7.23.0: after running signal-desktop --start-in-tray & disown and waiting a bit, it crashes and then gets stuck in a loop of two errors (see the two attached screenshots). Starting Signal via a systemd unit seems to work fine though.

"Copy error and quit" unfortunately doesn't actually copy the error so I can only provide screenshots.

npt commented 1 month ago

This is happening to me, AFAICT, in 7.29.0. The same error message eventually appears after disowning and closing the terminal, sometimes appears when starting from the application menu instead of the terminal, and doesn't appear when stdout/stderr are redirected to /dev/null. (I'm using the .deb on Kubuntu 24.04.)

indutny-signal commented 1 day ago

Thank y'all for your patience. This should be fixed in the next beta release and a prod release around one week later.