signalapp / Signal-Desktop

A private messenger for Windows, macOS, and Linux.
https://signal.org/download
GNU Affero General Public License v3.0

signal-desktop hangs after ca. 1 day of running and fails to send or receive messages #6577

Open · iridos opened this issue 1 year ago

iridos commented 1 year ago

Bug Description

signal-desktop hangs after a day or so, usually the day after I leave it running overnight.

Steps to Reproduce

System: Debian/openbox/xfce4-panel

  1. run signal-desktop
  2. wait 1-2 days
  3. try to send or receive messages
  4. when a message is sent from signal-desktop, the linked device doesn't receive it either

Actual Result:

After that, no more messages are received, and sent messages get stuck at the first status indicator, an incomplete circle with a dashed border (see screenshot).

Expected Result:

sends/receives messages

Screenshots

Screenshot: signal

Platform Info

Signal-desktop Version:

6.26.0 production

ii signal-desktop 6.26.0 amd64

Operating System: Debian Linux 12.0 and 11.0 (I have seen the same behavior on two different systems with different Debian versions)

Linked Device Version:

6.28.6

Link to Debug Log

https://debuglogs.org/desktop/6.26.0/d4e9a9f3bb7f663cf4d024deb731126fe367bb4cbffe5a3999efef79bc35bba3.gz

Device debug log:

https://debuglogs.org/android/6.28.6/5bae01c3493ddf84e176887d1e285c693ca6271877289f0535338cf38dbb4af4

iridos commented 1 year ago

I restarted it yesterday. Messages written to the terminal I started it from since then (excerpt):

{"level":30,"time":"2023-08-16T13:34:24.032Z","msg":"System tray service: setting unread count to 0"}
{"level":30,"time":"2023-08-16T13:34:24.032Z","msg":"System tray service: rendering no tray"}
{"level":30,"time":"2023-08-16T13:36:41.182Z","msg":"System tray service: setting unread count to 1"}
{"level":30,"time":"2023-08-16T13:36:41.182Z","msg":"System tray service: rendering no tray"}
{"level":30,"time":"2023-08-16T13:36:45.935Z","msg":"System tray service: setting unread count to 0"}
{"level":30,"time":"2023-08-16T13:36:45.935Z","msg":"System tray service: rendering no tray"}
{"level":30,"time":"2023-08-16T15:20:03.545Z","msg":"Updating BrowserWindow config: %s {\"maximized\":false,\"autoHideMenuBar\":false,\"fullscreen\":false,\"width\":1014,\"height\":696,\"x\":2579,\"y\":47}"}
{"level":30,"time":"2023-08-16T15:20:03.545Z","msg":"config/set: Saving ephemeral config to disk"}
{"level":30,"time":"2023-08-16T15:20:03.546Z","msg":"config/set: Saved ephemeral config to disk"}
{"level":50,"time":"2023-08-17T07:53:28.485Z","msg":"Error occurred in handler for 'net.resolveHost': {}"}
{"level":50,"time":"2023-08-17T07:53:28.485Z","msg":"Error occurred in handler for 'net.resolveHost': {}"}
{"level":50,"time":"2023-08-17T07:53:28.485Z","msg":"Error occurred in handler for 'net.resolveHost': {}"}
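The terminal output above is one JSON object per line, and the `level` values appear to follow pino's numeric levels (30 info, 40 warn, 50 error). A minimal sketch for pulling only the error-level entries out of such output, so that events like the `net.resolveHost` failures stand out from the tray and config chatter; the function name `error_entries` is hypothetical.

```python
import json
from typing import Iterable, Iterator

# Assumed pino-style numeric levels: 30 = info, 40 = warn, 50 = error, 60 = fatal
ERROR_LEVEL = 50


def error_entries(lines: Iterable[str], min_level: int = ERROR_LEVEL) -> Iterator[dict]:
    """Yield parsed log entries whose level is at least min_level.

    Lines that are not JSON objects (stray terminal output) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("level", 0) >= min_level:
            yield entry
```

For example, `for e in error_entries(open("signal.log")): print(e["time"], e["msg"])` would print only the three `net.resolveHost` lines from the excerpt above.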
iridos commented 1 year ago

Any ideas, even for a workaround? Right now I have to restart it daily and have no idea what is happening.

knarrff commented 1 year ago

I now see the same. Also on a Debian (bookworm) system, with signal version 6.29.1 (which I believe is the latest Debian package available).

trevor-signal commented 1 year ago

@knarrff can you provide a debug log?

knarrff commented 1 year ago

Next time it happens. I'll likely have to wait for a day or so.

knarrff commented 1 year ago

https://debuglogs.org/desktop/6.29.1/7818872c4c13ea491b91428ec7c26f327679df0b49bba7f574624fb8c8120c12.gz

You probably know better what to look for. Just a few things that might help:

WARN  2023-09-07T05:28:51.619Z WebSocketResource(authenticated): Socket closed
INFO  2023-09-07T05:29:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z 
INFO  2023-09-07T05:30:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z
INFO  2023-09-07T05:31:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z
INFO  2023-09-07T05:32:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z
INFO  2023-09-07T05:33:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z
INFO  2023-09-07T05:33:23.948Z RetryPlaceholders.getExpiredAndRemove: Found 0 expired items
INFO  2023-09-07T05:33:23.956Z retryPlaceholders/interval: Found 0 expired items
INFO  2023-09-07T05:34:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z
INFO  2023-09-07T05:35:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z
INFO  2023-09-07T05:36:23.941Z routineProfileRefresh/2: starting
INFO  2023-09-07T05:36:23.942Z routineProfileRefresh/2: updating last refresh time
INFO  2023-09-07T05:36:23.942Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z 
INFO  2023-09-07T05:36:23.960Z routineProfileRefresh/2: starting to refresh conversations
INFO  2023-09-07T05:36:23.962Z routineProfileRefresh/2: refreshing profile for [REDACTED]36c ([REDACTED]022)
INFO  2023-09-07T05:36:23.962Z getProfile: getting unversioned profile for conversation [REDACTED]36c ([REDACTED]022)
INFO  2023-09-07T05:36:23.962Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]36c
INFO  2023-09-07T05:36:23.962Z Cycling agent for type undefined-auth
INFO  2023-09-07T05:36:23.962Z routineProfileRefresh/2: refreshing profile for [REDACTED]651 ([REDACTED]64a)
INFO  2023-09-07T05:36:23.963Z getProfile: getting unversioned profile for conversation [REDACTED]651 ([REDACTED]64a)
INFO  2023-09-07T05:36:23.963Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]651 
INFO  2023-09-07T05:36:23.963Z routineProfileRefresh/2: refreshing profile for [REDACTED]111 ([REDACTED]f2a)
INFO  2023-09-07T05:36:23.963Z routineProfileRefresh/2: refreshing profile for [REDACTED]71d ([REDACTED]705)
INFO  2023-09-07T05:36:23.963Z getProfile: getting unversioned profile for conversation [REDACTED]71d ([REDACTED]705)
INFO  2023-09-07T05:36:23.963Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]71d
INFO  2023-09-07T05:36:23.963Z routineProfileRefresh/2: refreshing profile for [REDACTED]0e2 ([REDACTED]59e)
ERROR 2023-09-07T05:36:23.963Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]36c 0 Error
ERROR 2023-09-07T05:36:23.963Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]651 0 Error
INFO  2023-09-07T05:36:23.963Z routineProfileRefresh/2: refreshed profile for [REDACTED]111 ([REDACTED]f2a)
ERROR 2023-09-07T05:36:23.963Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]71d 0 Error
INFO  2023-09-07T05:36:23.963Z routineProfileRefresh/2: refreshing profile for [REDACTED]eb2 ([REDACTED]d15)
INFO  2023-09-07T05:36:25.941Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]36c
ERROR 2023-09-07T05:36:25.942Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]36c 0 Error
INFO  2023-09-07T05:36:25.942Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]651
ERROR 2023-09-07T05:36:25.943Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]651 0 Error
INFO  2023-09-07T05:36:25.943Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]71d
ERROR 2023-09-07T05:36:25.943Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]71d 0 Error
INFO  2023-09-07T05:36:27.941Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]36c
ERROR 2023-09-07T05:36:27.942Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]36c 0 Error
WARN  2023-09-07T05:36:27.942Z getProfile failure: [REDACTED]36c ([REDACTED]022) code: -1

After that, network connections seem to recover, but notifications stay off.
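Judging only from the timestamps in this excerpt, each failed profile GET is retried about every 2 seconds, and after the third immediate failure `getProfile` gives up with code -1. Since the authenticated socket had already closed, every attempt fails within a millisecond. A minimal sketch of that fixed-delay retry pattern, assuming three attempts and a 2-second delay (both inferred from the log, not taken from Signal's code); `fetch_with_retry` and `RetriesExhausted` are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetriesExhausted(Exception):
    """Raised after the last attempt fails (the log reports this as code -1)."""


def fetch_with_retry(
    request: Callable[[], T],
    attempts: int = 3,      # assumption: three GETs per profile in the log
    delay_s: float = 2.0,   # assumption: ~2 s between the retries in the log
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request() up to `attempts` times, sleeping delay_s between failures."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return request()
        except OSError as err:  # network-level failure, like the "0 Error" lines
            last_error = err
            if attempt < attempts - 1:
                sleep(delay_s)
    raise RetriesExhausted(f"gave up after {attempts} attempts") from last_error
```

The point the sketch illustrates: a fixed-delay retry cannot help while the underlying socket is closed; only a reconnect can.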

scottnonnenberg-signal commented 1 year ago

@knarrff Your log also has these net.resolveHost errors. There's more specific data in some of those log lines - can you connect those ERR_NETWORK_CHANGED events to anything that happened on your computer?

ERROR 2023-09-04T04:15:36.059Z Error occurred in handler for 'net.resolveHost': {}
ERROR 2023-09-04T04:15:36.059Z Error occurred in handler for 'net.resolveHost': {}
ERROR 2023-09-06T13:57:03.540Z Error occurred in handler for 'net.resolveHost': {}
WARN  2023-09-06T13:57:03.559Z SocketManager: authenticated socket connection failed with error: HTTPError: connectResource: connectFailed; code: -1
  Caused by: Error: Error invoking remote method 'net.resolveHost': Error: net::ERR_NETWORK_CHANGED
ERROR 2023-09-06T13:57:03.619Z Top-level unhandled promise rejection: HTTPError: connectResource: connectFailed; code: -1
  Caused by: Error: Error invoking remote method 'net.resolveHost': Error: net::ERR_NETWORK_CHANGED
ERROR 2023-09-06T13:57:03.619Z Top-level unhandled promise rejection: HTTPError: connectResource: connectFailed; code: -1
  Caused by: Error: Error invoking remote method 'net.resolveHost': Error: net::ERR_NETWORK_CHANGED
ERROR 2023-09-08T08:01:56.958Z Error occurred in handler for 'net.resolveHost': {}
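To answer the question above, the timestamps of the `ERR_NETWORK_CHANGED` events can be pulled out of the debug log and compared with the system journal (for example with `journalctl -u NetworkManager --utc`). A minimal sketch, assuming the text log format shown above, where the `Caused by:` continuation line that carries the error has no timestamp of its own and inherits the one from the preceding entry; the function name is hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Debug-log entries start with a level and an ISO-8601 UTC timestamp, e.g.
# "ERROR 2023-09-06T13:57:03.540Z Error occurred in handler ..."
# Continuation lines ("  Caused by: ...") carry no timestamp of their own.
HEADER = re.compile(r"^(?:ERROR|WARN|INFO|DEBUG|FATAL|TRACE)\s+(\S+Z)\s")


def network_changed_times(lines: Iterable[str], marker: str = "ERR_NETWORK_CHANGED") -> List[str]:
    """Return the distinct timestamps of entries whose text mentions `marker`."""
    current: Optional[str] = None
    seen: List[str] = []
    for line in lines:
        m = HEADER.match(line)
        if m:
            current = m.group(1)  # remember the timestamp of the enclosing entry
        if marker in line and current is not None and current not in seen:
            seen.append(current)
    return seen
```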
iridos commented 1 year ago

Hmm, I do have the Error occurred in handler for 'net.resolveHost': {} messages, but nothing around them that seems connected.

I had tried killing the network thread of signal-desktop alone, which caused it to be restarted, but that did not recover full functionality.

scottnonnenberg-signal commented 1 year ago

@iridos @knarrff What can you tell us about your network setup?

iridos commented 1 year ago

This machine has a wired connection to a university network, set up via NetworkManager. I don't use suspend/resume on it.

$ nmcli
enp0s25: connected to Wired connection 1
        "Intel 82579LM"
        ethernet (e1000e), 00:19:99:EB:37:0A, hw, mtu 1500
        ip4 default, ip6 default
        inet4 134.60.2.xxx/24
        route4 134.60.2.0/24 metric 100
        route4 default via 134.60.2.1 metric 100
        inet6 2001:7c0:3101:a04:ea77:cac2:c4c8:xxxx/64
        inet6 2001:7c0:3101:a04:6003:b63d:fc2d:xxxx/64
        inet6 2001:7c0:3101:a04:e94f:fc35:ff09:xxxx/64
        inet6 2001:7c0:3101:a04:dcf6:19af:aa1b:xxxx/64
        inet6 2001:7c0:3101:a04:b8a6:c4a3:e5c1:xxxx/64
        inet6 2001:7c0:3101:a04:b809:8fbe:164:xxxx/64
        inet6 2001:7c0:3101:a04:e1aa:8da8:c313:xxxx/64
        inet6 2001:7c0:3101:a04:219:99ff:feeb:xxxx/64
        inet6 fe80::219:99ff:feeb:xxxx/64
        route6 2001:7c0:3101:a04::/64 metric 100
        route6 fe80::/64 metric 1024
        route6 default via fe80::1 metric 100

lo: connected (externally) to lo
        "lo"
        loopback (unknown), 00:00:00:00:00:00, sw, mtu 65536
        inet4 127.0.0.1/8
        inet6 ::1/128
        route6 ::1/128 metric 256

DNS configuration:
        servers: 134.60.1.111
        interface: enp0s25

        servers: 2001:7c0:3100::111 2001:7c0:3100:1::111
        interface: enp0s25

Edit: I have disabled IPv6 for now and restarted after the next crash.

knarrff commented 12 months ago

Setup is as flexible as a typical laptop: partially wired Ethernet and partially Wi-Fi. In case it is important: some of the networks it connects to support IPv6, some do not. Sometimes I intentionally disable IPv6 using /proc/sys/net/ipv6/conf/all/disable_ipv6. There are multiple wired Ethernet and Wi-Fi networks it regularly connects to. How the laptop is connected can change right after resume, but also in the middle of normal operation.

network-manager (currently version 1.42.4-1, on a Debian stable machine) handles that.
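The `/proc/sys/net/ipv6/conf/all/disable_ipv6` switch mentioned above contains `1` when IPv6 is disabled and `0` otherwise; writing `1` to it (as root) disables IPv6 until it is reset or the machine reboots. A small sketch that reads the current setting so it can be recorded alongside a hang; the helper name is hypothetical, and the path is a parameter only to keep it testable.

```python
from pathlib import Path

IPV6_ALL = Path("/proc/sys/net/ipv6/conf/all/disable_ipv6")


def ipv6_disabled(path: Path = IPV6_ALL) -> bool:
    """Return True if the sysctl file reports IPv6 as disabled ("1")."""
    return path.read_text().strip() == "1"
```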

iridos commented 12 months ago

I had disabled IPv6 and it didn't make any difference. With IPv6 off, my network config is very simple and short (just the IPv4 part of what I showed).

iridos commented 12 months ago

Hmm. Actually, I just noticed something after unlocking the screensaver: WhatsApp Web and signal-desktop both showed a "reconnecting to network" message. But I don't understand why. The network is definitely not available only while my desktop is unlocked: I log in via ssh from home in the evening or morning all the time without any issue.

I think the hang wasn't triggered at that point, when the "reconnecting" status was visible, but the next time I locked the screen. There were no "could not resolve host" messages this time, though.

Now I wonder: why did signal-desktop have to reconnect at that time? The screensaver forcibly grabs all input; could this block something?

The screensaver used here is light-locker (default for the installed DE/WM). Maybe it depends on the screen-saver used. I just killed light-locker and started xscreensaver. We will see within the next week if that makes the problem disappear.

Best, Karsten

knarrff commented 12 months ago

Maybe just a red herring, but I also use light-locker (also default here).

iridos commented 11 months ago

That seems to have made a difference. signal-desktop can still send and receive messages after the weekend, which is longer than I remember it ever running without a hang.

There is now another problem: after the weekend most emoji can't be displayed (I think all the ones I hadn't used recently). I have no idea whether that is a different manifestation of the same thing.

scottnonnenberg-signal commented 11 months ago

@iridos Emoji not loading often means that the app was updated out from under you, so on-disk references no longer work. Also causes crashes if you try to click a link. Do you have automatic apt updates configured?

iridos commented 11 months ago

Hi,

Another day without it hanging. I think removing light-locker was the change that made the problem disappear for me. Maybe you can try to reproduce it yourself now by using light-locker?

light-locker has an option --idle-hint/--no-idle-hint, and I guess it sets the idle hint by default while it is active. xscreensaver does not seem to do that (at least its man page does not mention it), so that is a possible pointer. It seems to happen here https://github.com/the-cavalry/light-locker/blob/7587b53954a4d1c41a76d178d5e11ebb59eba922/src/gs-listener-dbus.c#L355, via a call to dbus_message_new_method_call.

@scottnonnenberg-signal I do, and it restarts services as needed, but signal-desktop is of course not a service, so that is a good explanation for the emoji. I don't think it explains the message hangs I experienced before, though: those could happen several times a day, and automatic updates don't run that often.
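If the `--idle-hint` theory is right, the idle state should be visible through systemd-logind while the screen is locked, assuming light-locker's hint is the logind session `IdleHint` property. A minimal sketch that reads it with `loginctl show-session <id> -p IdleHint` (which prints a line such as `IdleHint=no`), so the value can be recorded before and after locking; the helper names are hypothetical, and whether Electron or Signal reacts to this hint at all is an untested assumption.

```python
import os
import subprocess
from typing import Dict, Optional


def parse_properties(output: str) -> Dict[str, str]:
    """Parse loginctl's "Key=Value" lines into a dict."""
    props: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def session_idle_hint(session_id: Optional[str] = None) -> bool:
    """Return True if logind reports the session as idle (IdleHint=yes)."""
    # XDG_SESSION_ID is normally set for graphical sessions by pam_systemd.
    session_id = session_id or os.environ["XDG_SESSION_ID"]
    out = subprocess.run(
        ["loginctl", "show-session", session_id, "-p", "IdleHint"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_properties(out).get("IdleHint") == "yes"
```

Polling `session_idle_hint()` every minute and logging the result next to signal-desktop's output would show whether the hint flips when light-locker engages.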

iridos commented 11 months ago

So, I have been away for the last couple of days, and when I came back signal-desktop was still running without problems. I switched back to light-locker yesterday, and had to enable DPMS and screen blanking via Xorg with something like "xset s 300".

So after doing this yesterday, today signal-desktop hangs again like it did before.

Without screen blanking by Xorg, light-locker wouldn't lock except via a command or keystroke. I locked it that way 2-3 times and signal-desktop kept running (but those few times are not enough for proof).

Have you already tried reproducing the error?

iridos commented 11 months ago

Any news? Could you reproduce by now?

iridos commented 10 months ago

I think the trigger is described quite closely now. Do you need further help to track the problem?

mzguy commented 10 months ago

I'm having this issue. I'm not sure what the trigger is, since I'm not using light-locker.

I also don't suspend. Simply use Ubuntu stock lock screen.

indutny-signal commented 10 months ago

@mzguy could you submit and quote the debug log here when the issue happens again, please?

iridos commented 9 months ago

Hi, is this fixed? What caused the hangs? Cheers, I.

mzguy commented 8 months ago

It's not fixed. I just submitted an issue to Signal with a debug log.

iridos commented 8 months ago

@mzguy I don't think it's light-locker per se: xscreensaver also locks and grabs the keyboard. light-locker doesn't blank the screen itself but leaves that to X. It may also tell D-Bus (or something else) that the screen is locked and that interactive processes can go to sleep. I suspect this is what happens with Signal: some parts of it get paused, some state gets out of sync, and it can't recover from that.

Also, I think asking for debug information is just a reflex. I looked at the debug logs and don't see any clue in them as to what's happening. Several people have already submitted debug logs; how is one more, or several more, going to help?

iridos commented 4 months ago

Any news on this?
@indutny-signal - care to comment what has been completed?

mzguy commented 4 months ago

I was also going to check on this today and got the notification of a new post!

I have to kill the Signal processes daily and restart them. If I don't notice, I type and try to send a message which often gets lost if I don't notice it's not going out.

indutny-signal commented 4 months ago

Sorry about this. I know it might seem like a reflex, but it is hard to be sure we know what we are looking at without debug log. Could you still submit one right after reproducing the issue? Thank you!

iridos commented 3 months ago

So as an update from me: after switching away from light-locker, I have not seen any hangs over months now.

After switching back to light-locker, I am still seeing the problem after having run signal-desktop in the background for 2 days.

Has anyone tried reproducing this using light-locker?

light-locker itself does not seem to do much; it leaves blanking and power saving to the X11 server. I could have run more tests to narrow down the actual cause, but, well, some suggestions would have been nice, as would knowing that this is moving towards a fix.

And yeah, it seems like a reflex and I do wonder what having YADD (yet another debug dump) is going to tell you that the previous debug dumps have not told you. Sure, it might be a different problem, but as the one I reported nearly a year ago remains unfixed, that point seems pretty moot.

Here's some more debug info, now from signal-desktop 7.9.0 (messages on connecting to pid omitted):

$ ps a -o pid,cmd | grep signa[l]-d | cut -b 1-100
 815831 /opt/Signal/signal-desktop
 815835 /opt/Signal/signal-desktop --type=zygote --no-zygote-sandbox
 815836 /opt/Signal/signal-desktop --type=zygote
 815838 /opt/Signal/signal-desktop --type=zygote
 815872 /opt/Signal/signal-desktop --type=gpu-process --enable-crash-reporter=6ce535c1-ac2e-4e0e-b20
 815878 /opt/Signal/signal-desktop --type=utility --utility-sub-type=network.mojom.NetworkService --
 815917 /opt/Signal/signal-desktop --type=renderer --enable-crash-reporter=6ce535c1-ac2e-4e0e-b205-5
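Electron inherits Chromium's multi-process model: each child process carries a `--type=` flag (zygote, gpu-process, utility, renderer), and the main "browser" process has none. A small sketch that turns the `ps` listing above into a PID-to-role map, so each backtrace can be matched to a role; the function name is hypothetical, and the `cut -b 1-100` truncation could in principle cut a flag value short. Here the network service (the `utility` process with `network.mojom.NetworkService`) is PID 815878, and none of the backtraces that follow was taken from it.

```python
import re
from typing import Dict

# Child processes pass their role as --type=...; the main process has no --type.
TYPE_FLAG = re.compile(r"--type=(\S+)")
SUBTYPE_FLAG = re.compile(r"--utility-sub-type=(\S+)")


def process_roles(ps_output: str) -> Dict[int, str]:
    """Map PID -> role from `ps a -o pid,cmd` output lines."""
    roles: Dict[int, str] = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip headers and blank lines
        pid, cmd = int(parts[0]), parts[1]
        m = TYPE_FLAG.search(cmd)
        role = m.group(1) if m else "browser"
        sub = SUBTYPE_FLAG.search(cmd)
        if sub:
            role += f" ({sub.group(1)})"
        roles[pid] = role
    return roles
```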

$ gdb -p 815831
GNU gdb (Debian 13.1-3) 13.1

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f4bfddaf15f in __GI___poll (fds=0xfcc02f5f900, nfds=5, timeout=1194) at ../sysdeps/unix/sysv/linux/poll.c:29
29  ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) bt
#0  0x00007f4bfddaf15f in __GI___poll (fds=0xfcc02f5f900, nfds=5, timeout=1194) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f4bff11c9ae in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f4bff11cacc in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x0000559e033df465 in  ()
#4  0x00000356601a234f in  ()
#5  0x0000000000001f40 in  ()
#6  0x000003566007ec8f in  ()
#7  0xaaaaaaaaaaaaaa00 in  ()
#8  0x00003d4400304350 in  ()
#9  0x00000000aaaaaa00 in  ()
#10 0xaaaaaa0100000000 in  ()
#11 0x0000000000000000 in  ()
$ gdb -p 815835
(gdb) bt
#0  0x00007f2c00d6e1f8 in __ppoll (fds=0x7ffd362efda8, nfds=1, timeout=<optimized out>, sigmask=0x7ffd362efc50) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#1  0x000055b1a01512f8 in  ()
#2  0x00007ffd362efb40 in  ()
#3  0x000055b19cf121a0 in uv_tty_set_vterm_state ()
#4  0x0000000000000000 in  ()
$ gdb -p 815836
(gdb) bt
#0  0x00007f8a13869bc6 in __waitid (idtype=P_ALL, id=0, infop=0x7ffddbbb63a0, options=4) at ../sysdeps/unix/sysv/linux/waitid.c:29
#1  0x000055d8b8817267 in  ()
#2  0xaaaaaaaaaaaaaaaa in  ()
#3  0x000055d8b37491a0 in uv_tty_set_vterm_state ()
#4  0x0000000000000000 in  ()
$ gdb -p 815838
(gdb) bt
#0  0x00007f8a138921f8 in __ppoll (fds=0x7ffddbbb6568, nfds=1, timeout=<optimized out>, sigmask=0x7ffddbbb6410) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#1  0x000055d8b69882f8 in  ()
#2  0xaaaaaaaaaaaaaaaa in  ()
#3  0x000055d8b37491a0 in uv_tty_set_vterm_state ()
#4  0x0000000000000000 in  ()
$ gdb -p 815872
(gdb) bt
#0  0x00007f2c00d6e15f in __GI___poll (fds=0x2ba4001197a0, nfds=3, timeout=4000) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f2c0211c9ae in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f2c0211cacc in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x000055b1a0864465 in  ()
#4  0x000003567427d145 in  ()
#5  0x0000000000001f40 in  ()
#6  0x0000035673eac862 in  ()
#7  0xaaaaaaaaaaaaaa00 in  ()
#8  0x00002ba4000340d0 in  ()
#9  0x00000000aaaaaa00 in  ()
#10 0xaaaaaa0100000000 in  ()
#11 0x0000000000000000 in  ()
$ gdb -p 815917
(gdb) bt
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7fff3fd71380, op=137, expected=0, futex_word=0x7fff3fd714a0) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common
    (futex_word=futex_word@entry=0x7fff3fd714a0, expected=expected@entry=0, clockid=clockid@entry=1, abstime=abstime@entry=0x7fff3fd71380, private=private@entry=0, cancel=cancel@entry=true)
    at ./nptl/futex-internal.c:87
#2  0x00007f56d3884efb in __GI___futex_abstimed_wait_cancelable64
    (futex_word=futex_word@entry=0x7fff3fd714a0, expected=expected@entry=0, clockid=clockid@entry=1, abstime=abstime@entry=0x7fff3fd71380, private=private@entry=0)
    at ./nptl/futex-internal.c:139
#3  0x00007f56d388783c in __pthread_cond_wait_common (abstime=0x7fff3fd71380, clockid=1, mutex=0x7fff3fd71450, cond=0x7fff3fd71478) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_timedwait64 (cond=0x7fff3fd71478, mutex=0x7fff3fd71450, abstime=0x7fff3fd71380) at ./nptl/pthread_cond_wait.c:643
#5  0x000055e7b3620398 in  ()
#6  0x000000000037ff86 in  ()
#7  0x0000000028a39834 in  ()
#8  0xaaaaaaaaaaaaaa00 in  ()
#9  0xaaaaaaaaaaaaaaaa in  ()
#10 0xaaaaaaaaaaaaaa00 in  ()
#11 0xaaaaaaaaaaaaaaaa in  ()
#12 0xaaaaaaaaaaaaaaaa in  ()
#13 0xaaaaaaaaaaaaaaaa in  ()
#14 0xaaaaaaaaaaaaaaaa in  ()
#15 0xaaaaaaaaaaaaaaaa in  ()
#16 0xaaaaaaaaaaaaaaaa in  ()
#17 0xaaaaaaaaaaaaaaaa in  ()
#18 0xaaaaaaaaaaaaaa00 in  ()
#19 0x00000ea40040c1d0 in  ()
#20 0x000000000037ff86 in  ()
#21 0x0000000017cbfac4 in  ()
#22 0x0000000000000000 in  ()
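Each `bt` above shows only the thread gdb happened to stop in, and without debug symbols most frames stay unresolved. A sketch for dumping every thread of each process non-interactively with standard gdb options (`-p` to attach, `-batch` to exit when done, `-ex "thread apply all bt"`); the helper names and the output file naming are hypothetical, and depending on `kernel.yama.ptrace_scope`, attaching may require root.

```python
import subprocess
from typing import Iterable, List


def gdb_all_threads_cmd(pid: int) -> List[str]:
    """Build a non-interactive gdb command that prints every thread's stack."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_stacks(pids: Iterable[int], out_prefix: str = "signal-bt") -> None:
    """Write one backtrace file per PID, e.g. signal-bt-815878.txt."""
    for pid in pids:
        with open(f"{out_prefix}-{pid}.txt", "w") as fh:
            # check=False: keep going even if one attach fails
            subprocess.run(gdb_all_threads_cmd(pid), stdout=fh,
                           stderr=subprocess.STDOUT, check=False)
```

For example, `dump_stacks([815831, 815878, 815917])` would also cover the network-service process, which was not attached to above.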
dirwiz commented 2 months ago

@iridos Thanks for the tip. I removed light-locker from both a Debian and a Mint installation running Xfce and LightDM. This definitely solved the problem for me. Hopefully @indutny-signal will find this helpful in reproducing the problem.

mzguy commented 2 months ago

@indutny-signal I just noticed your request for a debug log. I did submit one when asked. I'm running Ubuntu 20.04 LTS, a very popular, vanilla setup. I haven't installed any other screensavers or anything like that.

Did you see my debug log? Can you reproduce or find the root cause of this issue yet?

jsn-0 commented 1 month ago

I've been dealing with this issue with Debian 12 / Xfce / light-locker. Locking a session causes browser websocket connections to drop, as well as those of any Electron-based app that uses websockets. Basically, any X11 app with a persistent network connection loses connectivity. SSH connections and CLI utilities started from a session are unaffected.

Firefox and Chromium recover fine. Other apps vary. Signal fails to re-establish its connection (or outright becomes unresponsive and has to be killed). My current logs show Signal failing to re-establish a websocket since I unlocked my computer.

{"level":30,"time":"2024-08-08T23:53:14.406Z","msg":"WebSocketResources.KeepAlive(unauthenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:53:36.234Z","msg":"WebSocketResources.KeepAlive(authenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:53:44.482Z","msg":"WebSocketResources.KeepAlive(unauthenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:54:06.322Z","msg":"WebSocketResources.KeepAlive(authenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:54:14.484Z","msg":"WebSocketResource(unauthenticated).close(3001)"}
{"level":40,"time":"2024-08-08T23:54:14.490Z","msg":"WebSocketResource(unauthenticated): Socket closed"}
{"level":40,"time":"2024-08-08T23:54:14.490Z","msg":"SocketManager: unauthenticated socket closed with code=3001 and reason=No response to keepalive request"}
{"level":30,"time":"2024-08-08T23:54:36.405Z","msg":"WebSocketResources.KeepAlive(authenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:54:44.497Z","msg":"WebSocketResources.KeepAlive(unauthenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:55:06.494Z","msg":"WebSocketResources.KeepAlive(authenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:55:14.498Z","msg":"WebSocketResource(unauthenticated).close: Already closed! 3001/No response to keepalive request"}
{"level":30,"time":"2024-08-08T23:55:36.590Z","msg":"WebSocketResources.KeepAlive(authenticated).send: Sending a keepalive message"}
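In this excerpt the unauthenticated socket is closed with code 3001 about 30 seconds after an unanswered keepalive, yet a minute later the log only reports `Already closed!` and no new connection attempt appears. A minimal sketch of a keepalive watchdog that treats the missed response as the trigger for a reconnect, assuming one tick roughly every 30 seconds (inferred from the timestamps); the `KeepAlive` class and its callbacks are hypothetical and do not reflect Signal's actual `WebSocketResource` implementation.

```python
from dataclasses import dataclass
from typing import Callable

KEEPALIVE_CLOSE_CODE = 3001  # close code seen in the log above


@dataclass
class KeepAlive:
    """Tick-driven keepalive watchdog (hypothetical, not Signal's code).

    tick() is meant to be called every interval (~30 s in the log). If the
    previous keepalive is still unanswered at the next tick, the socket is
    treated as dead: close it once and request a reconnect, instead of
    continuing to tick against a closed socket.
    """
    send: Callable[[], None]
    close: Callable[[int, str], None]
    reconnect: Callable[[], None]
    awaiting_response: bool = False
    closed: bool = False

    def tick(self) -> None:
        if self.closed:
            return  # reconnect() is responsible for the next socket
        if self.awaiting_response:
            self.closed = True
            self.close(KEEPALIVE_CLOSE_CODE, "No response to keepalive request")
            self.reconnect()
            return
        self.awaiting_response = True
        self.send()

    def on_response(self) -> None:
        self.awaiting_response = False
```

The design choice illustrated is that closing and reconnecting happen in the same step, so a socket can never end up closed with no replacement scheduled.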