signalapp / Signal-Desktop

A private messenger for Windows, macOS, and Linux.
https://signal.org/download
GNU Affero General Public License v3.0
14.17k stars 2.58k forks source link

Desktop Suspend/Resume Decouples Mobile Synchronization #6103

Open apiziali opened 1 year ago

apiziali commented 1 year ago

The recently closed bug report "[Linux: Signal doesn't reconnect after sleep or lock screen] #6027" looks quite similar to the issue I am seeing.


Bug Description

When I suspend and subsequently resume my Ubuntu 18.04 PC, the message queue of Signal Desktop no longer remains synchronized with my Signal Android. Every time I resume the PC, I need to restart Signal to see the latest messages already delivered to the mobile phone.

One other environment consideration is that this is a multi-user PC, with each of the two users logged in and running Signal. However, I believe I have seen the synchronization failure with only one active user session.

Steps to Reproduce

  1. Boot the Ubuntu 18.04 workstation ("elijah").
  2. Log into elijah.
  3. Signal Desktop is started by the Ubuntu Startup Applications, using the command: /usr/bin/signal-desktop --use-tray-icon --start-in-tray %U
  4. Double-click the desktop icon "Sleep Until Tonight." This invokes the shell program ~/bin/sleep_until_tonight. That program suspends the workstation using the command: rtcwake --mode mem --utc --time [WAKEUP_TIME], where "[WAKEUP_TIME]" is a valid time to resume, such as "1662094200." "rtcwake --mode mem" puts the workstation in ACPI state 3.
  5. Wait for a few Signal chats to be delivered to the mobile app.
  6. Resume elijah.
  7. Those new chats are absent in Signal Desktop.
  8. Quit Signal Desktop.
  9. Restart Signal Desktop.
  10. The chats delivered to the mobile app are now displayed in Signal Desktop.

Actual Result:

Chats that were delivered to the mobile app are not delivered to the desktop application, as described in "Steps to Reproduce."

Expected Result:

When the workstation is resumed, Signal Desktop ought to reflect the chats that were delivered while the workstation was suspended.

Screenshots

Platform Info

Signal Version:

5.56.0 ... Production.

Operating System:

Ubuntu 18.04.6 LTS (uname -a : "Linux elijah 5.4.0-124-generic #140~18.04.1-Ubuntu SMP Fri Aug 5 11:43:34 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux")

Linked Device Version:

Samsung Galaxy A21, Android 11, One UI Core 3.1.

Link to Debug Log

https://debuglogs.org/desktop/5.56.0/33574877e6a4c8cdbd74fec3f7f09269a6c0281d6bb5a25e0a0db48626841da5.gz https://debuglogs.org/android/5.47.3/83a1078f72b089d5d0f5996e170a937b768ab035e7ab9456d2141516ce61d8a4

debuglog.txt

timp34 commented 1 year ago

I have same issue on Debian 11 with Xfce. Screen lock will also cause same issue. Can provide debug log if required.

scottnonnenberg-signal commented 1 year ago

@apiziali So, at a high level I can say that when Desktop is not staying up to date with incoming messages after resume, it is in fact unable to access its database at all (we timeout after two minutes waiting to talk to our database). It makes me think that on your machine, inter-process communication (IPC, a core part of our Electron foundations) is broken. This is 100% reproducible for you? Can you tell us more about your linux install, window manager, etc.?

@timp34 Can we get a log from you?

apiziali commented 1 year ago

Scott,

I am not familiar with the implementation of Signal, so I cannot answer your question about its database. However, I can say that sometimes, but not always, messages that have been delivered to Signal Android on my phone are not delivered to Signal Desktop on my workstation after a resume.

These are the details of my Linux desktop:

 OS     : Ubuntu 18.04.6 LTS

 uname -a   : Linux elijah 5.4.0-124-generic #140~18.04.1-Ubuntu
          SMP Fri Aug 5 11:43:34 UTC 2022 x86_64 x86_64
      x86_64 GNU/Linux

 Window Mgr : Stock Gnome 3

 Workstation    : Dell Precision 3630 Tower Workstation

 CPU        : Intel Core i5-8500 CPU (6 core, 3.00GHz)

 Memory : 32 GB

I posted the following log files at the time I posted the bug report (September 1):

 https://debuglogs.org/desktop/5.56.0/33574877e6a4c8cdbd74fec3f7f09269a6c0281d6bb5a25e0a0db48626841da5.gz

 https://debuglogs.org/android/5.47.3/83a1078f72b089d5d0f5996e170a937b768ab035e7ab9456d2141516ce61d8a4

 debuglog.txt : https://github.com/signalapp/Signal-Desktop/files/9474416/debuglog.txt

How else may I assist?

Thanks!


On 9/14/22 9:28 AM, Scott Nonnenberg wrote:

@apiziali https://github.com/apiziali So, at a high level I can say that when Desktop is not staying up to date with incoming messages after resume, it is in fact unable to access its database at all (we timeout after two minutes waiting to talk to our database). It makes me think that on your machine, inter-process communication (IPC, a core part of our Electron foundations) is broken. This is 100% reproducible for you? Can you tell us more about your linux install, window manager, etc.?

@timp34 https://github.com/timp34 Can we get a log from you?

— Reply to this email directly, view it on GitHub https://github.com/signalapp/Signal-Desktop/issues/6103#issuecomment-1247017970, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGLF2LR4KKGUZFCZQCRXOI3V6H4LNANCNFSM6AAAAAAQC4E76I. You are receiving this because you were mentioned.Message ID: @.***>

-- | @.** ____------+------____ / \ ---*

 PGP Key: h t t p s : / / b i t . l y / A n d y P G P K e y

"Computers in the future may weigh no more than 1.5 tons." -- Popular Mechanics, forecasting the relentless march of science, 1949.

timp34 commented 1 year ago

Here is my log. This is whilst signal appears to be working ok. Will add another shortly when it is "not" working correctly... :) https://debuglogs.org/desktop/5.58.0/659cdb98ebc7cdf5e53a27ac6529ddb95440c79a642a1a4850517500cf6ffbfb.gz

apiziali commented 1 year ago

Rather than habitually restarting Signal Desktop each time I resume the workstation, I just resumed without restarting it. After sending a "Note to Self" from my phone, it did not appear on the desktop. I immediately captured a debug log from the desktop:

https://debuglogs.org/desktop/5.58.0/05e264ed0b151de8f23332850112e1dbfc9212fec1183b604e3ffebb133f090c.gz

timp34 commented 1 year ago

here is debug log for desktop that is now not sending or receiving after computer has been "suspended". Linked phone has new messages which are not on desktop version, hope this helps... https://debuglogs.org/desktop/5.58.0/616fb48cf70528e9e7a6d3b7cf68e4a50e917367c37a9aad34b79ec1f5b5989c.gz

apiziali commented 1 year ago

I just noticed another clue. When Signal Desktop does not reflect recent messages delivered to Signal Android, the Ubuntu notification system still displays the first line or two of new messages. If one of those summaries is clicked, a GSConnect window, "Messaging: Andy's Phone," opens. Its left pane displays "No Conversations" which its right pane displays "No conversation selected. Select or start a conversation." GSConnect notifications are enabled in the Gnome notification settings. However, in Tweaks, GSConnect is turned off.

This may just reflect that GSConnect is reporting new messages received by Signal Android, so I could disable the GSConnect notifications on the Ubuntu workstation.

apiziali commented 1 year ago

I've disabled GSConnect notifications on my workstation until we resolve this Signal Desktop synchronization failure.

apiziali commented 1 year ago

FYI, GSConnect (https://bit.ly/3BrRUIg) is a re-implementation of KDE Connect for Gnome, which provides phone/desktop communication and control.

indutny-signal commented 1 year ago

Very interesting! Thank you for investigating it. I'll file a bug internally, but would you care to submit this to Electron as well? It sounds like they might be interested in it too!

apiziali commented 1 year ago

It looks to me as though GSConnect reporting new Signal Android messages, even though GSConnect is disabled in Gnome, is a red herring with regard to Signal Desktop and Signal Android losing synchronization. We need to understand why synchronization is lost.

scottnonnenberg-signal commented 1 year ago

@apiziali @timp34 Hey there! We now have an Electron Fiddle ready for you to test.

  1. First, download Electron Fiddle here: https://www.electronjs.org/fiddle
  2. Then load up this Gist inside Electron Fiddle: https://gist.github.com/scottnonnenberg-signal/fc5174e6015c61fdae49ffe767d441bc
  3. Then start it up on a recent version of Electron. Currently Signal Desktop is on 20.1.4.
  4. Open the dev tools with View -> Toggle Developer Tools and watch the IPC ping/pong events go by.
  5. Suspend your linux machine, then resume it. See if the ping/pong events continue, or just ping with no pong. If just ping then IPC has broken, just like it has with your Signal Desktop.

You can try lots of different versions of Electron really easily within Electron Fiddle, and you can also create packaged versions to see if that makes a difference too (Tasks -> Package Fiddle).

Please let us know what you find! This is the kind of thing that will give us really useful information to pass to Electron so they can get things fixed!

apiziali commented 1 year ago

Fantastic! Unfortunately, I am tied up for about ten days so I will pick this up around October 3.

Thank you!

On September 23, 2022 4:58:05 PM MST, Scott Nonnenberg @.***> wrote:

@apiziali @timp34 Hey there! We now have an Electron Fiddle ready for you to test.

  1. First, download Electron Fiddle here: https://www.electronjs.org/fiddle
  2. Then load up this Gist inside Electron Fiddle: https://gist.github.com/scottnonnenberg-signal/fc5174e6015c61fdae49ffe767d441bc
  3. Then start it up on a recent version of Electron. Currently Signal Desktop is on 20.1.4.
  4. Open the dev tools with View -> Toggle Developer Tools and watch the IPC ping/pong events go by.
  5. Suspend your linux machine, then resume it. See if the ping/pong events continue, or just ping with no pong. If just ping then IPC has broken, just like it has with your Signal Desktop.

You can try lots of different versions of Electron really easily within Electron Fiddle, and you can also create packaged versions to see if that makes a difference too (Tasks -> Package Fiddle).

Please let us know what you find! This is the kind of thing that will give us really useful information to pass to Electron so they can get things fixed!

-- Andrew Piziali @.***> h t t p s : / / b i t . l y / A n d y P G P K e y

"The Great Reset represents the development of the Chinese system in the West, but in reverse. Whereas the Chinese political class began with a socialist political system and then introduced privately held for-profit production, the West began with capitalism and is now implementing a Chinese-style political system." -- Michael Rectenwald, "What Is The Great Reset?," "Imprimis," December 2021

apiziali commented 1 year ago

Scott, following your instructions, I ran the test using Electron 20.1.4 and annotated the results in your message below. As you can see, without even introducing suspend/resume, no "pong" event are reported in the console:

=============== Hey there! We now have an Electron Fiddle ready for you to test.

First, download Electron Fiddle here: https://www.electronjs.org/fiddle -- Done

Then load up this Gist inside Electron Fiddle: https://gist.github.com/scottnonnenberg-signal/fc5174e6015c61fdae49ffe767d441bc -- Downloaded -- Unzipped

Then start it up on a recent version of Electron. Currently Signal Desktop is on 20.1.4. -- Started Electron Fiddle ("electron-fiddle >/tmp/ef.log 2>&1 &") -- Downloaded Electron version 20.1.4 from File > Preferences > Electron -- Changed the version of Electron to 20.1.4 -- Load the Gist: File > Open > /home/andy/Signal Suspend-Resume Failure/Gist/\ fc5174e6015c61fdae49ffe767d441bc-8ff8c17d37c693586c08ea16704a49ca259a23af -- Open the console -- Run the Gist: [Run]

Open the dev tools with View -> Toggle Developer Tools and watch the IPC ping/pong events go by. -- Only "pings" are logged in the console, even though no suspend/resume cycles have been initiated:

------------
Console ready
Saving files to temp directory...
Saved files to /tmp/tmp-18764-ZrmqHuvlUq40
Electron v20.2.0 started.
main ping #1
main ping #2
main ping #3
main ping #4
main ping #5
main ping #6
main ping #7
main ping #8
main ping #9
main ping #10
main ping #11
main ping #12
main ping #13
main ping #14
main ping #15
main ping #16
main ping #17
main ping #18
main ping #19
main ping #20
main ping #21
main ping #22
main ping #23
main ping #24
main ping #25
main ping #26
main ping #27
main ping #28
main ping #29
main ping #30
main ping #31
main ping #32
main ping #33
main ping #34

Electron exited with code 0.
------------

...

scottnonnenberg-signal commented 1 year ago

@apiziali Your pasted log output is from the main process, which isn't where you'll see the 'pong' result. You need to start up the app, and open the dev tools with View -> Toggle Developer Tools and you should see both 'ping' and 'pong.'

apiziali commented 1 year ago

Scott,

There are no “pings,” nor “pongs” logged in the developer console (see attached). Screenshot from 2022-10-05 14-29-17

I have not changed any of the default Electron Fiddle settings.

Also, "Toggle Developer Tools" is at: Help > Toggle Developer Tools

scottnonnenberg-signal commented 1 year ago

I think you need to run the Electron Fiddle and enable the dev tools in the running application, not the Electron Fiddle editor application.

apiziali commented 1 year ago

I have never used Electron Fiddle before, so I do not know how to enable the developer tools in the running application.

scottnonnenberg-signal commented 1 year ago

Open the View menu and click Toggle Developer Tools in the running application when you choose to Run the code with Electron Fiddle. You can also package the application up and run it as a standalone application, maybe more easily tested.

apiziali commented 1 year ago

Okay. I now realize that you are referring to View > Toggle Developer Tools in the application (Chrome?) menus, not in the Electron Fiddle menus. I've run four suspend/resume cycles, at 1-, 2-, 4- and 8-minute intervals between the suspend and resume. In each case, the ping/pong continues. What log files do you want me to capture, and how do I do so? Thanks!

scottnonnenberg-signal commented 1 year ago

@apiziali The best thing to do would be to do whatever seemed to cause the problem with Signal Desktop. We're looking to replicate that suspend/resume problem, which we believe would mean that the ping/pong would become just ping.

apiziali commented 1 year ago

Curiously, while running these tests, without restarting Signal, it continued to remain synchronized with Signal Android, as indicated by "Message to self" and a message sent from another user.

apiziali commented 1 year ago

Alright, I will just leave this application running as we use the workstation normally, suspending and resuming it throughout the day. Shall I just copy-and-paste the application console log when Signal Desktop loses synchronization with Signal Android?

scottnonnenberg-signal commented 1 year ago

@apiziali Certainly we're interested in whether this behavior still happens for you on Signal Desktop. The key bit of data we're looking for is whether, when it happens on Signal Desktop, it also happens on Electron Fiddle. We're looking for the minimum bit of code that can cause the weirdness on your system to trigger.

apiziali commented 1 year ago

Signal Desktop has not lost synchronization with Signal Android since I started running your Gist. I then stopped the Gist in Electron Fiddle, but continued using the workstation and Signal as usual. Still, Signal Desktop remains synchronized. I quit the Electron Fiddle application and will monitor the behavior of Signal Desktop during the next few days of multiple suspend/resume cycles.

This is a classic problem, right? You begin to investigate and diagnose an issue and it ceases to exist. Causal? Who knows?

scottnonnenberg-signal commented 1 year ago

@apiziali It would be preferable to us if you kept the electron fiddle app running in the background all the time along with Signal Desktop, just to gather as much data as possible.

apiziali commented 1 year ago

Okay. I'll start it up again and we'll see if Signal loses synchronization. If it does, we'll have the fiddle app log.

BTW, where will I find this log file?

scottnonnenberg-signal commented 1 year ago

@apiziali When Signa Desktop stops working, all we need is the Signal Desktop log and the yes/no of whether Electron Fiddle is doing the full ping/pong in its dev tools, or just ping.

apiziali commented 1 year ago

Good news! My wife attempted to use Signal Desktop on Oct. 7, 2022 at 1800 MST (UTC-0700) without success. The fiddle application was running. I attached the debug log from her running Signal ("Debbie Signal Debug Log.txt"). You might note from the initial post of this bug report that this is a multiuser workstation, two users always logged in, each running their own Signal Desktop.

In the fiddle app, after a sequence of ping/pongs from #1 to #243, there are only pings starting at #244. However, I believe that anomaly occurred long before this Signal malfunction. By #13440, the ping/pong sequence continues as normal. Between these times, at the #6000 range of fiddle app messages, there are only "pong" messages!

Lastly, I just verified that my wife's Signal Desktop is still not synchronized with her Signal Android at 1930 MST.

Debbie Signal Debug Log.txt

apiziali commented 1 year ago

Okay. I just saw Signal Desktop and Signal Android lose synchronization on my Ubuntu login. The fiddle app is still ping/pong-ing along, but a note-to-self from the phone was not delivered to the desktop. The Signal Desktop debug log ("Andy Signal Debug Log.txt") and Signal Android debug log ("Andy Signal Android Debug Log.zip") are attached.

  1. [Andy Signal Debug Log.txt](https://github.com/signalapp/Signal-Desktop/files/9740446/Andy.Signal.Debug.Log.txt]

  2. [Andy Signal Android Debug Log.zip](https://github.com/signalapp/Signal-Desktop/files/9740451/Andy.Signal.Android.Debug.Log.zip]

apiziali commented 1 year ago

Another desynchronization this afternoon. The fiddle app was continuously ping/pong-ing but Signal Desktop did not receive a note-to-self sent from Signal Android. The desktop and Android log files are attached.

Andy Signal Android Debug Log 221009.zip Andy Signal Desktop Debug Log 221009.txt

apiziali commented 1 year ago

This desynchronization was revealed by the failure of a note-to-self to be delivered from Signal Android to Signal Desktop. Again, the running fiddle app continued to display a continuous sequence of ping-pong events. The desktop and Android log files are attached.

Andy Signal Android Debug Log 221012.zip Andy Signal Desktop Debug Log 221012.txt

apiziali commented 1 year ago

@scottnonnenberg, I haven't heard from you in a week, after posting a number of failure cases and their associated log files. What's up?

apiziali commented 1 year ago

I'd like to know whether or not this issue is still on anybody's radar. Since October 14, 2022, I have resumed restarting Signal Desktop after each resume in order for it to remain synchronized with Signal Android.

jamiebuilds-signal commented 1 year ago

It is still on our radar, it was interrupted by other ongoing efforts, apologies for the delay

apiziali commented 1 year ago

Thank you Jamie. I will stand by, ready to assist, and continue restarting Signal Desktop after each workstation resume.

MikeWallaceDev commented 1 year ago

Hello, I'm still seeing this behavior on Manjaro/xfce. I'm on Signal Desktop 6.2.0

https://debuglogs.org/desktop/6.2.0/5722466fcd728b3c3276f8d0cc472a2087f0a72310a85b5ef50a1ffb58c33f70.gz

scottnonnenberg-signal commented 1 year ago

@apiziali @MikeWallaceDev @timp34 If you could provide detailed computer specs, that would help us enter a bug on Electron. Our current theory is that we're not getting powerMonitor suspend/resume events as expected, which would allow us to better handle tasks started right before your computer goes to sleep. Today, they throw all sorts of errors because they take too long across your sleep/resume cycle.

scottnonnenberg-signal commented 1 year ago

Update: We have other theories. We have at least one log on a recent build with suspend/resume log lines. We're gonna try to reproduce it ourselves. It would help us if you track it down - for example, does it only happen if you sleep for over 20 minutes, or does it happen if you manually sleep and come back in 5 minutes?

MikeWallaceDev commented 1 year ago

@scottnonnenberg-signal

Here is the log when I come back to my desk and the app isn't working correctly. https://debuglogs.org/desktop/6.2.0/e53cda133d148923bc382e0fcd0f66f6fe682af4f28ada644d335c1a6aecb3e5.gz

Here is the log after I restarted the app, and the messages are all visible. https://debuglogs.org/desktop/6.2.0/f33f10f78c31d95eda7fd1fc59880eb07d4d4f459f78003c7037311292e84c71.gz

I don't know about the 5 minute thing. Sorry.

scottnonnenberg-signal commented 1 year ago

@MikeWallaceDev It looks like your computer was asleep for a very short time here - do you remember what was going on there?

2023-01-28T04:03:12.904Z is Fri Jan 27 2023 20:03:12 GMT-0800 (Pacific Standard Time).

// A few bigger breaks between log lines
INFO  2023-01-28T04:03:12.904Z checkExpiringMessages: found no messages to expire
INFO  2023-01-28T04:03:31.865Z WebSocketResources: Sending a keepalive message
INFO  2023-01-28T04:03:42.127Z WebSocketResources: Sending a keepalive message
INFO  2023-01-28T04:03:57.529Z MessageReceiver: got request PUT /api/v1/message
...
// Then about two minutes later we have tasks blocked for just under two minutes
WARN  2023-01-28T04:05:14.660Z TaskWithTimeout: SQL channel call (getNextAttachmentDownloadJobs) has been running for 118956ms
WARN  2023-01-28T04:05:14.661Z TaskWithTimeout: SQL channel call (commitDecryptResult) has been running for 77050ms
INFO  2023-01-28T04:05:27.643Z WebSocketResources: Sending a keepalive message
MikeWallaceDev commented 1 year ago

Actually, it's been asleep since yesterday. (Or at least, I didn't purposely wake it. Maybe I bumped in to the desk) But as far as I know : asleep since last night.

The first thing I did when I sat down was take care of Signal, and reply to your message.

apiziali commented 1 year ago

Scott, you asked for detailed computer specs. Attached is the output of hwinfo (16,428 lines). apiziali_hwinfo.txt

apiziali commented 1 year ago

Scott,

I attached a full hwinfo(8) output to this issue (#6103) on Github.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= On 2/1/23 10:32 AM, Scott Nonnenberg wrote:

@apiziali @MikeWallaceDev @timp34 If you could provide detailed computer specs, that would help us enter a bug on Electron. Our current theory is that we're not getting powerMonitor suspend/resume events https://www.electronjs.org/docs/latest/api/power-monitor as expected, which would allow us to better handle tasks started right before your computer goes to sleep. Today, they throw all sorts of errors because they take too long across your sleep/resume cycle.

-- Andy Piziali @.***>

apiziali commented 1 year ago

This happens even on abbreviated suspends on my workstation.

megele commented 1 year ago

Just chiming in to report that I'm observing the same issue (i.e., this is not a singularity). Debian Bookworm, xfce, signal-desktop. Happy to provide whatever logs you tell me to gather.

megele commented 1 year ago

While it has been mentioned in conversation above, once signal-desktop is in this "state" it also cannot send messages anymore -- very reproducibly. So maybe there is some interesting log-gathering that can be done? As always ... Happy to provide whatever logs you tell me to gather.

scottnonnenberg-signal commented 1 year ago

@megele What we're looking for is a consistent repro so we can get it to happen, or the Electron team can get it to happen. When does it happen for you and when doesn't it?

megele commented 1 year ago

Sorry for the very late reply. I appreciate your help in figuring this out! signal-desktop remains functional if the machine only briefly goes to sleep (e.g., if I just close the laptop lid, wait for the led-indicator to blink slowly, saying that it's asleep, and open it up after a few seconds again). However, whenever I step away from the machine for a longer period of time, the machine goes to sleep and when I wake it back up, the described issue occurs. TBH, I'm not sure what the difference is between these two situations.

In terms of "consistent repro", it is consistent in the sense that provided sufficient "sleep time" due to inactivity signal-desktop will always end up in the broken state. Unfortunately, I didn't figure out how to trigger the misbehavior on the push of a button. Happy to try suggested courses of action though.

megele commented 1 year ago

Another strange observation I made, when the "synchronization issue" happens, terminating signal-desktop via File->Quit does not terminate any of the many processes making up the client. I have to kill them explicitly via kill to terminate them and then be able to restart signal-desktop and sync appropriately.