signalapp / Signal-Android

A private messenger for Android.
https://signal.org
GNU Affero General Public License v3.0

Network connection seems to get stuck (without Google services) #7638

Closed by Socob 4 years ago

Socob commented 6 years ago

Re-filing #7420 due to #7598.



Bug description

This is a revival of #7420 and may be the same issue as #6447, #6644 and #6880. I am using the official Signal APK from https://signal.org/android/apk/, so not Noise or anything like that.

For months now, Signal on Android’s network connection seems to “freeze” sporadically (once or twice per day), leaving it unable to send or receive any messages (even things like read confirmations) for long periods of time. During this time, messages can be sent and received from the Signal desktop client normally and without any delay, but none of these messages show up in the Android client. It does not make a difference whether the Android client is open or any conversations are opened – nothing happens.

It seems that Signal can be made “un-stuck” by resetting the phone’s network connection (e. g. disabling WiFi so that the phone switches to cellular data) or using “force stop” to kill and restart the Signal Android client completely. Upon one of these events, all of the messages from the intervening time period will appear in the Android client all at once.

Steps to reproduce

Unfortunately, since this seems to happen erratically, concrete steps to reproduce are a bit difficult to determine.

Actual result: No new messages (sent or received) are displayed in the Signal Android app, even if it is open. This occurs when the phone and the desktop are on the same WiFi network, so the issue is not due to poor connectivity.

Expected result: Messages should appear immediately in the Signal Android client.

Device info

Device: OnePlus One

Android version: 7.1.2 (LineageOS 14.1). I have never had Google Play services installed and Android Doze is disabled (“Battery optimization: Not optimized”) for Signal.

Signal version: 4.17.5 (but I’ve been having this issue for months now, so this includes many earlier versions)

Link to debug log

https://gist.github.com/anonymous/b1f29ef755a08091528fefff299057d9

At 15:25, my phone switched to cellular data (because I went out of WiFi range). At that point, all the messages from the last hour or so appeared all at once.

This debug log was taken with Signal version 4.15.5, but the issue still occurs with the latest version (4.17.5).

dpapavas commented 5 years ago

I was assuming netd was related because of log lines such as the following, on a device that was not affected:

09-19 11:23:19.644 328 1265 I Netd : Destroyed 10 sockets on 192.168.removed in 4.5 ms
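For anyone who wants to check whether their own device logs such lines around a network switch, here is a minimal Python sketch that picks them out of saved logcat output. The only format it knows is the single line quoted above, so the regular expression is an assumption and may not match other netd versions; the function names are made up for illustration.

```python
import re
from typing import Iterable, Optional, Tuple

# Pattern derived from the one log line quoted in this thread (an assumption:
# other netd versions may word or format this message differently).
NETD_DESTROY_RE = re.compile(
    r"Netd\s*:\s*Destroyed (\d+) sockets? on (\S+) in ([\d.]+) ms"
)


def parse_netd_destroy(line: str) -> Optional[Tuple[int, str, float]]:
    """Return (socket_count, address, milliseconds), or None if the line does not match."""
    match = NETD_DESTROY_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), float(match.group(3))


def count_destroyed_sockets(lines: Iterable[str]) -> int:
    """Sum the socket counts over all matching lines of a log."""
    total = 0
    for line in lines:
        parsed = parse_netd_destroy(line)
        if parsed is not None:
            total += parsed[0]
    return total
```

If a log captured across a network switch yields a count of zero, that would be consistent with (but not proof of) the old sockets being left in place on that device.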

I didn't find what the difference between the two devices was, that is, whether it was a matter of netd version or its configuration, but I didn't look too much into it. It would have been the correct approach technically, but it wouldn't be applicable for most users.

So I'm not sure whether this specific problem could be fixed that way, but I don't think it would be worth the effort in any case. If someone can troubleshoot and fix a system component, such as netd, they should also be able to build Signal and install a patched version. The latter would likely be a more viable solution, for reasons I've mentioned already. Signal seems already to be broken again for non-GCM devices in general, not just after switching networks. Even if the patch for it ever gets applied, it will likely have been broken by then in other ways.

Using a patched version is therefore the only way I see of having a somewhat functioning Signal without GCM, at least for as long as someone is forced or willing to investigate the problems and write the patches.

doak commented 5 years ago

@dpapavas, thanks for the reply.

I didn't find what the difference between the two devices was, that is, whether it was a matter of netd version or its configuration, but I didn't look too much into it.

Did you mention which devices you have used? I can't find it either in this issue or in the detailed description of libsignal-service-java!62.

Signal seems already to be broken again for non-GCM devices in general [...]

A friend of mine is using the current version without issues except the network switching problem. What issues do you mean?

mejo- commented 5 years ago

Did you mention which devices you have used?

I know for sure that the whole network switching problem doesn't occur on Xiaomi Mi A1 phones with LineageOS 15, and it for sure occurs on HTC One M8 and M9 with LineageOS 14.1 (all without GApps).

dpapavas commented 5 years ago

Did you mention which devices you have used? I can't find it either in this issue or in the detailed description of libsignal-service-java!62.

My own Xiaomi Mi 4 had the problem and a Motorola Moto3 didn't have it. Both devices were running LineageOS 14.1, without Google apps.

A friend of mine is using the current version without issues except the network switching problem. What issues do you mean?

See here and in the next couple of messages.

doak commented 5 years ago

@mejo- , @dpapavas, thanks for your reply.

My own Xiaomi Mi 4 had the problem and a Motorola Moto3 didn't have it. Both devices were running LineageOS 14.1, without Google apps.

Huh, same SW!? Does that mean that something not within the OS influences that? Or do you mean it depends on some HW specific parts of the OS? The comment of @mejo- implied (at least for me) that Lineage v15 had changed something in this regard which fixed it.

I looked into the changes made to netd and tried it on an incomplete development version of Replicant (which is based on LineageOS 14.1) and a plain LineageOS 14.1-20170927, both on Samsung S3 i9305 devices but different ones: Replicant did work (at least for my test case, more on that later), LineageOS didn't. The LineageOS build is quite old, but the i9305 is not supported by Lineage anymore (some unsolved issues) and that is the latest version I came across. But since the current LineageOS v14.1 did not work either (on an S3 i9300, which is still supported), I assumed that the age of the build is not the issue. The version of the netd code on Replicant is older (it seems the netd code was refactored afterwards).

I have found this interesting commit (cherry-picked for Lineage v15): https://github.com/LineageOS/android_system_netd/search?q=f32fc598b01ba8d59873b0a1085716fd84678b54&type=Commits

I am unsure whether the improvement on some devices is caused by a change in netd (which was my first assumption). In the meantime, I am no longer sure that I am really talking about the same issue as you. I am able to reproduce it with the following steps:

After that, Signal is "out of sync". Device B is able to send (which A receives), but there is no acknowledgement. Messages from device A to B are not delivered. After restarting Signal on device A, past messages and acknowledgements are delivered, so everything works again. Can you please confirm that this is really your issue??

dpapavas commented 5 years ago

Huh, same SW!? Does that mean that something not within the OS influences that? Or do you mean it depends on some HW specific parts of the OS? The comment of @mejo- implied (at least for me) that Lineage v15 had changed something in this regard which fixed it.

I'm not very familiar with the development process of LineageOS, but I was assuming that either different ports of the same version (i.e. 14.1 for Mi4 and Moto3) could have different system component versions, or perhaps netd can be configured to some extent and the configuration between ports varies.

I can't see how it could be due to hardware differences, but I can't rule it out either.

Can you please confirm that this is really your issue??

It might be; I'm not sure about the "messages from device A to B are not delivered" and the "which A receives" parts.

The problem my patch was trying to address was that a network change left the old sockets intact and bound to the old IP address. Signal never detected anything out of the ordinary, so it kept listening on a severed connection. As such, it couldn't receive messages and notifications from the server. I believe it could still send, though, as there was a backup path which, after failing to send on the severed main connection, temporarily opened a new one and sent through that.
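To make that failure mode concrete, here is a minimal Python sketch of the kind of check that would notice a severed connection. It is not the patch from the pull request and not Signal's actual code; the class name, the method names and the keepalive timeout are all hypothetical. The idea is simply that a long-lived connection should be reopened if the local address it was bound to has changed, or if nothing has been heard on it for too long.

```python
import time
from typing import Callable, Optional


class ConnectionWatchdog:
    """Decides when a long-lived connection should be torn down and reopened.

    Hypothetical illustration only: it tracks the local address the socket
    was bound to and the time of the last traffic seen on it.
    """

    def __init__(self, keepalive_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = keepalive_timeout
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen = clock()
        self._bound_address: Optional[str] = None

    def on_connected(self, local_address: str) -> None:
        """Record the local address the new socket is bound to."""
        self._bound_address = local_address
        self._last_seen = self._clock()

    def on_traffic(self) -> None:
        """Call whenever a message or keepalive response arrives."""
        self._last_seen = self._clock()

    def needs_reconnect(self, current_address: Optional[str]) -> bool:
        """True if the connection was never opened, is bound to an old address, or is silent."""
        if self._bound_address is None:
            return True
        if current_address != self._bound_address:
            return True  # network changed: the socket still points at the old IP
        return self._clock() - self._last_seen > self._timeout
```

A client would call `needs_reconnect()` with the device's current local address whenever the system reports a connectivity change, and periodically otherwise. The silence check matters on devices where the old socket is never destroyed and so never reports an error.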

I'm not really sure though and much of that might depend on the device in various ways. You might try to look through the logs after such a session and look for anything out of the ordinary.

Socob commented 5 years ago

@doak I’ve used Signal on a OnePlus One with both LineageOS 14.1 and 15.1, and I haven’t noticed any difference in behavior with respect to this issue. If there were any relevant changes between LineageOS 14.1 and 15.1, they don’t seem to have affected this issue.

doak commented 5 years ago

Thanks for all your input. I have to have a closer look, it's still weird.

@dpapavas:

The problem my patch was trying to address, was that a network change left the old sockets intact and bound to the old IP address [...]

That sounds like the same issue I am able to reproduce with the mentioned steps.

Sn0whax commented 5 years ago

@dpapavas Are there any updates with the Signal build? Does it still fully work without GCM?

dpapavas commented 5 years ago

@dpapavas Are there any updates with the Signal build? Does it still fully work without GCM?

Not sure which Signal build you refer to. If it's the official build, probably not, because it never fully worked without GCM. Patching was required, and the patch has been open as a pull request (#8230) for more than a year now, happily being ignored. Others have made similar efforts (signalapp/libsignal-service-java#70) which seem to share the same fate. See the linked PRs for more information.

Sn0whax commented 5 years ago

This is too bad. Signal could have been so much.


navid-zamani commented 4 years ago

This bug still happens, with 4.50.5 with GApps / Android 6.0 / Huawei EMUI 4.0.3!

Does anyone care that it makes Signal halfway unusable, or is the project dead? :/

greyson-signal commented 4 years ago

I believe this bug was fixed in e3b66dc7e8906b678576c71d01d98aff17c6aefc.

@navid-zamani This thread is specific to devices without GApps. It's unclear what your specific symptoms are, but you may be experiencing different problems related to delayed notifications. Check out #8692.

Also, be sure to first check out our support page on fixing delayed notifications: https://support.signal.org/hc/en-us/articles/360007318711-Troubleshooting-Notifications

Huawei is known for being super aggressive with battery optimizations, leading to things like delayed notifications.

is the project dead?

Feel free to check the commit history :) I think the team is doing great work.

moriel5 commented 3 years ago

I hope that my experiences with Telegram on the Razer Phone 2, running stock Android 9, and my suspicions about them, may help, even though my issue is with another ecosystem. (I do not run Signal, due to my disappointment with its claim of being open-source while the Signal company aggressively wages a war against fully FOSS forks that remove the reliance upon Google's Play Services, a bad choice for a company that supposedly cares about privacy. No offense to the engineers, who are doing a splendid job.)

I am having the exact same issues as the OP, except on a Razer Phone 2 running stock Android 9 (temporarily; this is not really a daily driver until I manage to boot, well, anything other than the stock ROM), with most of the apps disabled (especially Google's Play Services, including all of its respective parts). I keep having issues getting Telegram to connect when it is in the foreground (I do not use it in the background), and yes, all battery optimizations are disabled, with unlimited data allowed (testing with WiFi, no SIM inside yet).

Trying unofficial modded versions, as well as alternative clients, appears to help a little, but within the margin of error, so could be placebo.

This only happens when connecting to any "normal" wireless router, on both 2.4 GHz and 5 GHz (both N and AC on 5 GHz); connecting to a mobile hotspot works without issues. The issue only affects Telegram for me (I haven't yet tested IRC, as I am still learning how to get it to work reliably on the desktop, and I do not use WhatsApp, obviously), and only on the Razer Phone 2. There are no issues on a Nexus 4 running CrDroid (Android 7.1.2), on the LG G2, G3 or G3 Beat running LineageOS (Android 7.1.2), on a HiSense Sero Pro running AOSP (Android 6.0.1), or with desktop Telegram on either Linux or Solus. Checking network activity on several of the routers (one ISP-provided and one retail TP-Link router; the others are the Xiaomi Mi 3 and 3G and the LinkSys Wrt32x, all running OpenWrt) shows that the device is the one not making the connection.

From what I can glean, this is not due to battery saving shenanigans in the stock ROM either, so I can only chalk it up to the specific Qualcomm driver and its configuration in the stock ROM (I may be wrong, this is just a suspicion).

I also have an issue with uploads, which I believe is connected: only ~120-240KB gets uploaded every 5 minutes or so when attempting to upload via most apps (including the browser).

To be clear, it's not that Telegram does not work at all (through the browser it is just fine), but rather that its connection on this ROM is utterly unstable, and sometimes I am forced to reboot just to get new Telegram messages to come in.