microg / GmsCore

Free implementation of Play Services
https://microg.org
Apache License 2.0
8.29k stars 1.7k forks source link

Cloud messaging gets disconnected - never connects back #2411

Open bugsyb opened 3 months ago

bugsyb commented 3 months ago

Affected app Name: all using Google Cloud Messaging (GCM/CM)

Describe the bug Cloud messaging gets disconnected and never connects back automatically. The issue various WiFi nets, at home, hotels, etc. regardless if it goes out to internet directly or via a router piping traffic via a VPN provider. There's no consistency - it is something on the phone side.

It does not happen on data (tested in roaming) (default Automatic settings used). It is not consistent, it never works well on wifi (same on which it might be failing consistently. There are situations that it works fine for quite some time on the same WiFi at home going out via VPN or directly (no app installed on the phone at all, just routing is set up that way). Checked connections table on the router - and connections seem to be present there. Whilst at first glance it might look like "oh, this is due to VPN", experience shows that it is not the case. Just mentioning it if it could provide additional light on situation. Just right now, after fresh reboot, with battery below 15%, it just works fine for quite some time on this wifi going out via VPN. Earlier it was failing throughout the couple of days on both wifi, directly going out to internet same as via VPN.

The only way around is to set the ping to 60 seconds. Any other setting, including ping every 2 mins, gets it disconnected after some time (likely after that 60+ seconds. To re-connect, toggle needs to be pushed to disable CM and re-enable it again.

Checked connections at router level and seems like these are kept alive.

Feeling is that it has something to do with Samsung new approach to kill apps, even if all Self-check tick-boxes within MicroG are set/enabled. Validated also in the phone/app settings for MicroG, that it is "Unrestricted".

Trying to set MicroG to "never put to sleep", identified that it is not listed on the Apps list, hence there's no way to add it to the list - this could be quick and easy workaround to the problem.

Reading at https://dontkillmyapp.com/samsung, seems like they kill apps, even if apps are set to be unrestricted.

From other experience, I've noticed that when phone is on the go, reconnecting between BTSes, apps are "reconnected"/kept alive and connection is more often kept alive for longer, that could be the reason why it stayed alive when on mobile data (even in roaming).

To Reproduce Steps to reproduce the behavior:

  1. Just let the phone running and later go to MicroG -> Settings -> Cloud Messaging - check status.

Expected behavior Cloud messaging to re-connect when loses connectivity automatically.

System Android Version: 13 Samsung ROM Kernel 5.15.78 One UI 5.1.1

microG microG Core version: 0.3.2.240913 microG Self-Check results: All ticked

Similar issues (if not the same root cause): https://github.com/microg/GmsCore/issues/1011 https://github.com/microg/GmsCore/issues/998 https://github.com/microg/GmsCore/issues/1810 https://github.com/microg/GmsCore/issues/1755

bugsyb commented 3 months ago

Did some further troubleshooting and seeing this:

06-22 18:46:39.763 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:46:40.768 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:46:41.768 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:46:42.772 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:46:43.772 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:46:44.776 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:46:45.781 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:46:46.782 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:46:47.779 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:46:48.786 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:46:49.791 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:46:50.788 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:46:51.434 20960 20960 D GmsGcmMcsSvc: onDestroy
06-22 18:46:51.434 20791 20791 D GmsGcmMcsSvc: Closing all sockets...
06-22 18:46:51.434 20960 20960 D GmsGcmMcsSvc: Closing all sockets...
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc: Exception while connecting to mtalk.google.com:5228
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc: java.net.SocketException: Socket closed
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at java.net.SocketInputStream.socketRead0(Native Method)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at java.net.SocketInputStream.socketRead(SocketInputStream.java:118)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at java.net.SocketInputStream.read(SocketInputStream.java:173)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at java.net.SocketInputStream.read(SocketInputStream.java:143)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.readFromSocket(ConscryptEngineSocket.java:945)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.processDataFromSocket(ConscryptEngineSocket.java:909)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.-$$Nest$mprocessDataFromSocket(Unknown Source:0)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket.doHandshake(ConscryptEngineSocket.java:241)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket.startHandshake(ConscryptEngineSocket.java:220)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket.waitForHandshake(ConscryptEngineSocket.java:572)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket.getInputStream(ConscryptEngineSocket.java:296)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at org.microg.gms.gcm.McsService.connect(McsService.java:442)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at org.microg.gms.gcm.McsService.connect(McsService.java:473)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at org.microg.gms.gcm.McsService.handleMessage(McsService.java:694)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at android.os.Handler.dispatchMessage(Handler.java:102)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at android.os.Looper.loopOnce(Looper.java:226)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at android.os.Looper.loop(Looper.java:313)
06-22 18:46:51.442 20960 21311 W GmsGcmMcsSvc:  at org.microg.gms.gcm.McsService$HandlerThread.run(McsService.java:163)
06-22 18:46:51.442 20960 21311 D GmsGcmMcsSvc: Closing all sockets...
06-22 18:46:51.445 20960 21311 D GmsGcmMcsSvc: Starting MCS connection to port 443...
06-22 18:46:51.549 20960 21311 D GmsGcmMcsSvc: Connected to mtalk.google.com:443
06-22 18:46:51.556 20960 21311 D GmsGcmMcsSvc: Activated SSL with mtalk.google.com:443
06-22 18:46:51.716 20960  8775 D GmsGcmMcsSvc: Sending login request...
06-22 18:46:51.720 20960 21311 D GmsGcmMcsSvc: Scheduling heartbeat in 900 seconds...
06-22 18:46:51.726 20960  8775 D GmsGcmMcsOutput: Outgoing message: LoginRequest{id=android-33, domain=mcs.android.com, user=XXX, resource=XXX, auth_token=XXX, device_id=android-XXX, setting=[Setting{name=new_vc, value_=1}], adaptive_heartbeat=false, use_rmq2=true, auth_service=ANDROID_ID, network_type=1}
06-22 18:46:51.726 20960  8775 D GmsGcmMcsOutput: Write MCS version code: 41
06-22 18:46:51.869 20960  8774 D GmsGcmMcsInput: Reading from MCS version: 41
06-22 18:46:51.874 20960  8774 D GmsGcmMcsInput: Incoming message: LoginResponse{id=android-33, jid=user@firebase.com/notifications, last_stream_id_received=1, server_timestamp=1719074812219}
06-22 18:46:51.883 20960  8774 D GmsGcmMcsSvc: Logged in
06-22 18:46:51.884 20960  8774 D GmsGcmMcsInput: Incoming message: IqStanza{type=SET, id=, extension=Extension{id=12, data_=[size=0]}}
06-22 18:46:51.884 20960  8774 W GmsGcmMcsSvc: Unknown message: IqStanza{type=SET, id=, extension=Extension{id=12, data_=[size=0]}}
06-22 18:46:53.482 20960 20960 D GmsGcmTrigger: Not connected to GCM but should be, asking the service to start up. Triggered by: Intent { act=org.microg.gms.gcm.FORCE_TRY_RECONNECT flg=0x10 cmp=com.google.android.gms/org.microg.gms.gcm.TriggerReceiver }
06-22 18:46:53.501 20960 21311 D GmsGcmMcsSvc: Connect initiated, reason: Intent { act=org.microg.gms.gcm.FORCE_TRY_RECONNECT flg=0x10 cmp=com.google.android.gms/org.microg.gms.gcm.TriggerReceiver }
06-22 18:47:51.963 20960  8774 W GmsGcmMcsInput: mcsTag: -1 mcsSize: -1
06-22 18:47:51.984 20960  8774 D GmsGcmMcsSvc: Teardown initiated, reason: null message
06-22 18:47:51.985 20960  8774 D GmsGcmMcsSvc: Closing all sockets...
06-22 18:47:51.991 20960  8774 D GmsGcmMcsSvc: Scheduling reconnect in 5 seconds...
06-22 18:47:57.023 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:47:57.023 20960 20960 D GmsGcmTrigger: Not connected to GCM but should be, asking the service to start up. Triggered by: Intent { act=org.microg.gms.gcm.mcs.RECONNECT flg=0x14 cmp=com.google.android.gms/org.microg.gms.gcm.TriggerReceiver (has extras) }
06-22 18:47:57.032 20960 21311 D GmsGcmMcsSvc: Connect initiated, reason: Intent { act=org.microg.gms.gcm.mcs.RECONNECT flg=0x14 cmp=com.google.android.gms/org.microg.gms.gcm.TriggerReceiver (has extras) }
06-22 18:47:57.032 20960 21311 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:47:57.032 20960 21311 D GmsGcmMcsSvc: Closing all sockets...
06-22 18:47:57.045 20960 21311 D GmsGcmMcsSvc: Starting MCS connection to port 5228...
06-22 18:47:57.187 20960 21311 D GmsGcmMcsSvc: Connected to mtalk.google.com:5228
06-22 18:47:57.193 20960 21311 D GmsGcmMcsSvc: Activated SSL with mtalk.google.com:5228
06-22 18:48:31.467 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:48:32.451 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:48:33.467 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 18:48:34.461 20960 20960 D GmsGcmMcsSvc: Connection is not enabled or dead.

Additionally at packet level it all looks like as if it would be the mtalk.google.com closing connection:

Number of tries later and apparently, the status display is nonconsecutive, meaning, every second "enable" it shows status disconnected/connected and timer and at network packet level it doesn't match.

i.e. whilst it was showing "Disconnected" and packets were seen when hit toggle to disable CM following popped up in logcat:

06-22 19:36:03.695 13138 13138 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 19:36:04.678 13138 13138 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 19:36:05.412 13090 13090 D GmsGcmMcsSvc: Closing all sockets...
06-22 19:36:05.412 13138 13138 D GmsGcmMcsSvc: onDestroy
06-22 19:36:05.413 13138 13138 D GmsGcmMcsSvc: Closing all sockets...
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc: Exception while connecting to mtalk.google.com:5228
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc: java.net.SocketException: Socket closed
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at java.net.SocketInputStream.socketRead0(Native Method)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at java.net.SocketInputStream.socketRead(SocketInputStream.java:118)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at java.net.SocketInputStream.read(SocketInputStream.java:173)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at java.net.SocketInputStream.read(SocketInputStream.java:143)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.readFromSocket(ConscryptEngineSocket.java:945)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.processDataFromSocket(ConscryptEngineSocket.java:909)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.-$$Nest$mprocessDataFromSocket(Unknown Source:0)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket.doHandshake(ConscryptEngineSocket.java:241)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket.startHandshake(ConscryptEngineSocket.java:220)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket.waitForHandshake(ConscryptEngineSocket.java:572)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at com.android.org.conscrypt.ConscryptEngineSocket.getInputStream(ConscryptEngineSocket.java:296)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at org.microg.gms.gcm.McsService.connect(McsService.java:442)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at org.microg.gms.gcm.McsService.connect(McsService.java:473)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at org.microg.gms.gcm.McsService.handleMessage(McsService.java:694)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at android.os.Handler.dispatchMessage(Handler.java:102)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at android.os.Looper.loopOnce(Looper.java:226)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at android.os.Looper.loop(Looper.java:313)
06-22 19:36:05.419 13138 13606 W GmsGcmMcsSvc:  at org.microg.gms.gcm.McsService$HandlerThread.run(McsService.java:163)
06-22 19:36:05.419 13138 13606 D GmsGcmMcsSvc: Closing all sockets...
06-22 19:36:05.422 13138 13606 D GmsGcmMcsSvc: Starting MCS connection to port 443...
06-22 19:36:05.500 13138 13606 D GmsGcmMcsSvc: Connected to mtalk.google.com:443
06-22 19:36:05.502 13138 13606 D GmsGcmMcsSvc: Activated SSL with mtalk.google.com:443
06-22 19:36:05.680 13138 13138 D GmsGcmMcsSvc: Connection is not enabled or dead.
06-22 19:36:05.697 13138 14314 D GmsGcmMcsSvc: Sending login request...
06-22 19:36:05.698 13138 13606 D GmsGcmMcsSvc: Scheduling heartbeat in 900 seconds...
06-22 19:36:05.703 13138 13606 D GmsGcmMcsSvc: Connect initiated, reason: Intent { act=android.net.conn.CONNECTIVITY_CHANGE flg=0x4200010 (has extras) }
06-22 19:36:05.741 13138 14314 D GmsGcmMcsOutput: Outgoing message: LoginRequest{id=android-33, domain=mcs.android.com, user=XXX, resource=XXX, auth_token=XXX, device_id=android-XXX, setting=[Setting{name=new_vc, value_=1}], adaptive_heartbeat=false, use_rmq2=true, auth_service=ANDROID_ID, network_type=1}
06-22 19:36:05.741 13138 14314 D GmsGcmMcsOutput: Write MCS version code: 41
06-22 19:36:05.858 13138 14313 D GmsGcmMcsInput: Reading from MCS version: 41
06-22 19:36:05.955 13138 14313 D GmsGcmMcsInput: Incoming message: LoginResponse{id=android-33, jid=user@firebase.com/notifications, last_stream_id_received=1, server_timestamp=1719077766304}
06-22 19:36:05.968 13138 14313 D GmsGcmMcsSvc: Logged in
06-22 19:36:05.971 13138 14313 D GmsGcmMcsInput: Incoming message: IqStanza{type=SET, id=, extension=Extension{id=12, data_=[size=0]}}
06-22 19:36:05.971 13138 14313 W GmsGcmMcsSvc: Unknown message: IqStanza{type=SET, id=, extension=Extension{id=12, data_=[size=0]}}

Same network and same microg version as well as same Android version, but this time not Samsung, but crDroid and it seems to work fine from functionality perspecti. There's something odd going on with the status and otherwise it works.

This leads to conclusion that: a) there's something odd with status for anything above 60sec of ping anyway, b) on ROMs like Samsung One UI, there are troubles due to the way roms treat apps and give microG doesn't show on the list of Apps, there's no way to whitelist it to keep it always running.

Could these two please be fixed?

Sapiosenses commented 2 months ago

In addition to the aggressive power management thing, how many WiFi networks have you tested this on?

Because your hardware there may just have an extremely short timeout on open sockets, which is why the 60 second ping time solves it.

Could also be some timeout imposed by an on-device firewall or VPN client etc.

Back in the olden days of networking that was acceptable because very little traffic relied on "long time open" sockets.

But it's really common these days for things like IMAP-PUSH and other quasi-push notification protocols, various forms of NAT traversal, etc etc.

dimka40185 commented 2 weeks ago

Huawei has the same problem

Vavun commented 2 weeks ago

Not only huawei. Sometimes it happens on my xiaomi with crdroid

bugsyb commented 2 weeks ago

In addition to the aggressive power management thing, how many WiFi networks have you tested this on?

Multiple, probably in range of 20-30. With that said, I've noticed that the problem sort of goes and comes back, even on home network.

Running additional tests, noticed that if running traffic from router level via NordVPN (no client/whatsoever on Android, all taken care at router level), then problem shows almost instantly. Left it routed via that NordVPN and surprisingly, sometimes notifications do come, i.e. Duo (MFA) and some other time, I have to run the app to get notification.

At the same time, other sessions, without keep-alives do run smoothly via the very same NordVPN. We probably would need to get some debug enabled build to look more into details why/where it fails. I'd be even happy to set up a server which could mimic google/whatever - most likely it just needs to be a tcp server to keep the session open and see at packet level from which side is the session terminated. With that said, I might even go ahead and just test the session termination/keepalive or not with standard Linux tools. Would that be a sufficient test to see session termination for the cloud messaging?