Open Horupa-Olena opened 1 month ago
I've looking into this issue and I thought I should share some notes on what I've been seeing.
Fail to subscribe
failed to resubscribe for filter
subscribing again
handleFilterSubscribeOptions
handleFilterSubscribeOptions exit
peer selection
dialing peer
Failed to subscribe
updating after some analysis:
failed to subscribe
and trying to resubscribe is expected because desktop peers also support filter and mobile may be trying to subscribe to a desktop peer which maybe offline and hence you can see dial errors. The idea is in case of failure, it should recover as subscriptions are retried. 404
for filter ping. need to analyze why that could be happening. this could lead to delay in messages being received. Wondering if app going into background is causing peers to be disconnected and pings to not happen? any idea of the behaviour cc @seanstrom i see drop in peer-counts at
10-10|16:22:56.820
,10-10|16:24:51.155
,10-10|16:28:30.978
,10-10|16:42:45.776
10-11|08:34:21.105
times. it would help if you can give some timeline of events so that i can understand logs better.
ERROR[10-10|15:40:18.671|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:425] Failed to subscribe pubSubTopic=/waku/2/rs/16/32 contentTopics=[] error="at least one contenttopic should be specified"
~ this is not a concern as i see this resolved automatically.it would help to know which messages were missed, if you have any messageID, contentTopic , time and date to look in more details what exactly is happening.
Hey @chaitanyaprem thanks for analysis, great stuff 🙌
I've attempted to reproduce this glitch again, but this time I reproduced the .glitch using a combination of an iOS simulator for sending messages to another iOS device. This was done because it was easier for me to access the logs of the simulator and I was having some issues running the dev build of the desktop app. If it's important to reproduce this glitch lmk and I can try again.
That being said, it did take me a while to finally capture the same glitch, but I was able to get a copy of the logs from the sending device and the receiver device so we can compare them.
More details here: https://github.com/status-im/status-mobile/issues/21172#issuecomment-2420816752
To respond to your question about the disconnecting peers when backgrounding the app:
i have noticed a sudden drop in filter subscriptions because the peers are returning 404 for filter ping. need to analyze why that could be happening. this could lead to delay in messages being received. Wondering if app going into background is causing peers to be disconnected and pings to not happen? any idea of the behaviour
Supposedly when we background the application there could be some behaviour where the operating system will "idle" the application if it's waiting in the background. This could have an affect on network activity or network resources, but I cannot say for certain. Perhaps @ilmotta or @igor-sirotin may have some more context about this behaviour.
Though what I've been able to notice is that this issue seems to mainly occur when the app returns to the foreground (the app was reopened), and then the network reconnects. So potentially on a low or missing data connection (mobile lte) the app may go through a connection state change. When that happens perhaps there's some logic that handling resources based on whether the connection is offline or online, but I'm not sure so I'd need to do some more digging through the code to find logic related to connection state.
I do know that there is logic for checking the connection state when that is going from the background to the foreground, but I'm not sure if there's any logic for re-examining the connection state when the app is already running. This seems to be an important detail since I'm mainly able to reproduce this issue when reconnecting to the network after opening the app.
To respond to your question about missing messages:
Even if filter fails, there should be periodic store queries happening every 1 minute to check for missing messages and query from store node. is this disabled when user is on 4G? cc @seanstrom Because right now we depend on store for reliability and if this is disabled, might be causing issues.
I do notice from the logs that we are attempting to retrieve missing messages, and it seems we've configured status-go to retrieve these missing message, but I haven't found any conditions for not doing retrieving the messages based on the connection state atm. I'll need to look deeper into the codebase to verify what's happening, but I think we're setup to to retrieve missing messages.
Here's some updated logs and context from reproducing this glitch:
From the logs of the iOS Simulator (the device sending a message)
INFO [10-17|21:10:54.912|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/peer_exchange/client.go:156] connecting to newly discovered peers count=20
INFO [10-17|21:10:56.232|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/publish/message_sender.go:115] publishing message via lightpush envelopeHash=0xd91e8399546bfecf15c1ff688d5a0856fad7c6bef920eb4a4fda4af33c6d8595 pubsubTopic=/waku/2/rs/16/32 contentTopic=/waku/1/0xe88187f2/rfc26 timestamp=1,729,199,456,232,070,000
INFO [10-17|21:10:56.525|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:214] received message push peerID=16Uiu2HAm8mUZ18tBWPXDQsaF7PbCKYA35z7WB2xNZH2EVq1qS8LJ hash=0xd91e8399546bfecf15c1ff688d5a0856fad7c6bef920eb4a4fda4af33c6d8595 pubsubTopic=/waku/2/rs/16/32 contentTopic=/waku/1/0xe88187f2/rfc26 timestamp=1,729,199,456,232,070,000
INFO [10-17|21:10:56.812|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:214] received message push peerID=16Uiu2HAmAR24Mbb6VuzoyUiGx42UenDkshENVDj4qnmmbabLvo31 hash=0xd91e8399546bfecf15c1ff688d5a0856fad7c6bef920eb4a4fda4af33c6d8595 pubsubTopic=/waku/2/rs/16/32 contentTopic=/waku/1/0xe88187f2/rfc26 timestamp=1,729,199,456,232,070,000
INFO [10-17|21:10:56.932|github.com/status-im/status-go/protocol/messenger.go:3585] processing message site=RetrieveAll hash=0xd91e8399546bfecf15c1ff688d5a0856fad7c6bef920eb4a4fda4af33c6d8595 message-id=0x7e48aaa26c5b8c3247479417b133ad55b321821de66e81e937642386a0ebc16f type=CHAT_MESSAGE senderID=0x046bf754f6eecd4a6d73557b4795eef4e824f13c63d4343356f6ec1d6597c9ba02f09dc75809f208283f370d60f0a76555dc132fdc17970460a1f93c87d7d01490
WARN [10-17|21:10:56.932|github.com/status-im/status-go/protocol/messenger_maps.go:65] contacts map: loading own identity contactID=0x046bf754f6eecd4a6d73557b4795eef4e824f13c63d4343356f6ec1d6597c9ba02f09dc75809f208283f370d60f0a76555dc132fdc17970460a1f93c87d7d01490
contentTopic=/waku/1/0xe88187f2/rfc26
message-id=0x7e48aaa26c5b8c3247479417b133ad55b321821de66e81e937642386a0ebc16f
timestamp=1,729,199,456,232,070,000
10-17|21:10:56.232
From the logs of the iOS Device (the device receiving a message)
WARN [10-17|21:10:56.699|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:207] received messagepush with invalid subscription parameters peerID=16Uiu2HAmNTpGnyZ8W1BK2sXEmgSCNWiyDKgRU3NBR2DXST2HzxRU hash=0xd91e8399546bfecf15c1ff688d5a0856fad7c6bef920eb4a4fda4af33c6d8595 pubsubTopic=/waku/2/rs/16/32 contentTopic=/waku/1/0xe88187f2/rfc26 timestamp=1,729,199,456,232,070,000
DEBUG[10-17|21:10:57.568|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:185] retrieving history to check for missing messages peerID=16Uiu2HAm9CQhsuwPR54q27kNj9iaQVfyRzTGKrhFmr94oD8ujU6P contentTopics="[/waku/1/0x7e6732ec/rfc26 /waku/1/0x5759bbff/rfc26 /waku/1/0xacfae801/rfc26 /waku/1/0x696bdb86/rfc26 /waku/1/0xa35b9bb3/rfc26 /waku/1/0xb69e0a57/rfc26 /waku/1/0x0b65ce26/rfc26 /waku/1/0xe5eb41b8/rfc26 /waku/1/0xa9a5e604/rfc26]" pubsubTopic=/waku/2/rs/16/32 from=1729192525821337000 to=1729192778950497000 attempt=1
ERROR[10-17|21:10:57.600|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:428] Failed to subscribe pubSubTopic=/waku/2/rs/16/32 contentTopics="[/waku/1/0x6c561c0d/rfc26 /waku/1/0x8396b7a1/rfc26 /waku/1/0xe88187f2/rfc26]" error="failed to dial: failed to dial 16Uiu2HAmBPzatVXrBnv6bJVdUNDXTUJ5y9JRGAbtcHdhzhjm6Ssk: all dials failed\n * [/ip4/152.58.243.206/tcp/37331] dial tcp4 0.0.0.0:49443->152.58.243.206:37331: i/o timeout"
received messagepush with invalid subscription parameters
Logs from the device: ios-device-logs.zip Logs from the simulator: ios-simulator-logs.log.zip
Unfortunately I was only able to extract the geth.log
from the iOS simulator, but hopefully that's enough for now.
Some follow-up notes:
received messagepush with invalid subscription parameters
I do notice that the missing message verification does happen after the reconnection happens and the message is not delivered, but the contentTopic /waku/1/0xe88187f2/rfc26
seems to be missing from the list of topics that would be retrieved.
DEBUG[10-17|21:11:51.613|[github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:185](http://github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:185)] retrieving history to check for missing messages peerID=16Uiu2HAm9CQhsuwPR54q27kNj9iaQVfyRzTGKrhFmr94oD8ujU6P contentTopics="[/waku/1/0x7e6732ec/rfc26 /waku/1/0x5759bbff/rfc26 /waku/1/0xacfae801/rfc26 /waku/1/0x696bdb86/rfc26 /waku/1/0xa35b9bb3/rfc26 /waku/1/0xb69e0a57/rfc26 /waku/1/0x0b65ce26/rfc26 /waku/1/0xe5eb41b8/rfc26 /waku/1/0xa9a5e604/rfc26]" pubsubTopic=/waku/2/rs/16/32 from=1729192525821337000 to=1729192778950497000 attempt=1
DEBUG[10-17|21:11:51.614|[github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/store/client.go:268](http://github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/store/client.go:268)] sending store request peer=16Uiu2HAm9CQhsuwPR54q27kNj9iaQVfyRzTGKrhFmr94oD8ujU6P requestId=37316536333764356134346664326263646135303038373463656566373765376165353464323833313932306163306430383564303462646634316563633136
ERROR[10-17|21:11:51.614|[github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:190](http://github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:190)] could not query storenode peerID=16Uiu2HAm9CQhsuwPR54q27kNj9iaQVfyRzTGKrhFmr94oD8ujU6P contentTopics="[/waku/1/0x7e6732ec/rfc26 /waku/1/0x5759bbff/rfc26 /waku/1/0xacfae801/rfc26 /waku/1/0x696bdb86/rfc26 /waku/1/0xa35b9bb3/rfc26 /waku/1/0xb69e0a57/rfc26 /waku/1/0x0b65ce26/rfc26 /waku/1/0xe5eb41b8/rfc26 /waku/1/0xa9a5e604/rfc26]" pubsubTopic=/waku/2/rs/16/32 from=1729192525821337000 to=1729192778950497000 error="context canceled" attempt=1
For comparison, we can see that the topic is included when the messages are retrieved. Note that these logs are from a different attempt to reproduce the glitch.
DEBUG[10-17|23:55:24.468|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:185] retrieving missing messages peerID=16Uiu2HAmMELCo218hncCtTvC2Dwbej3rbyHQcR8erXNnKGei7WPZ contentTopics="[/waku/1/0xa9a5e604/rfc26 /waku/1/0x5759bbff/rfc26 /waku/1/0x696bdb86/rfc26 /waku/1/0xacfae801/rfc26 /waku/1/0x7e6732ec/rfc26 /waku/1/0xe5eb41b8/rfc26 /waku/1/0x8396b7a1/rfc26 /waku/1/0x6c561c0d/rfc26 /waku/1/0xf473d4e0/rfc26 /waku/1/0xe88187f2/rfc26]" pubsubTopic=/waku/2/rs/16/32 from=1729209264462294000 to=1729209324215529000 attempt=1
DEBUG[10-17|23:55:24.468|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/store/client.go:268] sending store request peer=16Uiu2HAmMELCo218hncCtTvC2Dwbej3rbyHQcR8erXNnKGei7WPZ requestId=30393030326432323130633531306439333939663733326233373833656462323631646336396438376361333064383537323135343331666265383430663834
@seanstrom FYI, could be related: https://github.com/status-im/status-go/issues/5659
Thanks a lot for detailed tests and analysis, it helped me look at specific areas in the logs.
I’m not sure how subscriptions are coordinated between the two devices, but perhaps when the physical device goes offline and reconnects, maybe the subscriptions are cleared in some way which could cause the message to be seen as an undesired message (?)
this could be a reason why invalid subscription params
is noticed. while going through logs i had noticed that app lost network connectivity at 21:10:34
and immediately it got connectivity back at 21:10:50
.
when connectivity is lost client removes all subscriptions and once it is back it will try to subscribe again. but server is not aware client has removed subscription and is sending for a filter it registered before network is lost. this may not happen if app is in background for a longer period of time (i.e time between disconnection and reconnection is more than 5/10 minutes). Note that this is fine and expected to happen.
INFO [10-17|21:10:34.328|github.com/status-im/status-go/api/geth_backend.go:2439] Network state change package=status-go/api.GethStatusBackend old=wifi new=offline .... INFO [10-17|21:10:50.168|github.com/status-im/status-go/api/geth_backend.go:2439] Network state change package=status-go/api.GethStatusBackend old=offline new=wifi
- Here I noticed that we do receive a message, but it has a warning of:
received messagepush with invalid subscription parameters
while going through logs, noticed that this could happen because filter subscriptions are not successful due to peers that are not reachable and taking time to stabilize. will analyze further to see if there is some other issue.
but in the meantime i have made a simple logic to remove bad peers if dial fails twice to them and included it https://github.com/status-im/status-mobile/pull/21458. can you try to simulate the issue with this version.
- Missing messages do seem to be attempted to be retrieved but it doesn’t seem to be happening for the identified contentTopic, but in this case there weren’t any missing messages to be retrieved because we had received all the existing messages and were only missing the latest message
this is odd, wondering what might be the reason for this content-topic not being chosen for missing messages. maybe @richard-ramos has an idea.
@chaitanyaprem is the filter for this content topic is ephemeral? those are ignored for retrieving missing messages: https://github.com/status-im/status-go/blob/e611b1e5131fff706151661d15cc2904c20a7f71/protocol/transport/transport.go#L699C19-L699C24
@chaitanyaprem is the filter for this content topic is ephemeral? those are ignored for retrieving missing messages: https://github.com/status-im/status-go/blob/e611b1e5131fff706151661d15cc2904c20a7f71/protocol/transport/transport.go#L699C19-L699C24
ah, that might make sense...i have no idea though. maybe @seanstrom can help
Follow up: https://github.com/status-im/status-mobile/pull/20730
Problem
Users experience issues with delayed or missing messages in community on both iOS and Android devices when reconnecting from offline to online (4G) .
On iOS: There is a noticeable delay in receiving messages when switching from offline to online (4G). On Android: Messages either do not arrive at all (waited up to 10 minutes) or only a portion of the messages is received immediately. In 6 tests conducted: 4 cases of missing messages and 2 message partly deliveries.
Reproduction
Expected behavior
Messages should be received promptly after the user goes back online, even if the app was in the background.
Actual behavior
iOS: Messages are delayed upon reconnecting to the 4G network after being offline. Android: Either no messages are received or only a subset of messages arrives instantly. The issue was observed in 4 out of 6 tests.
Additional Information
Comments:
Examples of tests on Android (4G enabled):
13:53 - device is offline 13:55 - device returns online 13:55 - 10 messages are sent from the desktop app 14:05 - no messages have arrived in the mobile app
14:06 - device is offline 14:07 - device returns online 14:09 - 10 messages are sent from the desktop app 14:09 - 5 out of 10 messages are received 14:21 - the remaining 5 messages still haven't arrived
Android:
Mobile log: logcat.zip
Desktop log: geth.zip
IOS:
https://github.com/user-attachments/assets/7cb0b6dc-ba79-4908-8f13-29ec09906c0b
Mobile log: logs.zip
Desktop log: geth.log