status-im / status-go

The Status module that consumes go-ethereum
https://status.im
Mozilla Public License 2.0
728 stars 247 forks source link

Relay disconnection/connection speed #5303

Open fryorcraken opened 5 months ago

fryorcraken commented 5 months ago

Problem

Doing some basic tests to understand Waku Relay disconnect/connection behaviour.

  1. Start the app
  2. Check number of peers in node management: 8
  3. Drop a message in private group
    DBG 2024-06-06 16:05:41.148+10:00 NewBE_callPrivateRPC                       topics="rpc" tid=80348 file=core.nim:27 rpc_method=wakuext_sendChatMessage
  4. Kill internet, start stopwatch
  5. Wait for # of peers to go to zero-> takes 35s

Around this time:

INF 2024-06-06 16:06:42.985+10:00 history request failed                     topics="mailservers-service" tid=80348 file=service.nim:176 requestId=83ec1ef6189e35a3b6704d793fa6d087067c3d552eb19e8f345caeee2abb146f peerId=16Uiu2HAm9CQhsuwPR54q27kNj9iaQVfyRzTGKrhFmr94oD8ujU6P errorMessage="stream reset"
  1. Resume connectivity

  2. new peer in 7s, 5 peers in 23 s, looks fine.

  3. Send message, looks fine.

  4. Then tried again, 27s to 0 peers (16:09)

  5. spammed retry button (ignore that, I forgot to resume connectiviity

    DBG 2024-06-06 16:10:20.012+10:00 NewBE_callPrivateRPC                       topics="rpc" tid=80348 file=core.nim:27 rpc_method=wakuext_reSendChatMessage
  6. Resume connectiivty

  7. 1min to get 1 peer around 16:13:35.

  8. Send message, looks sent fine.

  9. Disconnected again and sent messages as soon as disconnected, before peers drop to 0:

DBG 2024-06-06 16:18:46.910+10:00 NewBE_callPrivateRPC                       topics="rpc" tid=80348 file=core.nim:27 rpc_method=wakuext_sendChatMessage

Message not marked as failed at first. But then marked as failed when peer number reaches 0.

I also noticed that a message sent just before the disconnection was marked as "not sent".

  1. Finally, by 16:27, I still have a infinite wheel of death on top of the chat room, is there a pending store query that never gets resolved?

Note, all messages were sent in chat group (ie, with MVDS).

Learnings

I did few more tests, it takes 20 to 40 s to go to 0 peers from internet loss.

Interesting that a message I sent before disconnection was not sent. What is the delay between clicking enter and message over the wire?

Note that no banner ever opens when we know relay is disconnected (0 peers), but we have banners for external services.

Questions

  1. What is the delay between clicking "enter" and message sent over the wire? seems to be more than 1 sec (I haven't tested thoroughly)?
  2. If we know that we are connected to 0 peers (from node management tab), why aren't we showing a banner?
  3. From a user PoV, what kind of timeframe should we be aiming for between the moment they are disconnected to the momen they get the feedback on desktop? sub second may not be realistic. is under 10s good enough? under 5s?
  4. wheel of death feedback: What is going on? is it a store query sent when we did not know we were disconnected? Would be good to fix/investigate.
  5. 1 minute to get first peer seems quite slow, can this be improved?

app_20240606_160439.log geth.log

fryorcraken commented 5 months ago

wheel of death I am referring to:

image

fryorcraken commented 5 months ago
  1. If we know that we are connected to 0 peers (from node management tab), why aren't we showing a banner?

Actually I see a banner now. Not sure if it's me or random. Will test more.

fryorcraken commented 5 months ago

Redid a test around 17:12 and the reconnection was almost instantaneous.

logs:

app_20240606_170515.log geth.log

chaitanyaprem commented 4 months ago

@fryorcraken Do you still see this issue? If not, we can close this.

fryorcraken commented 1 month ago

I reported quite a few things in there. Will check again.