richard-ramos opened 1 year ago
Just to add, from past experience, this is something that caused some headaches: not knowing whether a message has been published (especially on mobile networks) can result in message loss and a perception of unreliability. Desktop might be different, since the connection is more stable, but it's worth thinking about.
Sending an ACK for each message is going to be a bit heavy, as you'll at least have to send the message id, which is 32 bytes (not heavy per se, but that is for each message). Also, you may lose the ACK but not the original message, and the original sender will think it wasn't published, even though it was, and will probably send a duplicate on the network later.
Maybe instead:

- Keep a list of the last N sent messages
- When you get back online, query the store for these messages (need some form of unique ID)
- If some are missing, re-send

Or something like that?
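A rough sketch of this "keep the last N, query the store on reconnect, re-send what's missing" idea (the `Outbox` class and `store_has` callback are illustrative names, not an actual Waku API):

```python
# Sketch: track the last N sent messages, then reconcile against a store
# node after reconnecting. `store_has` stands in for a real store query.
from collections import OrderedDict

N = 100  # how many recent sent messages to track

class Outbox:
    def __init__(self):
        self.sent = OrderedDict()  # message id -> payload, oldest first

    def record(self, msg_id: bytes, payload: bytes) -> None:
        self.sent[msg_id] = payload
        while len(self.sent) > N:
            self.sent.popitem(last=False)  # drop the oldest entry

    def reconcile(self, store_has) -> list:
        """store_has: callable(msg_id) -> bool, e.g. a Waku Store lookup.
        Returns the payloads that need re-sending."""
        return [p for mid, p in self.sent.items() if not store_has(mid)]
```

The unique ID requirement matters here: the store lookup only works if the ID is stable across retransmissions.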
> Also, you may lose the ACK but not the original message, and the original sender will think it wasn't published, even though it was, and will probably send a duplicate on the network later.

Generally, from the app perspective, this is preferred: we'd rather re-transmit than drop.
Quick question though: if I re-transmit, the id will be the same. Would the message be propagated in this case, or is there a cache kept by the relayer to avoid relaying duplicate messages?
For Waku v1, that was a setting between two nodes, so a node could notify the other whether it wanted confirmations; it wasn't enabled across the whole network. That leaks metadata, but saves some bandwidth.
We could also lower the 32-byte requirement and send back, say, only 4 bytes. That should be good enough since clashes are unlikely, but the toll on the network is of course still higher.
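The truncated-ACK idea above could look something like this (a minimal sketch, assuming SHA-256 message ids; the function names and the pending-map scheme are made up for illustration):

```python
# Sketch: ACK with a truncated message id instead of the full 32-byte hash,
# trading bandwidth for a small collision risk.
import hashlib

ACK_ID_LEN = 4  # bytes; a 2^32 id space makes clashes unlikely in practice

def message_id(payload: bytes) -> bytes:
    """Full 32-byte message id (SHA-256 of the payload)."""
    return hashlib.sha256(payload).digest()

def ack_id(payload: bytes) -> bytes:
    """Truncated id carried in the ACK."""
    return message_id(payload)[:ACK_ID_LEN]

# The sender keeps a map from truncated id -> full id of unacked messages.
pending = {}

def on_send(payload: bytes) -> None:
    pending[ack_id(payload)] = message_id(payload)

def on_ack(short_id: bytes) -> None:
    # Note: a clash here would wrongly mark another pending message as acked.
    pending.pop(short_id, None)
```

The trade-off is visible in `on_ack`: two pending messages whose hashes share the first 4 bytes would be confused, which is acceptable only because the pending window is short.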
> - Keep a list of the last N sent messages
> - When you get back online, query the store for these messages (need some form of unique ID)
> - If some are missing, re-send
I think this would be too slow from a UX perspective. I am going to consider mobile for now (although I understand we'll use filter etc.), just to give a baseline; desktop might be different, as connections are more stable.
User A sends a message to B. Say the message is important, and the connection is flaky.
We want to make sure the user sees the message as not dispatched, so they know some action is required on their side (resend, check the internet connection, etc.) and don't just put the phone back in their pocket.
If we only pulled the next time we went online, we would have to do one of the following:

- show the messages as "not dispatched" until an online/offline event happens, which would be frustrating;
- query the store node right after sending, which would increase the load on store nodes;
- mark the message as "dispatched" and revert it to "not dispatched" once we are online again. But in that case, the user would not understand that the message wasn't dispatched, and would put the phone back in their pocket while the message was never sent.
Sorry for the blurb, not sure it's clear :)
> Quick question though, if I re-transmit, the id will be the same, would the message be propagated in this case?
My understanding is that you would need a new timestamp, which would mean the messages are not deduplicated since they are no longer equal (hence the fact that retransmission can cause issues).
Good points though, will think more about it
> My understanding is that you would need a new timestamp, which would mean the messages are not deduplicated since they are no longer equal (hence the fact that retransmission can cause issues)
Note that the Message Unique ID scheme I proposed does not consider the timestamp field (it is not hashed), so it supports retransmissions without changing the ID.
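To illustrate why excluding the timestamp matters (this is a simplified sketch, not the actual Message Unique ID specification; the exact fields hashed are an assumption):

```python
# Sketch: a timestamp-independent message id. Only content-identifying
# fields (here: payload and content topic) go into the hash, so a
# retransmission with a fresh timestamp still dedups to the same id.
import hashlib

def unique_id(payload: bytes, content_topic: str) -> bytes:
    h = hashlib.sha256()
    h.update(payload)
    h.update(content_topic.encode())
    # timestamp deliberately NOT hashed: re-sending the same content
    # at a later time produces the same id
    return h.digest()
```

With this scheme, a retransmitted message carries a new timestamp but the same id, so relays and stores can still deduplicate it.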
I think libp2p Gossipsub is implemented with the "ideal" scenario in mind: cloud nodes with significantly high availability and bandwidth. If we aim to increase the resiliency and robustness of Waku Relay, we need to spend some time understanding the scenarios and their requirements (e.g., a laptop connected to a wifi AP can be considered a mobile device).
Here I see two things:
Failure recovery, robustness, and reliability often require extra bandwidth consumption; that is the toll to pay for reliability. But different optimization strategies could be implemented (e.g., Gossipsub already uses RPC message piggybacking, NACKs, etc.).
P.S.: Adding an ACK is just a naive suggestion @richard-ramos and I discussed. Other strategies could be studied too.
Regarding this:
> I am going to consider mobile for now, although I understand that we'll use filter/etc, but just to give a baseline, desktop might be different as more stable connections.
Note that there is a significant risk in using Waku Filter (a "server-push" protocol) in the mobile implementation for the same reasons we are discussing here (and many more). I already raised this concern in other forums.
> When you get back online, query the store for these messages (need some form of unique ID)
The next evolution of the Waku Store and Waku Archive (based on the Message Unique ID) aims to provide a "client-pull" polling alternative solution to the Waku Filter protocol.
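A client-pull polling loop of the kind described could be sketched as follows (the `store_query` callback and cursor shape are hypothetical stand-ins, not the actual Waku Store API):

```python
# Sketch: poll a store node periodically for messages newer than a cursor,
# instead of relying on a server-push (Filter) subscription.
POLL_INTERVAL = 10.0  # seconds; a trade-off between latency and store load

def poll_loop(store_query, handle, cursor=None, rounds=1):
    """store_query(cursor) -> (messages, new_cursor); handle(msg) consumes
    each new message. `rounds` bounds the loop for demonstration purposes;
    a real client would loop forever, sleeping POLL_INTERVAL between rounds."""
    for _ in range(rounds):
        messages, cursor = store_query(cursor)
        for m in messages:
            handle(m)
        # time.sleep(POLL_INTERVAL)  # in a real long-running loop
    return cursor
```

The cursor makes each poll incremental, which is what keeps the load on store nodes bounded compared to re-querying full history.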
I'd be interested in knowing a bit more about the concerns. Do you have some links, etc.? Otherwise maybe we can have a chat one day, if you don't mind. Thanks!
Adding an ACK to libp2p-gossipsub seems very heavy indeed. As mentioned above, one could "request an ACK", but that clearly reveals they are the original sender of the message and removes any privacy preservation Waku Relay may have.
What about just expecting to "see" the message within a given time frame?
For example, let's say we have a mesh with 8 peers. When sending a message, only send it to 6 peers from the mesh.
Then, expect that within 5 seconds, one of the two other peers should forward you the message.
If that does not happen, it would be safe to assume there was a transmission error, and re-transmit.
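A toy model of this "expect to see your own message back" heuristic, using the numbers from the example above (the `EchoTracker` class and its API are invented for illustration; a real implementation would hook into the gossipsub mesh):

```python
# Sketch: publish to only part of the mesh, and treat an echo from one of
# the held-out peers (within a deadline) as an implicit ACK.
import random
import time

MESH_SIZE = 8        # peers in the mesh (from the example above)
SEND_TO = 6          # publish to only 6 of them
ECHO_DEADLINE = 5.0  # seconds to wait for an echo from a held-out peer

class EchoTracker:
    def __init__(self, mesh_peers):
        self.mesh = list(mesh_peers)

    def publish(self, msg_id, send_fn, now=None):
        """Send to SEND_TO random mesh peers; track the held-out ones."""
        now = time.monotonic() if now is None else now
        targets = random.sample(self.mesh, SEND_TO)
        for peer in targets:
            send_fn(peer, msg_id)
        return {"id": msg_id,
                "holdout": [p for p in self.mesh if p not in targets],
                "deadline": now + ECHO_DEADLINE,
                "echoed": False}

    def on_receive(self, state, peer, msg_id):
        """A held-out peer forwarding the message back is an implicit ACK."""
        if msg_id == state["id"] and peer in state["holdout"]:
            state["echoed"] = True

    def should_retransmit(self, state, now=None):
        now = time.monotonic() if now is None else now
        return not state["echoed"] and now >= state["deadline"]
```

One design question this surfaces: deliberately sending to fewer peers slightly reduces initial propagation, which is the price paid for the implicit ACK signal.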
Notes from the 21 Feb call: could IHAVE serve as a logical ACK?
Messages that are broadcast via gossipsub do not receive an ACK confirming when they are received by peers. This is problematic in the following situation:
In go-waku (and I imagine nwaku too), the message appears to be sent successfully. Pubsub has separate inbound and outbound streams for RPC messages and does not acknowledge that it has received a message, meaning that until a TCP write timeout happens, we think messages were sent successfully (they are probably being buffered at the OS level in the meantime).
In https://github.com/libp2p/specs/tree/master/pubsub#the-rpc we can see that communication happens by passing RPC messages, but it does not mention anything about ACKs for these messages (which could be as simple as passing a single byte back to indicate that the message was received).
The write timeout can take some minutes to trigger, so we instead rely on the keep-alive loop, which pings the peers and uses a ping failure to determine that a peer is disconnected. But this can take ~40 s, so there is a time period during which messages will be incorrectly marked as sent.
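The keep-alive fallback described above could be modeled roughly like this (intervals and the `PeerHealth` bookkeeping are illustrative assumptions, not the go-waku implementation):

```python
# Sketch: ping peers periodically; when a ping fails, everything "sent"
# to that peer since its last successful ping becomes suspect and should
# be re-marked as unconfirmed.
PING_INTERVAL = 30.0  # seconds; worst-case detection ~ interval + ping timeout

class PeerHealth:
    def __init__(self):
        self.last_ok = {}   # peer -> time of last successful ping
        self.inflight = {}  # peer -> ids sent since the last successful ping

    def record_send(self, peer, msg_id):
        self.inflight.setdefault(peer, []).append(msg_id)

    def on_ping(self, peer, ok, now):
        """Returns the message ids that must be re-marked as unconfirmed."""
        if ok:
            self.last_ok[peer] = now
            self.inflight[peer] = []  # the connection was alive up to now
            return []
        # Ping failed: nothing sent since the last ok ping can be trusted.
        return self.inflight.pop(peer, [])
```

This makes explicit why the ~40 s window exists: any send that lands between two pings has no delivery signal until the next ping succeeds or fails.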
This issue was reported for mobile (which ideally should run the filter and lightpush protocols instead), but network connectivity issues can happen on desktop too, although less frequently.
Related issue: https://github.com/status-im/status-mobile/issues/14797
cc: @LNSD @kaiserd @Menduist @jm-clius @cammellos