vacp2p / nim-libp2p

libp2p implementation in Nim
https://vacp2p.github.io/nim-libp2p/docs/
MIT License
240 stars 52 forks source link

IHave / IWant message behavior seriously impacts large message transmissions. #1101

Open ufarooqstatus opened 1 month ago

ufarooqstatus commented 1 month ago

IHave / IWant message behavior seriously impacts large message transmissions.

In a realistic network scenario (with peers having dissimilar bandwidth/latencies), some peers receive messages much earlier than many other peers in the network (The time difference increases with the message size).

The early receivers spread Ihave messages. On getting an ihave all remaining peers immidiately send iwant message.

This behavior has some inherent problems (resulting in significantly higher network-wide message dissemination time/fluctuations)

1) On receiving an ihave, a peer only checks for seen cache, peer score, and ihave budget. So potentially, a peer can issue multiple iwant messages against the same message ID to different peers. 2) Large message reception may consume significant time, so a peer can issue an iwant message even if it is already receiving the same message 3) The peer issuing ihave message will have to service (send message to) many peers in addition to its mesh (as we only check peer score, msgID_Validation, and single transmission to any peer).

The following results compare gossipsub (current arrangement) with and without IWant messages in a 1000-node network. The bandwidth/latency are uniformly distributed between 50-150Mbps and 40-160ms in 4 stages. The results are averaged for 10 message transmissions (with two initial warm-up messages to raise Cwnd at peers. These warm up messages are not included in results)

image

ufarooqstatus commented 1 month ago

Obviously, disabling IWant messages is not the right choice. However, some possible work arounds can be to

1) Send only 1 IWant message against any msg ID 2) For large messages, a peer should send IHave only after it has relayed the message to all (or most of its mesh members) 3) A peer should not issue IWant message, against a message it is already receiving (will require knowledge of in-flight receives)

Moreover, a message as large as 1MB, will consume much higher time at a newly established connection (congestion avoidance mechanisms) compared to a warm one!

kaiserd commented 1 month ago

related: https://github.com/vacp2p/nim-libp2p/issues/868 #976

kaiserd commented 1 month ago

Does this behaviour contradict the spec? How do other impls handle that?

ppopth commented 1 month ago

I think the purpose of IWANT is to make sure that the message is propagated across all the nodes. Without IWANT, some nodes will not receive the message, if the mesh graph is not connected.

In theory, the IHAVE/IWANT messages should also make the propagation time lower because they provide shortcuts for the messages to travel. However, IHAVE messages will be sent only at the heartbeats and the heartbeat interval is 1s (at least in the configuration of go-libp2p-pubsub, I don't know about nim-libp2p). Since the heartbeat interval is too high for your experiment, I'm not surprised that the propagation time will be lower for your "without IWANT" experiment.

I think, in order to have a fairer experiment, you should configure a lower heartbeat interval and do the experiments again (or try to make the propagation time a lot higher than 1s for the "without IWANT" experiment).

ufarooqstatus commented 1 month ago

Does this behaviour contradict the spec? How do other impls handle that?

Only one difference: v1.2 allows cancelling outstanding IWANT request(s) by issuing IDONTWANT

Secondly, its not mentioned if we can issue multiple IWANTs for the same message ID. However, go-libp2p allows 3 IWANTs for the same msgID to the same peer. So generally, it seems that we can send multiple IWANTs for the same message ID

How do other impls handle that?

Mainly similar process, other than the peer behavior penalty (not related to this issue). go-libp2p and rust-libp2p probablistically apply penalities for unanswered IWANTs

ufarooqstatus commented 1 month ago

I think the purpose of IWANT is to make sure that the message is propagated across all the nodes. Without IWANT, some nodes will not receive the message, if the mesh graph is not connected.

Yes, IHAVEs/IWANTs are essential, Its just that in the case of very large messages we need quiet some time for the message to get delivered. So in most of the cases, a peer issues IWANT even if it is already receiving the same message.

One possible solution is to check for received IDONTWANTs for the same msgID (advertised in IHAVE). If such msgID is encountered in IDONTWANTs, we can at-least defer sending of IWANT message.

Second potential problem is the peer sending IHAVE has to deliver the message to its mesh as well. delaying message transmission to mesh members will result in lower score. so such peer should service IWANTs only after it has serviced its mesh members (and it is very likely that the peer issuing IWANT gets the same message from some of its mesh members, so eventually, such IWANT can be cancelled

ufarooqstatus commented 1 month ago

In theory, the IHAVE/IWANT messages should also make the propagation time lower because they provide shortcuts for the messages to travel. However, IHAVE messages will be sent only at the heartbeats and the heartbeat interval is 1s (at least in the configuration of go-libp2p-pubsub, I don't know about nim-libp2p). Since the heartbeat interval is too high for your experiment, I'm not surprised that the propagation time will be lower for your "without IWANT" experiment.

Yes, for a reasonable size message it does lower the message dissemination time (even for a 200KB message size in the above results, IWANTs help achieve lower latency)

I think, in order to have a fairer experiment, you should configure a lower heartbeat interval and do the experiments again (or try to make the propagation time a lot higher than 1s for the "without IWANT" experiment).

I have conducted these experiments with heartbeat interval = 1 sec, gossip factor = 0.25, D =8, D_Low=6, D_High=12. All other settings are mentioned in the results above

image