indra-labs / indra

Distributed Virtual Private Network Powered By Bitcoin Lightning
Creative Commons Zero v1.0 Universal

Improved Performance in go-libp2p-kad-dht: New Feature Reduces PUT/Provide Latencies to <1s #4

Open dennis-tra opened 1 year ago

dennis-tra commented 1 year ago

Hi Indra Devs,

This is Dennis from Protocol Labs. I wanted to inform you about an exciting update regarding the go-libp2p-kad-dht implementation, which I believe you are currently using. As part of our continuous efforts to enhance performance, we have introduced a new experimental feature that significantly improves the PUT/Provide latencies.

Previously, the DHT's PUT performance was known to be rather sluggish, often taking more than 10 seconds or even minutes. However, with the latest release of go-libp2p-kad-dht >v0.23.0, we have introduced a feature called "Optimistic Provide" [Kubo documentation]. This feature has demonstrated great results, bringing the latencies down to less than 1 second in the 50th percentile and less than 1.4 seconds in the 90th percentile.

While I'm unaware of the specific requirements of your use case or whether the previous latencies posed any challenges for you, I wanted to ensure that you are aware of this improvement.

If you have any questions or would like more information about this new feature, please don't hesitate to reach out.

Cheers, Dennis

l0k18 commented 1 year ago

I might be. Indra is using the seemingly unloved Badger-backed peerstore, because I need persistence of peer data: clients hold sessions with the peers, and updates are gossiped using the Peerstore methods inside the libp2p Host.

This is nice news. It means I should add a node load level to the messages, since with sub-second propagation that data is close enough to realtime to prevent gridlock, given a suitable peer selection algorithm for routing. It doesn't really need to be better than that, so long as the distribution is random but avoids peers that have reported high load.

l0k18 commented 1 year ago

For other readers and dev notes: the DHT is used in pkg/engine/transport/discovery.go.

I am enabling this feature, as I understand it to be a gossip or epidemic propagation scheme that leverages the parallelism available in a p2p network. If it seems not to be working, or we encounter bugs that go away when it is disabled, we will report them.

Most probably it will become a configuration option in the relay engine code, but on the face of it I am very interested to see how this works, because low-latency, semi-reliable broadcast really helps nodes learn important information about the state of the network. Shorter update cycles on load data will dramatically reduce the incidence of congestion.
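For reference, enabling the feature is a one-line DHT constructor option. This is a sketch only: `EnableOptimisticProvide` is the option name in go-libp2p-kad-dht v0.23.x and is still marked experimental upstream, so check the release notes before relying on it.

```go
package main

import (
	"context"

	"github.com/libp2p/go-libp2p"
	dht "github.com/libp2p/go-libp2p-kad-dht"
)

// newDHT builds a libp2p host and a DHT with the experimental
// Optimistic Provide feature turned on.
func newDHT(ctx context.Context) (*dht.IpfsDHT, error) {
	h, err := libp2p.New()
	if err != nil {
		return nil, err
	}
	return dht.New(ctx, h,
		dht.EnableOptimisticProvide(), // experimental: sub-second PUT/Provide
	)
}
```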

Indra aims to be as synchronous as possible, and to get there wherever possible by using entropy and the special properties of cryptographic hashes, as Kademlia does. The dispatcher and packet split/join also have a message-ack latency response system that raises the ratio of parity shards to data shards when message delivery time exceeds the expected bandwidth time cost relative to the current ping between the peers. One ping is the time of a successful send of one packet, so if the delay between the last packet out and the ack is 2x the ping or more, it is flagged as a probable TCP retransmit and resilience is raised for the next round, until the last-packet-to-ack delay stays within the 1-2x ping range.

Having the ability to consistently update the utilisation state of relays will be key to achieving low-latency traffic with anonymity. Mixnets are easy to secure against metadata analysis when latency is high, but when it is low the problem gets harder: the fog of war no longer obscures fluctuations in relay load on timescales close to the latency target.

The need for dummy traffic, and the recognition that slowly relayed messages are more secure, will also be weighed when designing features and recommendations for our onion API/SDK, which is now in its first stage of development. Fork and join commands in the onions will also be very useful for mitigating transmission failure, and for cheaply increasing reliability to deliver low latency with weaker or no anonymity guarantees; direct proxying is also a use case we want to encourage, since all traffic enables in-band monetisation and metering.

l0k18 commented 1 year ago

@dennis-tra tagging you because I want you to know, and to report that by enabling that option, my test of the gossip propagation of the peer advertisements immediately started working.

Without that option, the test showed that the peers were only hearing their own messages (todo: fix that); now it looks like, aside from the latecomers missing the early comers' first dispatch, everyone is sharing it with everyone (on loopback, of course).

Even if some wrinkles turn up later, given the feature's age and the testing still being done on it, I hope this becomes the norm, because it is ... wow.

dennis-tra commented 1 year ago

Hey @l0k18, thanks for the feedback! Great news 👍 Do you have documentation on how exactly you implemented peer advertisements? I'm curious to read up on the details. Anyway, ping me again about your experiences if you think there's something to report.