libp2p / go-libp2p-kad-dht

A Kademlia DHT implementation on go-libp2p
https://github.com/libp2p/specs/tree/master/kad-dht
MIT License
519 stars 223 forks source link

MessageSender limits concurrent RPCs to remote peer by only keeping one single stream per peer #805

Open cortze opened 1 year ago

cortze commented 1 year ago

I've been conducting many concurrent CID Provide() and FindProviders() operations for RFM17 and RFM17.1 and I spotted a significant bottleneck in the current MessageSender's logic that affects the concurrency implementations on top of the go-libp2p-kad-dht.

As the messageSenderImpl suggests, The DHT implementation is limited to a single stream per peer.

// messageSenderImpl is responsible for sending requests and messages to peers efficiently, including reuse of streams.
// It also tracks metrics for sent requests and messages.
type messageSenderImpl struct {
    host      host.Host // the network services we need
    smlk      sync.Mutex
    strmap    map[peer.ID]*peerMessageSender
    protocols []protocol.ID
}

From a conversation I had with @mxinden about "Back Pressure" this logic seems wrong, as we should prioritize opening a new stream over sending multiple RPCs in the same stream. However, this is not even the case in the implementation. The current peerMessageSender only puts one single RPC at a time; therefore, multiple provide/lookup operations remain idle, waiting to get some "stream-time".

// peerMessageSender is responsible for sending requests and messages to a particular peer
type peerMessageSender struct {
    s  network.Stream
    r  msgio.ReadCloser
    lk internal.CtxMutex
    p  peer.ID
    m  *messageSenderImpl

    invalid   bool
    singleMes int
}

There is a significant margin for improvement if we come up with a better strategy to open/handle streams with remote IPFS nodes, so I open the Issue and the discussion to find the best way of improving it.

cc: @Jorropo @yiannisbot

burdiyan commented 7 months ago

My question is a bit unrelated, but it somewhat is :)

Why is it better to prioritize opening new streams vs sending multiple RPCs over the same stream? Is this a general recommendation, or it makes more sense in this particular scenario with the DHT?

Basically I would like to know the why of this:

From a conversation I had with @mxinden about "Back Pressure" this logic seems wrong, as we should prioritize opening a new stream over sending multiple RPCs in the same stream.

The reason I'm asking is because I've been looking for the right answer to this for years.

At Mintter we are using gRPC over libp2p, and we do always reuse the stream (which are wrapped inside the gRPC connection) per peer by caching the gRPC connections, and I always wondered whether this is the right thing to do. We didn't seem to have any issues with this specifically, so it remains this way to this day.

Jorropo commented 7 months ago

go-libp2p does not provide back-pressure on opening streams that means if go-libp2p-kad-dht opens too many streams it will error instead of waiting for in flight queries to complete. While it has backpressures on writes to streams (they block and wait until resources are freed).

The current way the code is setup ensures requests can wait on each other. However it is wastefull because it wont sends requests (which needs a responses) in parallel.

If we send a PUT which does not wait for a response then we take the lock write to the stream (which asynchronously send packets) and release the lock. However when we do a query the lock is taken and held until we read the answer, blocking future puts and queries one round trip at a time.

I attempted to fix that some time ago, it's buggy #962 and unfinished. This still use one stream however it allows to have more than one request in flight.