ipfs-inactive / dynamic-data-and-capabilities

[ARCHIVED] Dynamic Data and Capabilities in IPFS Working Group

CRDT over IPLD performance #3

Closed: pgte closed this issue 5 years ago

pgte commented 6 years ago

Goal:

Achieve the same or a better level of performance when using a CRDT over IPLD as we get from using direct connections.

pgte commented 6 years ago

In previous iterations of CRDTs on top of IPFS, we were applying the Y.js protocol on top of libp2p pubsub and libp2p multiplexed connections. After the initial sync, each operation's data was pushed as part of the broadcast message.

In this approach, instead of having the changes pushed to it, each replica pulls the entries that aren't contained in its current HEAD, resolving them through IPLD. This is slow, especially because traversing a graph of missing entries may require many round-trips.

As a mitigation for this, instead of just broadcasting the current HEAD, replicas also broadcast the chain of parent nodes (up to X entries).

This is still a large bottleneck for making this work in soft real time.
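For illustration, a minimal sketch of the approach just described (broadcast the HEAD plus a bounded chain of ancestors; pull only what's missing via IPLD), with hypothetical `loadLocal`, `haveLocally`, `resolveViaIpld` and `apply` helpers standing in for the real storage and IPLD calls:

```typescript
// Illustrative sketch only; types and helper names are hypothetical, not a real API.
type CID = string;

interface LogEntry {
  cid: CID;
  parents: CID[];   // CIDs of the parent entries in the Merkle log
  payload: unknown; // the CRDT operation carried by this entry
}

interface HeadBroadcast {
  head: CID;
  // Mitigation: carry up to X recent ancestors so receivers can often
  // apply new entries without extra IPLD round-trips.
  recentEntries: LogEntry[];
}

const MAX_ANCESTORS = 16; // the "X" mentioned above, chosen arbitrarily here

// Build the message a replica broadcasts after appending to its log.
async function buildBroadcast(
  head: CID,
  loadLocal: (cid: CID) => Promise<LogEntry>
): Promise<HeadBroadcast> {
  const recentEntries: LogEntry[] = [];
  const seen = new Set<CID>();
  const frontier: CID[] = [head];
  while (recentEntries.length < MAX_ANCESTORS && frontier.length > 0) {
    const cid = frontier.shift()!;
    if (seen.has(cid)) continue;
    seen.add(cid);
    const entry = await loadLocal(cid);
    recentEntries.push(entry);
    frontier.push(...entry.parents);
  }
  return { head, recentEntries };
}

// On receipt: use the entries carried by the message when possible, and only
// fall back to (slow, multi-round-trip) IPLD resolution for the rest.
async function onBroadcast(
  msg: HeadBroadcast,
  haveLocally: (cid: CID) => Promise<boolean>,
  resolveViaIpld: (cid: CID) => Promise<LogEntry>,
  apply: (entry: LogEntry) => Promise<void>
): Promise<void> {
  const pushed = new Map(msg.recentEntries.map((e) => [e.cid, e] as [CID, LogEntry]));
  const missing: LogEntry[] = [];
  const seen = new Set<CID>();
  const pending: CID[] = [msg.head];
  while (pending.length > 0) {
    const cid = pending.pop()!;
    if (seen.has(cid)) continue;
    seen.add(cid);
    if (await haveLocally(cid)) continue;
    const entry = pushed.get(cid) ?? (await resolveViaIpld(cid));
    missing.push(entry);
    pending.push(...entry.parents);
  }
  // Apply in rough causal order: ancestors before descendants.
  for (const entry of missing.reverse()) {
    await apply(entry);
  }
}
```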

pgte commented 6 years ago

By disabling the DHT (which, for our purposes so far, does nothing useful), we get a benchmark improvement (see this), but this still doesn't prevent a single get from taking around 10 seconds to resolve.

I have a strong suspicion (backed by some runtime stats I gathered) that this is due to the high number of peers that get collected over time, resulting in many bitswap messages being received and clogging the network layer.

One way around this would be to disconnect from uninteresting peers, using an arbitrary heuristic.

In our case, the heuristic could be: "is this node participating in any of the CRDT networks we're in?". If not, the probability of disconnecting should be high.

One potential problem with this would be joining new CRDT networks. If the node is to participate in a new CRDT network, it must somehow find the other participating peers, which may be disjoint from the peers the node is currently connected to. I think the (re-)discovery mechanism may help with reconnecting here.

I feel that we need some exploratory work on managing connections at a higher level than just bitswap or pubsub: a custom (libp2p?) module that plugs into libp2p, observes its behaviour, and manages connections using a custom heuristic.

@diasdavid any insight on this?
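A minimal sketch of the CRDT-membership heuristic mentioned above, assuming a hypothetical `crdtNetworks` map kept up to date from pubsub topic membership (the names are illustrative, not a real API):

```typescript
// Hypothetical bookkeeping of which peers we have seen in each CRDT network,
// e.g. derived from pubsub topic membership. Illustrative only.
const crdtNetworks = new Map<string, Set<string>>(); // topic -> peer IDs

// "Is this peer participating in any of the CRDT networks we're in?"
function isInteresting(peerId: string): boolean {
  for (const peers of crdtNetworks.values()) {
    if (peers.has(peerId)) return true;
  }
  return false;
}

// Map the answer to a probability of preemptive disconnection.
function disconnectProbability(peerId: string): number {
  return isInteresting(peerId) ? 0.05 : 0.9; // arbitrary example values
}
```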

daviddias commented 6 years ago

but this still doesn't prevent a single get from taking around 10 seconds to resolve.

With a direct connection? Whoa, something is wrong there. //cc @beanow and @ya7ya who have explored Bitswap perf in the past too

I have a strong suspicion (backed by some runtime stats I gathered) that this is due to the high number of peers that get collected over time, resulting in many bitswap messages being received and clogging the network layer.

Very much likely.

One way around this would be to disconnect from uninteresting peers, using an arbitrary heuristic.

Agreed. We need the Connection Manager implemented for js-libp2p. go-libp2p currently limits the number of connected peers; for js-ipfs, however, we will need something that lets the application hint which peers we are interested in.

Wanna take the lead on this work?

I feel that we need some exploratory work on managing connections at a higher level than just bitswap or pubsub: a custom (libp2p?) module that plugs into libp2p, observes its behaviour, and manages connections using a custom heuristic.

It should be part of libp2p itself.

Another avenue to explore is offloading some of those CPU-bound operations to Web Workers through https://github.com/ipfs/js-ipfs/issues/1195
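In the browser, that offloading could look roughly like the following inline-worker sketch; the loop inside the worker is only a stand-in for the actual CPU-bound crypto work:

```typescript
// Sketch: move a CPU-bound task off the main thread so it doesn't block
// the networking code. The worker is inlined via a Blob URL to keep the
// example self-contained; the workload itself is a placeholder.
const workerSource = `
  self.onmessage = (e) => {
    let acc = 0; // placeholder for hashing/crypto work
    for (let i = 0; i < e.data.iterations; i++) acc = (acc + i) % 1000003;
    self.postMessage(acc);
  };
`;

const workerUrl = URL.createObjectURL(
  new Blob([workerSource], { type: 'application/javascript' })
);
const worker = new Worker(workerUrl);

function runInWorker(iterations: number): Promise<number> {
  return new Promise((resolve) => {
    worker.onmessage = (event) => resolve(event.data as number);
    worker.postMessage({ iterations });
  });
}

// Usage: await runInWorker(10_000_000) instead of blocking the main thread.
```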

pgte commented 6 years ago

@diasdavid Yes, I'm glad to take the lead on this one :)

I think I'd like to solve this resource optimisation first and, like you said, then try to parallelise computation if still needed.

In the CRDT case, I need a module that will tell the application about these events:

The application will then have more data to compute the value that each peer brings to it, on a scale of 0 to 1. A value of 1 would mean the peer is essential to the application, while a value of 0 would mean the peer is irrelevant to it.

This value would give a probability of preemptive disconnection: 1 means that this peer should never be preemptively disconnected, while 0 means that it should be disconnected immediately.

The values in between can be used to rank the peers and choose likely candidates to disconnect once a hard maximum has been reached (see the sketch at the end of this comment).

This hard maximum can be configurable and could be any of:

In CRDT land, we can use this to do resource throttling:

@diasdavid makes sense for an initial PoC? Anything else I should have in mind?
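A minimal sketch of this value-based ranking and eviction, assuming a hypothetical hang-up hook and an application-supplied value function (not the real js-libp2p API):

```typescript
// Sketch only; the connect/disconnect and hang-up hooks are placeholders
// for whatever js-libp2p ends up exposing.
type PeerValue = (peerId: string) => number; // 1 = essential, 0 = irrelevant

class PreemptiveDisconnector {
  private connected = new Set<string>();

  constructor(
    private peerValue: PeerValue,
    private hangUp: (peerId: string) => Promise<void>,
    private maxPeers: number // the configurable hard maximum
  ) {}

  async onConnect(peerId: string): Promise<void> {
    this.connected.add(peerId);
    await this.enforceLimit();
  }

  onDisconnect(peerId: string): void {
    this.connected.delete(peerId);
  }

  // Once the hard maximum is exceeded, rank peers by value and drop the
  // lowest-valued ones first; peers with value 1 are never dropped.
  private async enforceLimit(): Promise<void> {
    if (this.connected.size <= this.maxPeers) return;
    const ranked = [...this.connected].sort(
      (a, b) => this.peerValue(a) - this.peerValue(b)
    );
    let excess = this.connected.size - this.maxPeers;
    for (const peerId of ranked) {
      if (excess === 0 || this.peerValue(peerId) >= 1) break;
      await this.hangUp(peerId);
      this.connected.delete(peerId);
      excess--;
    }
  }
}
```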

Beanow commented 6 years ago

I noticed similarly slow retrieval even for localhost and LAN peers. While I haven't had time to get to the bottom of this, my primary suspects were the very aggressive peer discovery when the webrtc/websockets star is enabled, and the chatter it creates potentially blocking other connections due to js-land crypto work piling up.

Be sure to check whether a large number of dial attempts happen and how many peers are maintained. Exponential backoff for dialing failures may also be missing.
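For the backoff part, a minimal sketch of retrying dials with exponential backoff (the `dial` parameter is a placeholder, not a specific libp2p call):

```typescript
// Retry a dial with exponentially growing, capped delays so repeated
// failures to reach a peer don't turn into a constant stream of dial
// attempts and crypto handshakes.
async function dialWithBackoff(
  peerId: string,
  dial: (peerId: string) => Promise<void>, // placeholder for the node's dial
  maxAttempts = 5,
  baseDelayMs = 500,
  maxDelayMs = 30_000
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await dial(peerId);
      return;
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err;
      const delay = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```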

On the subject of priorities and heuristics: if web workers for crypto need to be a thing, sorting those workloads by relevance may be interesting as well.

pgte commented 6 years ago

@Beanow thanks, that's also on my usual-suspects list; I'll be keeping an eye on peer-discovery overhead. Once we have preemptive disconnects, that overhead should be easier to discern.

daviddias commented 6 years ago

@pgte go for it 👍

pgte commented 5 years ago

Peer-star-app deals with this by not relying on IPLD for real-time replication. We may revisit this in the future if IPLD replication becomes a better fit for Merkle logs.