ipfs / go-bitswap

The golang implementation of the bitswap protocol
MIT License
216 stars 112 forks source link

feat: coalesce and queue connection event handling #565

Closed Stebalien closed 2 years ago

Stebalien commented 2 years ago

Motivation

This change handles connectedness and responsiveness events asynchronously, coalescing multiple state-changes into one. In the past, this area has caused quite a bit of goroutine buildup as we fall behind processing connect and disconnect events.

Design

The design is a simple state-machine with the following components:

  1. A record of the current and "new" state for each peer.

  2. A queue of peers that have "changed".

  3. A worker that applies state transitions.

  4. When an event (disconnect, connect, unresponsive, responsive) happens, we record the desired (new) state for that peer and enqueue the peer for further processing by the background worker.

  5. The background worker takes peers off the background task queue, applying state-changes.

Importantly:

  1. There's only one background worker. Unfortunately, these state-changes take global locks so there's no point in attempting to parallelize.
  2. Multiple state-changes will be coalesced and only the final state-transition will be applied. The current system will apply all state transitions, even if the state-transition is out of date.
    • For example, the current system will go through the process of marking a peer as "connected", even after the peer has disconnected. The new system will coalesce these events, and drop the peer without further processing.
  3. Recording state transitions is really cheap (doesn't even use a channel). OnMessage is called all the time, so this is pretty important.

One drawback to not using a channel is that we have a global lock which may end up getting contended. However, we only hold the lock for very short periods of time, so this should be fine (testing TBD).

BigLep commented 2 years ago

This addresses https://github.com/ipfs/go-bitswap/issues/547 right?

Stebalien commented 2 years ago

Yes. Ideally we would be using the event bus, not callback-based notifications, but this "fixes" the issue (even if it doesn't pay down all the technical debt).

aschmahmann commented 2 years ago

Note: while this PR has a number of failing tests preliminary production tests have shown it works and unfortunately this repo has a lot of flaky tests that we need to deal with such as https://github.com/ipfs/go-libipfs/issues/86