Open Stebalien opened 6 years ago
The connection manager needs a lot of love. I suspect it’s a strong culprit in some of the erratic behaviour we see. Indeed we need better semantics and feedback loops between the connection manager and the streams.
This is a P1 for me this Q4, and I’ve been evaluating different approaches. My conclusions so far:
Streams should be able to lock the connection for a specific time window (with maximum bounds). They’d register a callback function that gets notified when the lock elapses and the conn manager is about to shut the connection. They can extend the lock (with some limit), or take a compensating action, e.g. initiate a connection with another bitswap peer.
The scoring/weighting system is too simplistic. Instead of the current read-write tag system, my current thinking involves protocols being able to attach private structs that implement a Score() int64
interface. The structs can track latencies, error rates, usefulness, etc. and other protocol-specific stuff. The Score()
function returns a score 0-1000 applying whatever formula makes sense for the protocol. When the connection manager starts a round of pruning, it calls Score()
on all attached structs and picks the lowest scoring ones. This pull mechanism is more efficient, cleaner and contextual than the current one.
I’ll try to put together an interface with these ideas and yours in the next day, if that works.
P.S.: Written on mobile, so excuse any sloppiness.
The concept of decaying would be part of point 2. It’s going to look different for each protocol, so I’d stay away of modelling this on the connection manager, and instead lean towards protocols tracking peer usefulness and liveliness in the scoring struct, and using that value in the scoring function.
This is a P1 for me this Q4
Awesome!
Streams should be able to lock the connection for a specific time window (with maximum bounds). They’d register a callback function that gets notified when the lock elapses and the conn manager is about to shut the connection. They can extend the lock (with some limit), or take a compensating action, e.g. initiate a connection with another bitswap peer.
Really, bitswap and friends should be setting write deadlines. What was your use-case for the time lock?
A "may I delete this" callback would be awesome for the DHT. That is, the DHT will be able to stop the connection manager from completely emptying a routing bucket.
Instead of the current read-write tag system, my current thinking involves protocols being able to attach private structs that implement a Score() int64 interface.
To connections or streams?
So, I started this off the wrong way. Let's talk use-cases:
DHT
Currently, the DHT leaves streams open (because negotiating a stream is expensive compared to a single request). With multistream 2.0 (or any sane multistream replacement), this should be pretty much free.
So, assuming we open a new stream for every request:
Ping
Ping only cares about a connection when a stream is open.
Identify
Really, identify doesn't care at all. Identify would want a way to create a stream and immediately release the lock.
Relay
Relay start/stop will want to lock the connection open. However, relay hop will need some metric (based on the number of connections being relayed). Ideally, as you say, this would be computed on-demand.
Bitswap
So, we this is a case where we might want to open a stream to track the session. On the other hand, I believe the original vision for bitswap had it keep sessions open even through disconnects...
Otherwise,
ipfs p2p
pubsub
This depends on the protocol but, e.g., in gossipsub, we want to lock open the connections to our grafted peers and try to keep open connections to gossip peers.
From this, I agree with your "associate a custom struct" approach. Really, I'd like to be able to associate a struct with a peer, a connection, or a stream.
Actually, the custom struct approach is probably sufficient. That is, if we provide some way to access/modify these scorers and provide some sentinel priority that locks the connection open, we should be able to implement everything without having to put too much logic into the connection manager itself.
For example, take the following reflection happy version:
type StreamScorer interface {
ScoreStream(c Stream) int
}
type StreamScorer interface {
ScoreConn(c Conn) int
}
type PeerScorer interface {
ScoreConn(p peer.ID) int
}
type ConnectionManager interface {
Scorer
// ManageConn takes a connection to manage and a callback to update
// a scorer associated with the connection.
//
// fn is a *function* of the form `func(*MyScorer) { ... }`
//
// While called, it holds an exclusive lock on the scorer and can
// manipulate it freely.
//
// If a scorer of the appropriate type has not yet been associated with
// the connection, a new one with a zero value will be created.
ManageConn(c Conn, fn interface{})
ManageStream(s Stream, fn interface{})
ManagePeer(p peer.ID, fn interface{})
}
Internally, this would keep a map of ScorerType -> Scorer per connection, stream, peer. Technically, we could go full recursion and associate a scorer-scorer with the network, have it internally track per-conn scorers that track per stream scorers all the way down... but let's not.
Whatever, that's probably going too far off into reflection land. However, this does show that the you're right, the pull approach is probably the right one.
We also have to consider the kind of complexity we are introducing: time locking. callbacks, oh my. Let's not make it impossible to program with sanely.
Can't we make it work with just weights? These are nice and simple.
The problem with just weights is that we have to update them eagerly. This is currently a real pain to manage properly.
I agree we want to keep it as simple as possible (at the core at least). Registering stream "scorers" should keep it reasonably simple while making it flexible.
Note: There are definitely simpler mechanisms than my (insane) reflection proposal. A simple alternative is to:
That is,
type ConnManager interface {
RegisterStreamScorer(StreamScorer)
// ...
}
type Stream interface {
// ...
SetMeta(key, value)
}
Really, we don't even have to provide a way to associate metadata with streams. It's just easier than having everyone independently track streams using open/close events.
The problem with just weights is that we have to update them eagerly. This is currently a real pain to manage properly.
To make this less of a blank assertion, I ran into two cases where the current connection manager interface frustrated me to the point where I just gave up:
@vyzo but really, thank you for pushing for the simple solution and please continue to do that whenever I try to float off into the meta-programming abstraction ether.
Suggestion, Connections should be divided into two categories
One type is the connection managed by the existing conn-manager.
The other type is a user-defined connection, which should have the following properties.
Business applications typically have their own stable nodes, so it's best to have a way to maintain a stable connection between these nodes, and existing connection manager-managed connections can help these fixed nodes provide services.
@thomas92911 that's the idea behind tagging. You can tag a connection with a really high weight to keep it (although we should probably add a "don't close ever" tag).
@thomas92911 that's the idea behind tagging. You can tag a connection with a really high weight to keep it (although we should probably add a "don't close ever" tag).
that's great.
Consider this algorithm. It reduces the problem to the desired goal: What connection to evict when we have to evict. I still think that "scoring" connections is adding extra state and concepts that are superfluous to this goal. I made the algorithm a gist in case people want to pick out issues or concerns. Some advantages of a stateless algorithm include: significantly less complexity, and significantly more testability. It's also much easier to manipulate the parameters that make the decisions.
The current connection manager is keeping our nodes from running out of memory but is breaking everything else pretty badly. Unfortunately, it's rather tricky to correctly use the current interface.
I'd like to do the following:
const IMPORTANT = 100
etc.Anyone have bandwidth for this? It'll take a bit of work to get it right but it should help immensely.