A couple of notes about QUIC. (My previous job was analysing QUIC :smile:)
QUIC is not lightweight compared with multiplexed channels over TCP. In some measurements, HTTP/3 over QUIC performs slightly worse than HTTP/2 over TLS over TCP. It's also relatively heavy on CPU, currently cannot use hardware offloading like TCP. It has advantages, with regard to HoL interference between channels, lower-latency initial connection, and some security properties, but lightweight compared with TCP isn't one of them.
QUIC multiplexes many channels over a single connection, same as multiplexing channels over TCP. A QUIC connection has its own identity, similar to the TCP 4-tuple (but it can migrate in QUIC). Also like TCP+TLS, QUIC connection setup uses an initial handshake for the whole connection, not per channel, and maintains connection like a TCP socket.
In the terminology of QUIC there are streams over a connection (in the spec). It's the same as what you mean here by channels muxed over a connection. The muxer in QUIC is even in application-level software, just as for muxing over TCP. Therefore, you might reconsider whether it makes sense for the Peer Manager to represent QUIC significantly differently than TCP, i.e. with lots of "connections" per peer instead of a few. Really, both of them have a number of channels muxed over one or a few connections per peer, and it's probably going to be similar numbers whether QUIC or TCP is used, so I think it makes sense for the Peer Manager to show comparable numbers.
I realised my point is a bit unclear. It was meant to address "there should not be any distinction, but from the standpoint of peer and connection management there obviously is", where I thought from the wording that you're planning to represent mux-over-TCP as many "channels" over a "connection" and QUIC as many "connections".
Probably peer stats that are per-connection for TCP ("at the connection not channel level"), like latency, throughput, and whether to abort because it's too slow, should be per-shared-connection for QUIC as well.
At some level, I think there are three kinds of objects with state you must represent and might as well give a name to: Peers, connections and channels. Sometimes there will be situations where a channel is too slow or not responding, while the connection it's muxed over is fine and even carrying other channels ok.
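To make that concrete, here's a minimal sketch of those three kinds of stateful objects in Nim (names are purely illustrative, not nim-libp2p types):

```nim
# Illustrative sketch only - these are not nim-libp2p types.
type
  ChannelState = object
    proto: string          # application protocol spoken on this channel
    lastActivity: float    # e.g. a timestamp, to detect a stalled channel

  ConnState = object
    transport: string      # "tcp", "ws", "quic", ...
    rttMs: float           # per-connection stats (latency, throughput, ...)
    channels: seq[ChannelState]

  PeerState = object
    peerId: string
    conns: seq[ConnState]  # usually one or a few, whether TCP-muxed or QUIC

# a channel can be stalled while the connection carrying it is fine
proc slowChannels(c: ConnState, now, timeout: float): seq[string] =
  for ch in c.channels:
    if now - ch.lastActivity > timeout:
      result.add ch.proto
```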
Thanks for the feedback @jlokier.
Let me first address the overarching point of TCP vs QUIC performance. I realize that QUIC lacks hardware support at any level and can, on the whole, underperform TCP. However, one immediate gain of QUIC over TCP, which invalidates most of its current drawbacks, is Head of Line Blocking. Multiplexing over TCP is almost impossible because in practice you can only ever have one active (reading/writing) stream or channel per connection. So the only gain of multiplexing on top of TCP is during initial connection setup, and you can expect the overall "speed" to be only as fast as the slowest reader/writer.
QUIC multiplexes many channels over a single connection, same as multiplexing channels over TCP. A QUIC connection has its own identity, similar to the TCP 4-tuple (but it can migrate in QUIC). Also like TCP+TLS, QUIC connection setup uses an initial handshake for the whole connection, not per channel, and maintains connection like a TCP socket.
Yeah, this is one of the central points being made in the issue (perhaps a bit unclear at that). From the standpoint of libp2p, there are Transports, Connections, Muxers (optional) and Channels (optional). QUIC multiplexes at the protocol level (or, from the perspective of libp2p, at the transport level), so no multiplexing is required on top of it.
In libp2p, connection setup looks something like this:
+-----------------------------------------+
| Applications |
| +----------+ +----------+ +----------+ <-----------------|
| | PubSub | |NBC... | | Etc | |<------------- |
| | | | | | | <---------- | |
| +----------+ +----------+ +----------+ | | | |
| | | | |
+---^--^--^--------------------^--^--^----+ | | |
| | | | | | | | |
| | | Muxed Streams | | | | | |
| | | | | | | | |
+---|--|--|--------------------|--|--|----+ | | |
| Muxers | | | |
| +----------+ +----------+ +----------+ | | | |
| | Mplex | | Yamux | | Etc... | | QUIC Streams/Channels
| | | | | | | | | | |
| +----------+ +----------+ +----------+ | | | |
+------^--------------------------^-------+ | | |
| | | | |
| | | | |
Secured TCP Connection Secured WS Connection | | |
| | | | |
+------|--------------------------|-------+ | | |
| Encryption (Secure) | | | |
| +----------+ +----------+ +----------+ | | | |
| | Noise | |Secio(dep)| | Etc... | | | | |
| +----------+ +----------+ +----------+ | | | |
+------^-------------^--------------------+ | | |
| | | | |
TCP Connection WS Connection | | |
+-------|-------------|--------------------+ | | |
| Transports | | | | |
| +-----|-----+ +----|----+ +---------+ | | | |
| | | | | | --------------------------+
| | TCP | | Web | | QUIC---------------------+
| | | | Sockets | | ------------------+
| +-----------+ +---------+ +---------+ |
+------------------------------------------+
The flow for transports such as TCP and WS goes through the Secure and then Mux phases; the QUIC transport, on the other hand, produces streams that are directly usable by the high-level protocols or applications. Note that it's also possible to encrypt/secure and/or mux QUIC streams; those are just steps in the flow.
Fundamentally, what this issue is trying to accomplish is to decouple this flow enough that it can be applied selectively to each transport.
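As a rough sketch of what "each transport drives its own upgrade" could look like (hypothetical types and method names, not the current nim-libp2p API):

```nim
# Hypothetical shapes, not the current nim-libp2p API; the point is only that
# the upgrade steps (secure, mux) belong to each transport, not to the Switch.
type
  Conn = ref object of RootObj       # raw transport connection
  Stream = ref object of RootObj     # something applications can read/write
  Transport = ref object of RootObj

method upgrade(t: Transport, c: Conn): Stream {.base.} =
  raise newException(CatchableError, "override in the concrete transport")

type
  TcpTransport = ref object of Transport
  QuicTransport = ref object of Transport

method upgrade(t: TcpTransport, c: Conn): Stream =
  # TCP/WS path: secure the connection (e.g. Noise), then mux (e.g. mplex/yamux)
  # and hand out muxed streams; sketched here as a plain placeholder.
  Stream()

method upgrade(t: QuicTransport, c: Conn): Stream =
  # QUIC path: streams come out of the transport already secured and muxed,
  # so there is nothing to upgrade.
  Stream()
```

The TCP/WS override is where secure + mux would live; the QUIC override is essentially a no-op because the transport already hands out secured, muxed streams.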
It has advantages, with regard to HoL interference between channels, lower-latency initial connection, and some security properties, but lightweight compared with TCP isn't one of them.
I read this again and it seems like you do address Head of Line blocking (HoL didn't click immediately). So I believe we're in agreement overall.
On that note, in your experience how much slower is QUIC, and do you have some real-world numbers?
The info out there is a bit scarce (and quite casual) but indicates that the difference is at best marginal? Here are links to some benchmarks and discussions I've been able to find:
In the terminology of QUIC there are streams over a connection (in the spec). It's the same as what you mean here by channels muxed over a connection. The muxer in QUIC is even in application-level software, just as for muxing over TCP. Therefore, you might reconsider whether it makes sense for the Peer Manager to represent QUIC significantly differently than TCP, i.e. with lots of "connections" per peer instead of a few. Really, both of them have a number of channels muxed over one or a few connections per peer, and it's probably going to be similar numbers whether QUIC or TCP is used, so I think it makes sense for the Peer Manager to show comparable numbers.
You're correct, I reviewed our QUIC implementation (these are somewhat older notes from before our own implementation took shape) and from the connection manager perspective it looks exactly the same, which is good because it simplifies the entire model quite a bit.
The point about decoupling the upgrade flow and allowing each transport to control it is still valid and should be addressed by the implementation. In other words, if we take the diagram from the example above, it currently isn't possible to bypass securing and muxing, so the flow depicted for QUIC isn't currently possible. The idea is to move most of the current logic implemented by the Switch, which is precisely what the diagram depicts, into the transports, to allow each transport to drive its own upgrade flow.
Some thoughts & "planning":
Part I: create a peer book (done in #504)
Part II: implement the peer book in libp2p (#586)
Part III: serialization helpers
Part IV: I think it would be interesting to differentiate peer info coming from a reliable source (e.g. identify) from info coming from an unreliable one (e.g. discovery), and use the unreliable sources only when the reliable source is nonexistent or too old. Does this make sense?
Part V: per-book pruning of old, irrelevant data
TODO
libp2p-go uses tags, where each peer can be tagged and each tag has a score. When the time to disconnect peers comes, it sums each peer's tag scores into a peer score, and disconnects the peers with the lowest scores.
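Roughly, that model amounts to something like this (a sketch in Nim for consistency with this repo, not the actual go-libp2p API):

```nim
import std/[algorithm, tables]

type Peer = object
  id: string
  tags: Table[string, int]      # tag name -> tag score

proc score(p: Peer): int =
  for s in p.tags.values: result += s

# keep the `maxPeers` best-scored peers and return the rest for disconnection
proc peersToDisconnect(peers: seq[Peer], maxPeers: int): seq[string] =
  let ranked = peers.sortedByIt(-it.score)
  for i in maxPeers ..< ranked.len:
    result.add ranked[i].id
```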
I find this interesting, so I extended it:
Instead of tags, we create PeerGroups. Each PeerGroup has a total score, which will be split between all the Peers of the group. It will also have a low watermark, to make sure that a PeerGroup with a low peer count won't get peers disconnected. When a group is low on peers, it will also call a user-supplied proc, hoping to find new peers to fill the PeerGroup.
Since the score of each group is split between each peer of the group, the peers in "rare" groups will naturally have a better score than peers in "common" groups.
And just like libp2p-go, when we have too many open connections, we'll start to trim the lowest-scored peers' connections.
- […] PeerGroup. Should balance automatically nicely
- […] PeerGroup. Should balance automatically nicely
- […] PeerGroup with infinite score, to make sure that the peers in this group are un-kickable
- It's possible to update the parameters of a PeerGroup on the fly, to reflect interest in different groups in realtime (for instance, double the score or low watermark of each stability subnet depending on which subnet we'll soon participate in)
The score of each peer will also be split across each of its connections. A not-very-useful peer with many connections will probably drop to 1 or 0 connections, whereas a useful peer can keep multiple connections.
A grace period should be given to new peers, and a temporary blacklist to kicked peers.
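A rough sketch of the scheme above (illustrative names only, and ignoring grace periods/blacklisting): each PeerGroup's score is split between its members, a peer's score is the sum of its shares divided by its connection count, groups below their low watermark ask the application for more peers, and trimming skips protected peers (low-watermark groups and infinite-score groups):

```nim
import std/[algorithm, math, sets]

type
  PeerId = string

  PeerGroup = ref object
    name: string
    totalScore: float                  # split between all peers; Inf = un-kickable
    lowWatermark: int                  # don't trim a group below this many peers
    wantMorePeers: proc (g: PeerGroup) # user-supplied, called when the group runs low
    peers: HashSet[PeerId]

proc share(g: PeerGroup): float =
  if g.peers.len == 0: 0.0 else: g.totalScore / g.peers.len.float

proc score(groups: seq[PeerGroup], p: PeerId, connCount: int): float =
  ## sum of the peer's shares across its groups, split again per connection
  for g in groups:
    if p in g.peers: result += g.share
  result = result / max(1, connCount).float

proc protected(groups: seq[PeerGroup], p: PeerId): bool =
  ## true if p is in an infinite-score group, or trimming it would drag a
  ## group below its low watermark
  for g in groups:
    if p in g.peers and (g.totalScore == Inf or g.peers.len <= g.lowWatermark):
      return true

proc refill(groups: seq[PeerGroup]) =
  ## when a group is low on peers, ask the application to find more
  for g in groups:
    if g.peers.len < g.lowWatermark and not g.wantMorePeers.isNil:
      g.wantMorePeers(g)

proc peersToTrim(groups: seq[PeerGroup], connected: seq[PeerId],
                 maxPeers: int): seq[PeerId] =
  ## lowest-scored peers go first, skipping protected ones
  if connected.len <= maxPeers: return
  let ranked = connected.sortedByIt(groups.score(it, 1))
  for p in ranked:
    if result.len >= connected.len - maxPeers: break
    if not groups.protected(p):
      result.add p
```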
I think this is a good start for generic peer management, and it should be fairly trivial to plug in connection statistics when they're ready.
For NBC's peer cycling, we'll just have to feed the attestation subnet and ENR data into the system and everything should work nicely. The only thing that should remain specific to NBC is the incoming/outgoing balance, which should be implemented in NBC directly.
For waku, @jm-clius does this make sense for you? Not sure if you need Peer management, but I'll gladly take your input!
cc @arnetheduck @dryajov
Yeah, this does sound like an interesting idea, and although it's early days, it's likely to be useful for Waku once we productionise. Thanks!
A question and a comment from my side:
- Question: […] as many PeerGroups as possible? (e.g. PubSub peers subscribed to as many pubsub topics as possible)?
- Comment: […] (filter and store peers), generally with a connection per protocol. I can imagine a peer group per protocol here, but you may want to keep this use case in mind.

By the way, as it stands a disconnecting PubSub peer will have all its connections (pubsub-related or not) dropped from the switch. We're looking forward to having this fixed by proper peer management.
A peer does get a share from each of its PeerGroups, but if each PeerGroup has 20 peers, it will have a lower score than a peer in a single PeerGroup with 2 peers in it.

By the way, as it stands a disconnecting PubSub peer will have all its connections (pubsub-related or not) dropped from the switch. We're looking forward to having this fixed by proper peer management.
You mean when disconnectBadPeers is set to true?
But anyway, yes, the protocols should have less control by default over peers/connections, and should just have control over streams.
But obviously, if needed, the connection manager will completely disconnect low-scored peers to keep the connection count <= max_conn.
Thanks for clarifying!
You mean when disconnectBadPeers is set to true?
Ah, to be clearer: it disconnects the peer from the switch when failing to connect as a pubsub peer, because of this dropConn logic.
I think this is a good start for generic peer management,
What I'd really like to see first is a prototype done in NBC that makes NBC work exactly as the spec and the NBC subnet walking / selection feature demand. Once that prototype is done, it can be generalized into libp2p peer management with fancy groups etc. The point is to start with a concrete implementation first and then generalize where there is opportunity to do so, in order to avoid two things: over-generalization, and a mismatch in requirements leading to a sub-par implementation in NBC (which is the current state: the requirements in NBC don't match what libp2p has implemented, and libp2p is more general than NBC needs).
The generic peer management is a second step in that process, in other words.
What's important to remember is that when NBC looks at a peer, it does not care at all about its gossipsub score when deciding if it's a viable and interesting peer, for example - this is not a "weighted sum" feature where a generic peer handler can sum up the scores and yield reasonable behavior, but rather multiple partially independent views.
Rather, there are criteria that must be fulfilled, criteria that would be nice (good gossip score), and criteria that completely disqualify a peer (abusive behaviour).
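In other words, something closer to this shape than to a weighted sum (purely illustrative, not NBC code):

```nim
# Illustrative only: viability is a set of hard criteria plus disqualifiers,
# with "nice to have" scores (e.g. gossip score) only ordering viable peers.
type
  PeerVerdict = enum
    Disqualified    # abusive behaviour etc.: drop (and possibly blacklist)
    NotViable       # required criteria not met (wrong fork, stale status, ...)
    Viable          # keep; rank among other viable peers by nice-to-have scores

  PeerView = object
    abusive: bool         # hard disqualifier
    statusOk: bool        # required: eth2 status/metadata checks out
    gossipScore: float    # nice to have, never a substitute for the above

proc evaluate(p: PeerView): PeerVerdict =
  if p.abusive: Disqualified
  elif not p.statusOk: NotViable
  else: Viable
```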
It's also important to remember that the go implementation has grown over time and accumulated a lot of cruft as well as API debt - in general the rust implementation is newer and closer to what we're looking for, and in particular the branch / version that lighthouse uses.
By the way, as it stands a disconnecting PubSub peer will have all its connections (pubsub-related or not) dropped from the switch. We're looking forward to have this fixed by proper peer management.
I'm not sure about "proper", because that depends on the situation - there are many cases where completely dropping a peer is the right course of action. Separately from that, it should be possible to disconnect individual protocols, but this is not a case of peer management: it's a case of protocol management.
The final point is that in the case of NBC, the peer management should be lazy - for "normal" / non-abusive peers, it's better to be connected to them than not, because they might be interesting in the future - but there comes a point where the least useful peers need to go in order to make room for the more useful ones.
Finally, what is useful and what is not depends on polling inside of NBC using the eth2-specific status and metadata requests as well as the sync state - this is where the use case of protocol-specific per-peer metadata comes in. That is, when peers are connected in libp2p, protocols like eth2 need to be able to listen to events and set up their own per-peer flows that don't readily lend themselves to generalization - ideally libp2p should provide events and extension points rather than trying to implement complex logic itself.
The above point about eth2 needing eth2-specific management of peers requiring events and extension points is also what nimbus-eth1 will shortly use.
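A sketch of the kind of extension point being asked for (hypothetical names, not an existing nim-libp2p API): libp2p only raises peer lifecycle events, and the application attaches its own per-peer state and flows:

```nim
import std/tables

type
  PeerId = string
  PeerEventKind = enum Joined, Left
  PeerEventHandler = proc (peer: PeerId, kind: PeerEventKind)

  # libp2p side: nothing protocol-specific, just an event source
  PeerEvents = ref object
    handlers: seq[PeerEventHandler]

proc addHandler(e: PeerEvents, h: PeerEventHandler) =
  e.handlers.add h

proc emit(e: PeerEvents, peer: PeerId, kind: PeerEventKind) =
  for h in e.handlers: h(peer, kind)

# application side (eth2, eth1, ...): owns its own per-peer metadata and flows
var myPeerState = initTable[PeerId, tuple[synced: bool, rttMs: float]]()

proc onPeerEvent(peer: PeerId, kind: PeerEventKind) =
  case kind
  of Joined: myPeerState[peer] = (synced: false, rttMs: 0.0)
  of Left: discard   # eth1, for instance, may keep its entry around for a while

# usage: events.addHandler(onPeerEvent); the library calls emit(...) on connect/disconnect
```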
(Note: Eth1 doesn't use libp2p - only devp2p/discv4 at the moment, and perhaps discv5 further on. Writing this here because I think this aspect of its relationship with the protocol layer seems worth describing. We don't know if eth1 will evolve to use libp2p to sync, and even if it doesn't, I think this adds to @arnetheduck's argument that libp2p should provide events and extension points, not just trying to implement all the peer management logic itself using information it doesn't have.)
nimbus-eth1 will maintain its own peer state objects, separate from the devp2p protocol peer object. This contains eth1-specific peer state including some reputation aspects (bad peers, wrong chainid, slow responders) as well as syncing parameters, knowledge of the peer's knowledge (what we shouldn't send because it already knows, what we shouldn't ask for again because it doesn't know), queue lengths and response times.
The eth1 peer object is created when the protocol layer adds a peer, but when the protocol removes a peer (e.g. disconnected), eth1 keeps its peer object around for some time afterwards, in order to use that knowledge if it reconnects later, or if discovery rediscovers the same peer. Also so there are a few backup peers, ordered by eth1 reputation, in case some disconnect. In addition, persistent static nodes, trusted nodes, etc are entries in the eth1 peer state which aren't in the protocol's peer table. The eth1 peer state affects some new connections the protocol layer would like to make, so this interaction is not entirely in one direction. We may eventually save this table to the local database, so that fast restarts do not need to perform the entire discovery, ranking and syncing process from scratch.
Although some of this code may work its way back to the protocol layer implementation when it has been proven to work on the eth1 network, much of the detail is eth1 specific and is likely to remain so. It's also architecturally helpful for eth1 to be able to keep an object independent of when the protocol layer decides to free one. Some of the logic around persistent reputation and performance ranking is quite specific to eth1 - the protocol layer has no direct visibility on this, because it can't evaluate what are "good" responses and what are bad but valid-looking; it takes an eth1 engine to evaluate that.
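For illustration, the sort of table described above might look like this (a sketch with made-up field names, not nimbus-eth1 code):

```nim
import std/[tables, times]

type
  Eth1PeerState = object
    # reputation the protocol layer can't evaluate on its own
    badResponses: int
    wrongChainId: bool
    avgResponseMs: float
    # knowledge of the peer's knowledge
    knownBlocks: seq[string]   # what not to re-send / re-request
    # bookkeeping
    isStatic: bool             # static/trusted nodes never expire
    lastSeen: Time

var eth1Peers = initTable[string, Eth1PeerState]()   # keyed by node id, outlives devp2p peers

proc onProtocolPeerRemoved(nodeId: string) =
  # keep the entry so a later reconnect/rediscovery reuses what we learned
  if nodeId in eth1Peers:
    eth1Peers[nodeId].lastSeen = getTime()

proc prune(maxAge: Duration) =
  ## drop entries that have been gone too long (static nodes excluded)
  let now = getTime()
  var stale: seq[string]
  for id, st in eth1Peers:
    if not st.isStatic and now - st.lastSeen > maxAge:
      stale.add id
  for id in stale:
    eth1Peers.del id
```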
Peer and Connection Management Notes
Peer Book
The peer book stores serializable information about a peer.
(@jm-clius worked on the peer book implementation in https://github.com/status-im/nim-libp2p/issues/504; reference go and js implementations)
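For illustration, a peer book record could be as simple as this (sketch only; field names are made up rather than taken from the nim-libp2p peer store):

```nim
import std/[json, tables]

type
  PeerBookEntry = object
    addrs: seq[string]       # known addresses (multiaddrs as strings here)
    protocols: seq[string]   # what the peer advertised, e.g. via identify
    lastSeen: int64          # lets us prune old, irrelevant data (Part V)

  PeerBook = object
    entries: Table[string, PeerBookEntry]   # keyed by peer id

# everything in the book is serializable, e.g. for persisting across restarts
proc toJson(b: PeerBook): JsonNode =
  result = newJObject()
  for id, e in b.entries:
    result[id] = %*{"addrs": e.addrs, "protocols": e.protocols, "lastSeen": e.lastSeen}
```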
Peer Manager
Closely related to the Peer Book, but not the same thing. This is more concerned with managing connected peers. Overall, I see this as the act of ranking peers with active connections, keeping limits, and connecting/disconnecting on demand.
The main thing to understand about peer management is that there should be a balance between allowing applications to control who they are connected to and not affecting other applications (i.e. protocols). This is a bit more complex than the peer book, which is more concerned with keeping up-to-date information about a peer. We can use the Peer Book for storing information such as connection-related stats, scores, etc...
It's worth stressing the difference between Peers and Connections. A Peer can have several Connections, and connections are transport dependent. Some transports are stream oriented, with muxers that produce channels on top of them - as in the case of TCP and mplex. Others are lightweight connections built on top of UDP - as in the case of uTP or QUIC. In any case, from the application standpoint, there should not be any distinction, but from the standpoint of peer and connection management there obviously is.
This calls for a distinction between closing connections and disconnecting peers. Closing a connection would not drop the peer from the peer manager, i.e. all other protocols with connections to that peer would keep those connections. Dropping the peer, however, means closing all the connections for that peer and removing it from the peer manager.
With this in mind, there are several related counts that need to be kept
These distinctions are important because some transports will have hundreds of connections for a peer - as is the case with QUIC - while others will have at most a few - as in the case of TCP, with many muxed streams on top. Each of these counts needs to be kept separately, but each will eventually affect the peer's score, and hence its ability to stay connected or be dialed in the near future.
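As a bookkeeping sketch (illustrative names only): per-transport connection counts per peer, kept separately, plus the closing-a-connection vs dropping-the-peer distinction from above:

```nim
import std/tables

type
  TransportKind = enum tcpTransport, wsTransport, quicTransport

  ManagedPeer = object
    conns: array[TransportKind, int]   # connection count per transport
    channels: int                      # muxed streams / QUIC streams currently open

  PeerManager = object
    peers: Table[string, ManagedPeer]  # keyed by peer id

proc closeConn(m: var PeerManager, peerId: string, kind: TransportKind) =
  ## closing one connection: the peer stays known to the manager
  if peerId in m.peers and m.peers[peerId].conns[kind] > 0:
    dec m.peers[peerId].conns[kind]

proc dropPeer(m: var PeerManager, peerId: string) =
  ## dropping the peer: all its connections go, and so does the entry
  m.peers.del peerId
```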
So the peer manager will have to take all of that into account, which isn't the case right now. For starters, muxers shouldn't even be part of the peer manager.
So, back to managing connected peers - what do we really want from it? Some high level thoughts:
- […] Dialer/Stream Provider interface

What needs to happen to make this possible?