paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

Tracking issue for QUIC support #536

Open kpp opened 3 years ago

kpp commented 3 years ago

There are several reasons to support the QUIC network protocol.

1. The first, slightly minor, reason is that validators keep more than 1000 TCP connections open and are running into file descriptor limits.
2. The second is that we use Yamux to multiplex multiple substreams over one connection. Yamux is a rather poor protocol that does the best it can on top of TCP but can't do more: because we can't know the TCP window size from user space, we can't properly allocate window sizes to individual substreams, so if two or more substreams send a large volume of data we run into head-of-line blocking. (This is the same reason HTTP/2 is not so great and is being replaced by HTTP/3, if you want to read more.)
3. The third reason is that we suffer from TCP slow start: a TCP connection sits almost unused, and then we suddenly send, for example, 2 MiB on it. Because of slow start, it takes a rather long time to send those 2 MiB even if the connection would be capable of sending them quickly.

A first attempt was made in https://github.com/paritytech/substrate/pull/6366.

So far there is https://crates.io/crates/libp2p-quic, which should be reviewed.

The list of sub-issues will be updated from time to time:

burdges commented 3 years ago

Yes. QUIC would improve our networking significantly. :)

We previously got hung up on keeping Noise IK via nQUIC = QUIC + Noise, versus adopting QUIC with TLS 1.3 by defining some new grandpa TLS certificate for validators, crazier certificates for collators, and maybe others.

I'm completely happy outsourcing an initial spec for one or both of these. It's also fine if we do something fast and warn everyone to expect yet another network protocol migration. It's conversely also possible that @tomaka knows a wish list of libp2p messes that would be good to bundle in with a migration this big.

I'd expect a "grandpa TLS certificate" for validators to consist of a Merkle proof identifying the Ed25519 grandpa session keys, and then a certificate by the session key on the transport key. We should first migrate storage from our radix 16 hashing, which bloats in a dense Merkle tree, to radix 2 hashing with hash fast forwarding (and radix 16 caching). We'll want crazy certificates for collators, like VRFs for parachains running Babe, and worse for Sassafras in the future.

I've no idea if we properly exploit Noise IK yet anyway, meaning whether nodes initiating connections identify themselves first via something vaguely like this "grandpa TLS certificate". It's annoying if the Merkle proof that shows you're actually a validator does not fit into the MTU used for the first datagram, so the 4x space cost of our radix 16 hashing matters here too.

In brief, we need to decide how quickly we want this and what we want to try to rush to merge into such a migration. We'll surely punt on doing authentication perfectly optimally at first, but we're slowly creating the possibility for better authentication elsewhere, so it'll circle back here one day. :)

burdges commented 3 years ago

I think the simplest would be QUIC with TLS 1.3 and no authentication, with authentication done later inside the stream. We simultaneously give someone reputable a grant for designing nQUIC properly, and push our storage folks to fix our radix 16 hashing issue.

tomaka commented 3 years ago

I'd strongly prefer for us to implement something that is already specified (i.e. TLS+QUIC) and not start creating a parity-nquic that will get abandoned after 6 months.

kpp commented 3 years ago

So far I did some research and implemented: 1) a Go tool to generate an X.509 certificate with the libp2p extension in DER format (the code is mostly copied from https://github.com/libp2p/go-libp2p-tls); 2) a Rust tool to parse, inspect, and verify the signature according to the libp2p docs.

The reason I did it this way is that I wanted to be compatible with the Go implementation of libp2p-quic. In the future I want to test our client against their server and vice versa.

https://gist.github.com/kpp/c9c84411e17f4b27dddf0d438b289862

The code is ugly, but that's not the point. The point is that the Rust tool is binary compatible.
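
For illustration, a minimal sketch of what the parsing side looks like, assuming the x509-parser crate and the extension OID 1.3.6.1.4.1.53594.1.1 from the libp2p TLS spec (the file name is hypothetical; signature verification is omitted):

```rust
use x509_parser::prelude::*;

// OID of the libp2p-specific certificate extension defined in the libp2p TLS spec.
const LIBP2P_EXT_OID: &str = "1.3.6.1.4.1.53594.1.1";

fn main() {
    // Hypothetical path to a DER-encoded certificate produced by the Go tool.
    let der = std::fs::read("cert.der").expect("read cert.der");
    let (_rest, cert) = X509Certificate::from_der(&der).expect("valid DER certificate");

    for ext in cert.extensions() {
        if ext.oid.to_id_string() == LIBP2P_EXT_OID {
            // The extension value carries the SignedKey: the host's libp2p public
            // key plus a signature binding it to the certificate's public key.
            println!("libp2p extension found ({} bytes)", ext.value.len());
        }
    }
}
```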

The next step is to prettify the code, implement a certificate serializer and create a PR.

I also found a tool on the internet for reading DER files, which helped me a lot when inspecting the certificate: https://lapo.it/asn1js.

kpp commented 3 years ago

So far I:

kpp commented 3 years ago

Over the past two weeks I:

kpp commented 3 years ago

While I was playing with quinn_proto, David added TLS support to his libp2p-quic crate. We had a chat with Pierre and came to the conclusion that the code is a good starting point. There are some issues left, but so far it looks pretty good.

Since the last update I have worked on multiple issues:

The current progress is: we are integrating libp2p-quic into rust-libp2p: https://github.com/libp2p/rust-libp2p/pull/2159.

kpp commented 2 years ago

dvc94ch dropped out, so I opened my own PR: https://github.com/libp2p/rust-libp2p/pull/2289

Here are the bench results:

Bench results

```
Local-Local TCP
# Start Rust and Golang servers.
# Rust -> Rust
## Transport security noise
Interval        Transfer        Bandwidth
0 s - 10.00 s   5913 MBytes     4730.32 MBit/s
## Transport security plaintext
Interval        Transfer        Bandwidth
0 s - 10.00 s   9203 MBytes     7362.40 MBit/s
# Rust -> Golang
## Transport security noise
Interval        Transfer        Bandwidth
0 s - 10.00 s   5458 MBytes     4366.22 MBit/s
## Transport security plaintext
Interval        Transfer        Bandwidth
0 s - 10.00 s   10284 MBytes    8227.13 MBit/s
# Golang -> Rust
## Transport security noise
Interval        Transfer        Bandwidth
0s - 10.00 s    6880 MBytes     5502.57 MBit/s
## Transport security plaintext
Interval        Transfer        Bandwidth
0s - 10.00 s    17534 MBytes    14026.69 MBit/s
# Golang -> Golang
## Transport security noise
Interval        Transfer        Bandwidth
0s - 10.00 s    4881 MBytes     3904.79 MBit/s
## Transport security plaintext
Interval        Transfer        Bandwidth
0s - 10.00 s    23115 MBytes    18489.50 MBit/s

Local-Local QUIC:
# Start Rust and Golang servers.
# Rust -> Rust
## Transport security noise
Local peer id: PeerId("12D3KooWAEgNvJB6tXtmpgjf2GDZjMhCneZvtUo2KTR4if4kkoH7")
Interval        Transfer        Bandwidth
0 s - 10.00 s   361 MBytes      288.72 MBit/s
## Transport security plaintext
Local peer id: PeerId("12D3KooWHiYCM8HrETLwqDwAwo3ovHWgAo1astGG2bLJ7VdydA95")
Interval        Transfer        Bandwidth
0 s - 10.00 s   381 MBytes      304.77 MBit/s
# Golang -> Golang
## Transport security noise
2021/12/09 14:35:18 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
Interval        Transfer        Bandwidth
0s - 10.00 s    725 MBytes      579.98 MBit/s
## Transport security plaintext
2021/12/09 14:35:28 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
Interval        Transfer        Bandwidth
0s - 10.00 s    724 MBytes      579.09 MBit/s

Local-Server TCP (mixed plaintext/noise):
# Rust -> Rust
## Transport security noise
Interval        Transfer        Bandwidth
0 s - 10.02 s   126 MBytes      100.64 MBit/s
## Transport security plaintext
Interval        Transfer        Bandwidth
0 s - 10.09 s   125 MBytes      99.08 MBit/s
# Rust -> Golang
## Transport security noise
Interval        Transfer        Bandwidth
0 s - 10.21 s   111 MBytes      86.97 MBit/s
# Golang -> Rust
## Transport security noise
Interval        Transfer        Bandwidth
0s - 10.03 s    129 MBytes      102.92 MBit/s
## Transport security plaintext
Interval        Transfer        Bandwidth
0s - 10.01 s    125 MBytes      99.85 MBit/s
# Golang -> Golang
## Transport security noise
Interval        Transfer        Bandwidth
0s - 10.15 s    89 MBytes       70.15 MBit/s

Local-Server QUIC:
# Rust -> Rust
## Transport security noise
Local peer id: PeerId("12D3KooWR1KRW9UoJd8XXwFLKWdvmC4yTEDgStvXJGUD3ZFnFZDw")
Interval        Transfer        Bandwidth
0 s - 10.01 s   9 MBytes        7.19 MBit/s
## Transport security plaintext
Local peer id: PeerId("12D3KooWA7vqQpUV3SxUNTWWeWHiM2N1WEebMPETwhLLtmD6QdnL")
Interval        Transfer        Bandwidth
0 s - 10.04 s   4 MBytes        3.19 MBit/s
# Golang -> Golang
## Transport security noise
2021/12/09 17:42:04 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
Interval        Transfer        Bandwidth
0s - 10.01 s    109 MBytes      87.12 MBit/s
## Transport security plaintext
2021/12/09 17:42:14 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
Interval        Transfer        Bandwidth
0s - 10.01 s    121 MBytes      96.71 MBit/s
```

Also we still cannot connect to go-libp2p-quic. I am working on it.

burdges commented 2 years ago

It's QUIC with QUIC's standard TLS 1.3 then? Cool, we reinvent enough around here anyway. ;)

kpp commented 2 years ago

I discovered that StreamMuxer::poll_event pollutes the Swarm threads with these two loops:

// Drain events coming from the endpoint and feed them into the connection
// state machine (this is where incoming packets get parsed).
let span = tracing::span!(tracing::Level::TRACE, "handle_event").entered();
while let Poll::Ready(event) = inner.endpoint.poll_channel_events(cx) {
    inner.connection.handle_event(event);
}
drop(span);

// Ask the connection for outgoing packets and hand them back to the endpoint
// (this is where outgoing packets get built).
let span = tracing::span!(tracing::Level::TRACE, "send_transmit").entered();
let max_datagrams = inner.endpoint.max_datagrams();
while let Some(transmit) = inner.connection.poll_transmit(now, max_datagrams) {
    inner.endpoint.send_transmit(transmit);
}
drop(span);

These two loops are responsible for parsing packets and building packets, respectively. I believe that's the main cause of the poor performance. tracing.folded.txt
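
For context, the attached folded stacks look like what tracing-flame produces; a minimal sketch of that kind of profiling setup, assuming the tracing-flame and tracing-subscriber crates were used to generate the attached file:

```rust
use tracing_flame::FlameLayer;
use tracing_subscriber::prelude::*;

fn main() {
    // Write folded stack samples (the input format for inferno/flamegraph
    // tooling) to ./tracing.folded while the workload runs.
    let (flame_layer, guard) = FlameLayer::with_file("./tracing.folded").unwrap();
    tracing_subscriber::registry().with(flame_layer).init();

    // ... run the QUIC benchmark / workload here ...

    // Dropping the guard flushes any buffered samples to disk.
    drop(guard);
}
```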

kpp commented 2 years ago

The current state of this issue:

kpp commented 1 year ago
burdges commented 1 year ago

Any idea what the performance looks like?

kpp commented 1 year ago

I am working on it.

kpp commented 1 year ago

https://github.com/libp2p/rust-libp2p/pull/2289 is merged.

kpp commented 1 year ago

@burdges

localhost: QUIC with libp2p-perf

```
$ ./run.sh
# Start Rust and Golang servers.
Local peer id: PeerId("Qmcqq9TFaYbb94uwdER1BXyGfCFY4Bb1gKozxNyVvLvTSw")
about to listen on "/ip4/127.0.0.1/udp/9992/quic"
Listening on "/ip4/127.0.0.1/udp/9992/quic".
# Rust -> Rust
Local peer id: PeerId("12D3KooWH1oPSwuvp5v5AUqvGT3HR64YZRC7VZmhVZkewiLcPDpa")
IncomingConnection { local_addr: "/ip4/127.0.0.1/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/54502/quic" }
ConnectionEstablished { peer_id: PeerId("12D3KooWH1oPSwuvp5v5AUqvGT3HR64YZRC7VZmhVZkewiLcPDpa"), endpoint: Listener { local_addr: "/ip4/127.0.0.1/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/54502/quic" }, num_established: 1, concurrent_dial_errors: None }
Behaviour(PerfRunDone(10.001692342s, 4396669396))
Interval        Transfer        Bandwidth
0 s - 10.00 s   4396 MBytes     3516.68 MBit/s
ConnectionClosed { peer_id: PeerId("Qmcqq9TFaYbb94uwdER1BXyGfCFY4Bb1gKozxNyVvLvTSw"), endpoint: Dialer { address: "/ip4/127.0.0.1/udp/9992/quic", role_override: Dialer }, num_established: 0, cause: None }
ConnectionClosed { peer_id: PeerId("12D3KooWH1oPSwuvp5v5AUqvGT3HR64YZRC7VZmhVZkewiLcPDpa"), endpoint: Listener { local_addr: "/ip4/127.0.0.1/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/54502/quic" }, num_established: 0, cause: Some(IO(Custom { kind: Other, error: ApplicationClosed(ApplicationClose { error_code: 0, reason: b"" }) })) }
# Rust -> Golang
Local peer id: PeerId("12D3KooWLy9RkY4uW26ryp3seZ6QXuiJbmBywVE359MnQopnYEj2")
Interval        Transfer        Bandwidth
0 s - 10.00 s   3062 MBytes     2449.60 MBit/s
ConnectionClosed { peer_id: PeerId("12D3KooWL3XJ9EMCyZvmmGXL2LMiVBtrVa2BuESsJiXkSj7333Jw"), endpoint: Dialer { address: "/ip4/127.0.0.1/udp/9993/quic", role_override: Dialer }, num_established: 0, cause: None }
# Golang -> Rust
2022/08/05 15:10:43 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
IncomingConnection { local_addr: "/ip4/127.0.0.1/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/54359/quic" }
ConnectionEstablished { peer_id: PeerId("12D3KooWKGLyqMFxSXgqv8n1mwX3Ljxg18ENFTtuiHTRYE8DHxrv"), endpoint: Listener { local_addr: "/ip4/127.0.0.1/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/54359/quic" }, num_established: 1, concurrent_dial_errors: None }
Interval        Transfer        Bandwidth
0s - 10.00 s    2160 MBytes     1727.97 MBit/s
Behaviour(PerfRunDone(9.999945043s, 2160640000))
ConnectionClosed { peer_id: PeerId("12D3KooWKGLyqMFxSXgqv8n1mwX3Ljxg18ENFTtuiHTRYE8DHxrv"), endpoint: Listener { local_addr: "/ip4/127.0.0.1/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/54359/quic" }, num_established: 0, cause: Some(IO(Custom { kind: Other, error: ApplicationClosed(ApplicationClose { error_code: 0, reason: b"" }) })) }
# Golang -> Golang
2022/08/05 15:10:53 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
Interval        Transfer        Bandwidth
0s - 10.00 s    2085 MBytes     1667.91 MBit/s
```
localhost: TCP with libp2p-perf

```
   Compiling libp2p-core v0.32.1
   Compiling libp2p-noise v0.35.0
   Compiling libp2p-plaintext v0.32.0
   Compiling netlink-proto v0.10.0
   Compiling rtnetlink v0.10.1
   Compiling if-watch v1.1.1
   Compiling ed25519-dalek v1.0.1
   Compiling toml v0.5.9
   Compiling proc-macro-crate v1.1.3
   Compiling multihash-derive v0.8.0
   Compiling multihash v0.16.2
   Compiling multiaddr v0.14.0
   Compiling libp2p-swarm v0.35.0
   Compiling libp2p-dns v0.32.1
   Compiling libp2p-tcp v0.32.0
   Compiling libp2p-yamux v0.36.0
   Compiling libp2p v0.44.0
   Compiling libp2p-perf v0.1.0 (/home/kpp/parity/libp2p-perf/rust)
    Finished release [optimized + debuginfo] target(s) in 37.42s
kpp@xps:~/parity/libp2p-perf$ ./run.sh
# Start Rust and Golang servers.
# Rust -> Rust
## Transport security noise
Interval        Transfer        Bandwidth
0 s - 10.00 s   8201 MBytes     6560.56 MBit/s
## Transport security plaintext
Interval        Transfer        Bandwidth
0 s - 10.00 s   14598 MBytes    11678.23 MBit/s
# Rust -> Golang
## Transport security noise
Interval        Transfer        Bandwidth
0 s - 10.00 s   6832 MBytes     5465.37 MBit/s
## Transport security plaintext
Interval        Transfer        Bandwidth
0 s - 10.00 s   12462 MBytes    9969.54 MBit/s
# Golang -> Rust
## Transport security noise
Interval        Transfer        Bandwidth
0s - 10.00 s    9636 MBytes     7707.60 MBit/s
## Transport security plaintext
Interval        Transfer        Bandwidth
0s - 10.00 s    20705 MBytes    16563.89 MBit/s
# Golang -> Golang
## Transport security noise
Interval        Transfer        Bandwidth
0s - 10.00 s    5741 MBytes     4592.47 MBit/s
## Transport security plaintext
Interval        Transfer        Bandwidth
0s - 10.00 s    21973 MBytes    17578.36 MBit/s
```
A real-world bench on 2 VMs

```
megamachines:

TCP plaintext:
Interval        Transfer        Bandwidth
0 s - 10.00 s   8397 MBytes     6716.62 MBit/s
Interval        Transfer        Bandwidth
0 s - 10.00 s   8418 MBytes     6734.40 MBit/s
Interval        Transfer        Bandwidth
0 s - 10.00 s   8414 MBytes     6731.19 MBit/s

TCP noise:
Interval        Transfer        Bandwidth
0 s - 10.00 s   6398 MBytes     5118.11 MBit/s
Interval        Transfer        Bandwidth
0 s - 10.00 s   6136 MBytes     4908.64 MBit/s
Interval        Transfer        Bandwidth
0 s - 10.00 s   6365 MBytes     5091.84 MBit/s

QUIC:
Interval        Transfer        Bandwidth
0 s - 10.02 s   709 MBytes      566.07 MBit/s
Interval        Transfer        Bandwidth
0 s - 10.01 s   728 MBytes      581.88 MBit/s
Interval        Transfer        Bandwidth
0 s - 10.01 s   701 MBytes      560.16 MBit/s
```
elenaf9 commented 1 year ago

@kpp are those benchmarks for the now merged implementation (libp2p/rust-libp2p#2289) or for libp2p/rust-libp2p#2801?

burdges commented 1 year ago

Interesting, so quite a dramatic loss of performance for now.

Ralith commented 1 year ago

Much faster than the go stack on localhost, though.

How much latency is there between the two VMs, and what was the instantaneous throughput like at the end of the test? The discrepancy between localhost and "real world" is interesting, and hints at significant room for improvement in our congestion controller, which is known to have a few significant TODOs still (e.g. HyStart for Cubic). Which congestion controller did you use? Did you try the experimental BBR impl?

burdges commented 1 year ago

I've heard claims that the congestion controller winds up being some application-specific black magic that Google never explains, but maybe you guys understand the underlying concerns there better than I do?

Ralith commented 1 year ago

I'm not sure what you mean. Quinn implements three different congestion controllers (New Reno, Cubic, and BBRv1), all of which are fairly well documented in the literature. The underlying logic is very similar to that used in the Linux kernel's TCP implementation, but there are a few optimizations to Cubic that we haven't implemented yet.
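
For reference, switching Quinn between controllers is a small configuration change; a minimal sketch, assuming quinn ~0.9/0.10 where TransportConfig exposes congestion_controller_factory (the helper name is illustrative):

```rust
use std::sync::Arc;
use quinn::{congestion::BbrConfig, TransportConfig};

// Illustrative helper: build a TransportConfig that uses the experimental BBR
// controller instead of the default Cubic implementation.
fn bbr_transport_config() -> TransportConfig {
    let mut transport = TransportConfig::default();
    transport.congestion_controller_factory(Arc::new(BbrConfig::default()));
    transport
}
```

The resulting transport config would then be installed on the client/server config before building the endpoint.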

kpp commented 1 year ago

@elenaf9 these are for https://github.com/libp2p/rust-libp2p/pull/2801. I will re-do it for the master branch too.

kpp commented 1 year ago

https://github.com/libp2p/rust-libp2p/pull/3454 is merged

kpp commented 1 year ago

QUIC has been added to rust-libp2p: https://github.com/libp2p/rust-libp2p/issues/2883#issuecomment-1705623809
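
For anyone who wants to try it out, a minimal sketch of wiring the QUIC transport into a Swarm-ready transport, assuming a recent rust-libp2p with the quic and tokio features enabled (the keypair here is illustrative):

```rust
use libp2p::{core::muxing::StreamMuxerBox, identity, quic, Transport};

fn main() {
    // Illustrative identity; a real node would load its persistent keypair.
    let keypair = identity::Keypair::generate_ed25519();

    // QUIC bundles TLS 1.3 (with the libp2p certificate extension) and native
    // stream multiplexing, so no separate Noise/Yamux upgrade is needed.
    let _transport = quic::tokio::Transport::new(quic::Config::new(&keypair))
        .map(|(peer_id, connection), _| (peer_id, StreamMuxerBox::new(connection)))
        .boxed();
}
```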

burdges commented 1 year ago

I'd assume the benchmarks remain pretty lackluster?

mxinden commented 1 year ago

> I'd assume the benchmarks remain pretty lackluster?

@burdges the latest libp2p-quic offers a ~10x throughput improvement over libp2p-tcp in a single-connection, single-stream benchmark.

Validated via https://github.com/libp2p/test-plans/tree/master/perf

Expect more metrics to come. But at this point, I don't see a reason to be pessimistic about libp2p-quic's performance characteristics.

Ralith commented 1 year ago

I believe they're thinking of the numbers at https://github.com/paritytech/polkadot-sdk/issues/536#issuecomment-1691857046, which suggest overly pessimistic congestion controller behavior when there's significant latency, which tends to not be reflected by loopback tests. The details I asked for in https://github.com/paritytech/polkadot-sdk/issues/536#issuecomment-1691857065 could help drive further investigation.

Of course, those older numbers were also worse than TCP, so maybe they're just obsolete.