paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.com/

Improve documentation for optimized network #908

Open crystalin opened 3 years ago

crystalin commented 3 years ago

I believe there are optimizations that can be done at the node level to improve networking. Unfortunately, the ones I'm thinking of are controlled by the kernel, so the operator would have to apply them manually, but I'm happy to discuss better solutions here.

The first one I'm suggesting is initcwnd (Initial Congestion Window), which controls the initial value of the congestion window (part of TCP slow start). While I think the default value in the Linux kernel is good for normal users, increasing it has a great impact on networking for servers with good/dedicated bandwidth and frequent connections.

As timing is critical in relay/parachain networking, increasing initcwnd to a value like 46 looks worthwhile while still being considered safe.
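To give a rough feel for the effect, here is a small sketch that counts the round trips needed to deliver a payload on a fresh connection. It assumes an idealized slow start (the window doubles every round trip, no loss, the receiver window is never the limit) and an MSS of 1460 bytes; the function name `round_trips` and the payload sizes are purely illustrative, and 10 is the current Linux default for initcwnd.

```python
import math


def round_trips(payload_bytes: int, initcwnd: int, mss: int = 1460) -> int:
    """Round trips needed to send a payload under idealized TCP slow start.

    Assumptions (not measurements): the congestion window starts at
    `initcwnd` segments and doubles every round trip, with no loss and
    no receiver-window limit.
    """
    segments = math.ceil(payload_bytes / mss)
    cwnd = initcwnd
    sent = 0
    rtts = 0
    while sent < segments:
        sent += cwnd   # one full window is sent per round trip
        cwnd *= 2      # slow start: the window doubles each round trip
        rtts += 1
    return rtts


if __name__ == "__main__":
    # Illustrative payload sizes only.
    for size in (64 * 1024, 1024 * 1024, 5 * 1024 * 1024):
        print(f"{size:>8} bytes: initcwnd=10 -> {round_trips(size, 10)} RTTs, "
              f"initcwnd=46 -> {round_trips(size, 46)} RTTs")
```

Under these assumptions a 64 KiB message needs 3 round trips with initcwnd 10 and a single round trip with initcwnd 46. Real connections will differ because of loss, pacing and the receiver window.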

In combination with increasing the initial congestion window, we could also prevent the window from being reset after an idle period by disabling tcp_slow_start_after_idle (writing 0 to /proc/sys/net/ipv4/tcp_slow_start_after_idle).
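For operators who want to check their current settings, here is a hedged sketch. It reads the tcp_slow_start_after_idle flag and builds (but does not run) an `ip route change` command that sets initcwnd on an existing route. The helper names and the example route line are hypothetical, the approach assumes a Linux host with iproute2, and route lines that carry extra flags may need manual adjustment.

```python
from pathlib import Path
from typing import List

SLOW_START_AFTER_IDLE = Path("/proc/sys/net/ipv4/tcp_slow_start_after_idle")


def slow_start_after_idle_enabled(path: Path = SLOW_START_AFTER_IDLE) -> bool:
    """Return True if the kernel resets the congestion window after idle.

    The file contains "1" (enabled, the Linux default) or "0" (disabled).
    """
    return path.read_text().strip() != "0"


def route_change_command(route_line: str, initcwnd: int = 46) -> List[str]:
    """Build an `ip route change` command from one line of `ip route show`.

    Any existing `initcwnd <n>` pair in the line is dropped and replaced.
    The command is only returned, never executed; running it needs root.
    """
    cleaned = []
    skip_next = False
    for token in route_line.split():
        if skip_next:
            skip_next = False
            continue
        if token == "initcwnd":
            skip_next = True  # drop the old value that follows
            continue
        cleaned.append(token)
    return ["ip", "route", "change", *cleaned, "initcwnd", str(initcwnd)]


if __name__ == "__main__":
    if SLOW_START_AFTER_IDLE.exists():
        print("slow start after idle enabled:", slow_start_after_idle_enabled())
        # To disable it (as root): sysctl -w net.ipv4.tcp_slow_start_after_idle=0
    # Hypothetical route line; take the real one from `ip route show default`.
    example = "default via 192.0.2.1 dev eth0 proto dhcp metric 100"
    print(" ".join(route_change_command(example)))
```

Neither change survives a reboot on its own. Persisting them (for example via a sysctl configuration file and the distribution's network configuration) is left to the operator.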

bkchr commented 3 years ago

AFAIR we tested this quite some time ago to improve performance, but the difference wasn't really measurable.

CC @tomaka @eskimor

tomaka commented 3 years ago

As far as I remember we did experiment with this with our Rococo validators, and the speed for collations was indeed much better.

Unfortunately, it seems complicated to ask our collator and validator community to tweak their systems. Many have trouble just opening a port. For a long time, the hope was that replacing TCP with QUIC would render this irrelevant anyway, but at the moment the performance of our QUIC prototype has unfortunately been very disappointing.

burdges commented 3 years ago

> at the moment the performance of our QUIC prototype has unfortunately been very disappointing.

Interesting.. Any idea what happens there?

crystalin commented 3 years ago

UDP-based messaging could also provide good results, but we would have to build good control mechanisms on top of it, as we are in a heavy p2p networking environment.

It would also probably offer fewer protections (no reserved connections, for example).

tomaka commented 3 years ago

> Interesting.. Any idea what happens there?

We're investigating (cc @kpp). No idea yet. The problem is that we don't know the quality of the underlying library we're using (quinn-proto), so we don't know which numbers are normal and which are not.

burdges commented 1 year ago

Mike Perry said: Tor hit similar QUIC issues. A lower level networking stack demands considerable expertise and developer time. Google & now Apple spent those resources for their QUICs.

We might contribute benchmark infrastructure to quinn, I guess.