scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0

allow setting buffer sizes on server_socket #2458

Open travisdowns opened 2 days ago

travisdowns commented 2 days ago

We add two options to set the recv and send (SO_RCVBUF, ...) buffer sizes on a listening socket (server_socket). This is mostly useful to propagate said sizes to all sockets returned by accept().

It is already possible to set the socket option directly on the connected socket after it is returned by accept(), but experimentally this results in a socket that has the specified buffer size yet does not advertise a receive window to the client beyond the default (64K for current typical kernel defaults). So you get only some of the benefit of the larger buffer.

Setting the buffer size on the listening socket, however, is mentioned as the correct approach in tcp(7) and does not suffer from the same limitation.
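For illustration, here is a minimal sketch of the underlying mechanism using plain POSIX sockets rather than the Seastar API. The helper names (`get_rcvbuf`, `make_listener`, `accept_one`) are hypothetical and are not part of this PR. It sets SO_RCVBUF on the listener before listen() and lets the caller compare the value reported on the accepted socket. Note that on Linux the value read back is typically double the requested size and is capped by net.core.rmem_max.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: read back the effective SO_RCVBUF, or -1 on error.
int get_rcvbuf(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &value, &len) != 0) {
        return -1;
    }
    return value;
}

// Hypothetical helper: create a loopback listener on an ephemeral port,
// setting SO_RCVBUF *before* listen() so accepted sockets can inherit it.
// On success returns the listener fd and stores the bound address in `bound`.
int make_listener(int rcvbuf_bytes, sockaddr_in& bound) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    socklen_t len = sizeof(bound);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_bytes, sizeof(rcvbuf_bytes)) != 0 ||
        bind(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr)) != 0 ||
        listen(fd, 1) != 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&bound), &len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Hypothetical helper: connect a client to the listener and accept it.
// Returns the accepted fd (or -1); the client fd is stored in `client_fd`.
int accept_one(int listener, const sockaddr_in& addr, int& client_fd) {
    client_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (client_fd < 0) {
        return -1;
    }
    if (connect(client_fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr)) != 0) {
        close(client_fd);
        client_fd = -1;
        return -1;
    }
    return accept(listener, nullptr, nullptr);
}
```

Calling `make_listener(16 * 1024, addr)` followed by `accept_one(listener, addr, client)` and comparing `get_rcvbuf()` on both descriptors should show the accepted socket carrying the listener's buffer size. This only checks the reported buffer size; observing the advertised receive window itself requires a tool such as ss or a packet capture.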

A test is included which checks that the mechanism, including the inheritance, works.

travisdowns commented 2 days ago

This was discovered due to very poor throughput between a remote client with ~250 rtt and Redpanda. The transfer is receive-window limited and benefits from buffers > 1 MB, but using such buffers (we set the configured buffer size on the connected_socket immediately after connection) had no effect, even though ss showed that the change had taken effect. The problem was that setting it "too late" prevents the receiving side from advertising the larger size in its receive window. Arguably a kernel flaw? The window scale was set high enough (128x) even when the size was set this way, so the scale was not what prevented the window from growing, though that is often given as the reason why "late setting" the recv buffer size does not work.
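To make the arithmetic behind "receive window limited" concrete, here is a small sketch. It assumes the ~250 figure above is an RTT in milliseconds (the comment gives no unit) and that the 128x scale corresponds to a window scale shift of 7. The function names are hypothetical. A single connection cannot exceed window / RTT, so a 64 KiB window at 250 ms allows about 256 KiB/s, while a 1 MiB window allows about 4 MiB/s. A shift of 7 permits windows up to roughly 8 MiB, which is consistent with the window scale not being the limiting factor.

```cpp
#include <cstdint>

// Upper bound on single-connection throughput (bytes per second) when the
// sender is limited only by the receiver's advertised window.
constexpr double max_throughput_bytes_per_sec(double window_bytes, double rtt_seconds) {
    return window_bytes / rtt_seconds;
}

// Largest window that can be advertised for a given window scale shift:
// the 16-bit TCP window field shifted left by `shift` (RFC 7323).
constexpr std::uint64_t max_advertisable_window(unsigned shift) {
    return std::uint64_t{65535} << shift;
}

// Assumed inputs for illustration only: 250 ms RTT, 64 KiB and 1 MiB windows.
constexpr double kAssumedRttSeconds = 0.25;
constexpr double kDefaultWindowLimit = max_throughput_bytes_per_sec(64.0 * 1024, kAssumedRttSeconds);
constexpr double kLargeWindowLimit = max_throughput_bytes_per_sec(1024.0 * 1024, kAssumedRttSeconds);
```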