ghost opened this issue 3 years ago
This is a fascinating corner of TCP and I appreciate your bringing it to our attention! I spent some time reading through the resources you linked to.
As I understand it, TCP_QUICKACK controls acknowledgement delays, while TCP_NODELAY controls sending delays. Trio can stop using TCP_NODELAY if its peer sets TCP_QUICKACK, but Trio has no way to control that, and indeed most potential peers probably don't set TCP_QUICKACK. If Trio doesn't set TCP_NODELAY, and Trio's peer doesn't set TCP_QUICKACK, then we're opening ourselves up to the situation described in https://jvns.ca/blog/2015/11/21/why-you-should-understand-a-little-about-tcp/ with the long delay. So I don't think it would reduce Trio user frustration on average if Trio were to stop setting TCP_NODELAY, regardless of what we do about TCP_QUICKACK.
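For concreteness, setting the two options on a plain Python socket looks roughly like this (a minimal sketch; the hasattr guard is there because TCP_QUICKACK is only exposed by the socket module on platforms that define it, mainly Linux):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm: send small/partial segments immediately
# instead of holding them until the previous segment is ACKed.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Ask the kernel to ACK immediately instead of delaying. The constant is
# only defined where the platform supports it.
if hasattr(socket, "TCP_QUICKACK"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
```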
Which leaves the question: should we be setting QUICKACK in addition to NODELAY? I guess it could improve things in the opposite direction, for peers that don't set NODELAY. Having to reenable it after every receive seems super heavyweight, though. Are there any specific cases that we expect will come up at least semi-frequently where QUICKACK would make a big difference?
> Are there any specific cases that we expect will come up at least semi-frequently where QUICKACK would make a big difference?
I can't think of situations where both NODELAY and QUICKACK would be specifically useful together, but I can think of situations where unconditionally turning on QUICKACK alone would be. For example, websockets could benefit from disabling NODELAY, because control frames (such as ping/pong) are relatively small and relatively small data frames are probably the common case too. However, without QUICKACK this loses a lot of performance whenever any sort of network synchronisation is required. I also can't think of any downside to setting QUICKACK when NODELAY is enabled.
I believe that the cost of a setsockopt call (nanoseconds to microseconds on modern computers) is so much lower than the cost of a delayed ACK (hundreds of milliseconds in the worst case) that it would be an acceptable trade-off.
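As a rough sketch of what that would look like (recv_quickack is a hypothetical helper, not something Trio or the stdlib provides):

```python
import socket

def recv_quickack(sock: socket.socket, bufsize: int) -> bytes:
    """Receive some bytes, then re-enable TCP_QUICKACK.

    Hypothetical helper: the kernel can clear TCP_QUICKACK on its own,
    so it is re-armed after each recv. The extra setsockopt costs on the
    order of microseconds, versus up to hundreds of milliseconds for a
    delayed ACK in the worst case.
    """
    data = sock.recv(bufsize)
    if hasattr(socket, "TCP_QUICKACK"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
    return data
```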
> This is a fascinating corner of TCP and I appreciate your bringing it to our attention! I spent some time reading through the resources you linked to.
Definitely! I'd never even heard of QUICKACK before.
I'm definitely not an expert in all the low-level details of TCP, so please check me on this. But I actually think NODELAY is generally what you want, even if QUICKACK is also enabled? My reasoning:
Nagle's algorithm says that if you have a partial packet ready to send, then don't actually send it until the previous packet is ACKed.
So, say our packet size is N. And imagine a peer that sends a message that's 1.5 * N bytes big, then waits for the other side to process that message and send a response. (For example, a "message" here might be a websocket frame, or a TLS frame, or an HTTP request.)
If Nagle's algorithm is enabled, then we send the first packet, wait for it to be ACKed, and then send the second half-full packet. Then once the other side has received the full message, it can process it and send back its response. The whole process takes 2 round trip times (one for the ACK + one for the response), plus however long the other side waits before sending the ACK.
The problem that Nagle talks about in the ycombinator post is that if you have delayed ACKs enabled, then that last component becomes large, which is obviously bad for our overall latency. Setting QUICKACK makes the ACK sending delay zero, so our whole process only takes 2 round trip times.
However, if we set NODELAY, then both packets are sent immediately, and the other side gets the whole message at once, and can reply immediately. So now it only takes 1 round trip time.
On a low-latency connection like localhost or in a data center, the ACK delay is much larger than the round trip time, so it doesn't really matter whether you do 1 or 2 round trips, but it really matters that you don't wait for the ACK.
OTOH, on a high-latency connection that goes over the internet, round-trip times can easily dominate application performance. So going from 2 round trips to 1 round trip is a huge deal, and you definitely want NODELAY, regardless of whether QUICKACK is set or not.
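A back-of-the-envelope version of that comparison, with purely illustrative RTT and delayed-ACK numbers:

```python
# Illustrative model of the 1.5-packet request/response scenario above.
# With NODELAY both packets go out at once (1 RTT); otherwise the second
# packet waits for the first ACK, which is delayed unless the peer uses
# QUICKACK.
def request_latency(rtt, ack_delay, nodelay, peer_quickack):
    if nodelay:
        return rtt
    return 2 * rtt + (0 if peer_quickack else ack_delay)

ACK_DELAY = 0.200  # assumed worst-case delayed-ACK timer, in seconds

for name, rtt in [("localhost-ish", 0.0005), ("cross-internet", 0.050)]:
    print(name,
          "NODELAY:", request_latency(rtt, ACK_DELAY, True, False),
          "Nagle+QUICKACK:", request_latency(rtt, ACK_DELAY, False, True),
          "Nagle+delayed ACK:", request_latency(rtt, ACK_DELAY, False, False))
```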
OTOH, if NODELAY is set, then I think QUICKACK doesn't matter too much? In this specific situation, I think it actually makes things slightly more efficient. With QUICKACK enabled, the other side has to ACK the first request packet, then ACK the second request packet, then send more packets with the response. With delayed ACK enabled, then the first two ACKs get delayed, and then when we send the response the ACKs can piggy back on that for free. Probably not a huge deal either way in practice, but the point is QUICKACK isn't really helping any.
So NODELAY seems to be helpful in at least some situations. But everything has tradeoffs, so what are the downsides? The main one is that it forces application code to buffer up complete messages in userspace and pass them to the kernel in a single chunk, instead of calling write() a bunch of times to compose a single message. Nagle is more forgiving of sloppy application code. But OTOH:
So that's why Trio's main TCP interfaces unconditionally enable NODELAY. (Though we do still give the option of dropping down to the raw socket layer and then you can setsockopt whatever you want.)
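For instance, something like this sketch (the host/port are placeholders, and enabling TCP_QUICKACK here is just to show that arbitrary options can be set on the stream's socket):

```python
import socket
import trio

async def main():
    # Trio's open_tcp_stream already enables TCP_NODELAY on this socket;
    # other options can still be set by hand on the stream.
    stream = await trio.open_tcp_stream("example.com", 443)
    if hasattr(socket, "TCP_QUICKACK"):  # Linux-specific option
        stream.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
    await stream.aclose()

trio.run(main)
```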
I suspect this is also why most of the networking libraries I've looked at enable NODELAY by default, and none of them enable QUICKACK, and why OSes haven't put the effort into making QUICKACK widely available or easy to use? But that's just a guess.
TCP_QUICKACK disables delayed acknowledgements (one of the more problematic parts of the implementation of Nagle's algorithm) but keeps small-packet buffering, whereas TCP_NODELAY disables both (reducing throughput on small writes). As put by Nagle himself:
Unfortunately, when you search online you may find claims that TCP_QUICKACK is Linux-only. This is not true! Doing a setsockopt with option 12 (the same define as on Linux) under Windows works (although whether it actually does anything there, I don't know), at least on Windows 10. Additionally, it's seemingly not needed on OS X at all ("on my OSX MacBook Air however the RPC call needed only 3ms!"). However, a second issue arises: TCP_QUICKACK can turn itself off. The solution to this is seemingly to turn it back on after every recv call.
See also: https://github.com/urllib3/urllib3/issues/746, and this RFC.