microsoft / net-offloads

Specs for new networking hardware offloads.

How do we limit the pacing queue? #53

Open maolson-msft opened 1 year ago

maolson-msft commented 1 year ago

Half of all networking problems are caused by unbounded queues.

Pacing introduces a new queue into the system. How will the size of this queue be limited?

nibanks commented 1 year ago

In other words, what is the back-pressure mechanism involved here? I've talked with @csujedihy on this topic and he had quite a few thoughts. A few of the options we discussed:

Explicit Queue Size + Tracking

The NIC can indicate the queue size available for outstanding TPTO sends, and the OS keeps track of how many queued items it has currently handed to the NIC. If we reach a certain "high-water mark" and risk running out of space, we indicate this back to the sender with a special "success, but running out of space" status code, which congestion control can then use as a signal to back off.
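A minimal sketch of the host-side bookkeeping this implies. All names here are illustrative, not from the spec: `pacing_queue_t`, the status codes, and the 80% threshold are assumptions.

```c
#include <stdint.h>

/* Hypothetical status codes; the real offload spec would define these. */
typedef enum {
    SEND_OK,            /* queued, plenty of room */
    SEND_OK_NEAR_FULL,  /* queued, but past the high-water mark: back off */
    SEND_QUEUE_FULL     /* not queued at all */
} send_status_t;

typedef struct {
    uint32_t capacity;   /* queue size advertised by the NIC at init */
    uint32_t high_water; /* e.g. 80% of capacity */
    uint32_t in_flight;  /* sends handed to the NIC, not yet completed */
} pacing_queue_t;

/* Called on the send path. */
send_status_t pacing_queue_submit(pacing_queue_t *q)
{
    if (q->in_flight >= q->capacity)
        return SEND_QUEUE_FULL;
    q->in_flight++;
    return (q->in_flight >= q->high_water) ? SEND_OK_NEAR_FULL : SEND_OK;
}

/* Called from the NIC's send-completion handler. */
void pacing_queue_complete(pacing_queue_t *q)
{
    q->in_flight--;
}
```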

Explicit High-Water Mark Signal

Similar to the above, but instead of the NIC indicating a queue size (which might not be possible, or at least not easy), the NIC itself would track the queue usage and indicate a "high-water mark" status in the send completion.
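Under that model the host does no bookkeeping at all and just reacts to the completion flag. A sketch, where the `tpto_completion_t` layout and the multiplicative back-off are assumptions, not spec'd behavior:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical completion descriptor: the NIC, not the host, tracks
 * queue occupancy and sets a flag once it crosses the high-water mark. */
typedef struct {
    uint64_t packet_id;
    bool     high_water; /* set by the NIC in the completion */
} tpto_completion_t;

static uint64_t pacing_rate = 1000000000; /* bytes/sec, illustrative */

/* Completion handler: translate the NIC's flag into a congestion-control
 * signal, with no host-side queue tracking. */
void on_tpto_completion(const tpto_completion_t *c)
{
    if (c->high_water)
        pacing_rate -= pacing_rate / 8; /* illustrative back-off */
}
```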

ECN

The NIC could use ECN to mark outgoing packets when the queue is approaching its limit. Then, if the protocol supports ECN, it should respond appropriately.

This has the downside of requiring protocol-level support, as well as waiting a round trip before getting the signal. By then the queue might already have overflowed.
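The marking step itself is cheap. A sketch of RFC 3168-style CE marking on the egress path, with the function and parameter names being illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

/* ECN codepoints live in the low two bits of the IPv4 TOS /
 * IPv6 traffic-class byte (RFC 3168). */
#define ECN_MASK    0x03
#define ECN_NOT_ECT 0x00 /* sender does not support ECN */
#define ECN_CE      0x03 /* Congestion Experienced */

/* Mark CE only on ECN-capable packets (ECT(0)/ECT(1)); a Not-ECT
 * packet must never be marked, only dropped. */
void maybe_mark_ce(uint8_t *tos_byte, bool queue_near_limit)
{
    uint8_t ecn = *tos_byte & ECN_MASK;
    if (queue_near_limit && ecn != ECN_NOT_ECT)
        *tos_byte = (uint8_t)((*tos_byte & ~ECN_MASK) | ECN_CE);
}
```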

Packet Loss

In the absence of any explicit signal, the fallback is packet loss; essentially treating this as any other queue on the network.

Again, this requires protocol support for loss detection, as well as a round trip to detect the problem, and it will always result in data loss that must be recovered (retransmitted).
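For completeness, a sketch of that fallback: plain tail drop, where the only feedback the sender ever gets is the eventual loss signal. Everything here is illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t capacity;
    uint32_t depth;
} tail_drop_queue_t;

/* Treat the pacing queue like any other queue on the network: when
 * it's full, the packet is simply dropped, and the transport learns
 * about it only through its normal loss-recovery machinery. */
bool tail_drop_enqueue(tail_drop_queue_t *q)
{
    if (q->depth >= q->capacity)
        return false; /* dropped; no explicit signal to the sender */
    q->depth++;
    return true;
}
```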

csujedihy commented 1 year ago

After some further thinking on this and a discussion with Matt yesterday, I think a back-pressure mechanism is necessary only when a fixed pacing rate is configured (e.g., an HTB rate limiter or SO_MAX_PACING_RATE on Linux). When I experimented with pacing via an LWF pacer, most of the time I just set a fixed rate, which caused cwnd to grow without bound. If the pacing logic we are going to implement sets the pacing rate to something like cwnd/rtt (see the sketch below), we likely won't build up a large pacing queue. Even so, I sort of expect a socket option for setting a fixed/max pacing rate will be the first feature request from our future customers after we release TCP/UDP/QUIC pacing.

That said, an unbounded queue is still a problem that needs to be addressed.
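A sketch of the cwnd-derived rate mentioned above. The gain percentages are an assumption borrowed from common practice (Linux TCP paces at roughly 200% of cwnd/rtt in slow start and 120% afterwards), not a spec'd default:

```c
#include <stdint.h>

/* Pacing rate tied to congestion control, as opposed to a fixed
 * SO_MAX_PACING_RATE-style cap: the queue drains roughly as fast as
 * cwnd admits new data, which keeps its depth naturally bounded. */
uint64_t pacing_rate_bytes_per_sec(uint64_t cwnd_bytes, uint64_t srtt_us,
                                   uint64_t gain_pct)
{
    if (srtt_us == 0)
        return UINT64_MAX; /* no RTT sample yet: effectively unpaced */
    /* bytes/sec = cwnd / rtt, with rtt scaled up from microseconds. */
    return (cwnd_bytes * 1000000u / srtt_us) * gain_pct / 100u;
}
```

With the rate tied to cwnd like this, the pacing queue can only ever hold roughly one cwnd of data, which is what bounds it without any explicit back-pressure signal; a fixed-rate cap breaks that coupling, which is where the mechanisms above come in.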