quicwg / base-drafts

Internet-Drafts that make up the base QUIC specification
https://quicwg.org
1.63k stars 204 forks source link

Reduce restrictions on valid RTT samples #2568

Closed janaiyengar closed 5 years ago

janaiyengar commented 5 years ago

The RTT estimator currently does not use RTT samples if the largest_acked is not ack-eliciting. This can lead to the endpoint missing RTT samples unnecessarily, in two ways.

First, checking only for the largest_acked to be ack-eliciting means that we ignore other newly acked packets that were ack-eliciting. The peer is expected to send an ack within max_ack_delay if it has received any ack-eliciting packets, not just the largest acked one. This is probematic if the connection has a pattern that results in more non-ack-eliciting packets being the largest_acked. I propose that we change the condition for a valid RTT sample to be if any newly acked is ack-eliciting (in addition to the ACK frame carrying a new largest_acked).

Second, non-ack-eliciting packets are a perfectly fine source of RTT samples, with the caveat that ack_delay might be larger than max_ack_delay; a receiver that has only received non-ack-eliciting packets might delay acknowledging these packets much longer than otherwise. I propose that we don't drop all RTT samples associated with non-ack-eliciting packets, but only those with unusable ack_delay. Basically, I propose that we use the RTT sample if ack_delay < max_ack_delay.

nibanks commented 5 years ago

Why only accept the RTT measurement if ack_delay is less than (or equal I assume) max_ack_delay? So the peer held on to the packet a little longer than normal? What's the big deal? If they report to you exactly how long they held on to it, you should still get an accurate RTT measurement, right?

ianswett commented 5 years ago

@janaiyengar you make some good points, but the sender needs to know which packet to use to create an RTT sample. The obvious answer is the largest ack-eliciting packet number. I can think of some cases involving ACK loss and/or reordering that would be slightly incorrect, but I can't think of any cases that would be disastrous. However, it seems important that ack_delay is from the largest_acked ack-eliciting packet in that case, otherwise the RTT sample will be incorrect. I'm not feeling great about the fact we're not explicitly specifying the largest ack eliciting packet an instead letting the sender figure it out, but besides that this seems ok. I will note that we could also advise not including non-ack-eliciting packets as the largest acked and only ACK them when they're less than an ack-eliciting packet. I wonder if that might be simpler, given this is somewhat of an edge case?

@nibanks The reason for limiting ack_delay to max_ack_delay is to incentivize correct reporting of max_ack_delay. Otherwise, one could report a max_ack_delay of 0ms and consistently report a larger ack_delay value, allowing the receiver to decrease the sender's SRTT. I'm not sure how large a problem this is in practice, but it's nice if the mechanisms incentivize accurate reporting.

The algorithms should devolve into something very similar to TCP if max_ack_delay is 0ms, so the worst case doesn't seem that bad.

nibanks commented 5 years ago

It just seems like a bad idea to always ignore RTT measurements if ack_delay is larger than max_ack_delay. For instance, what about one of the following scenarios:

  1. Sheer CPU load on the peer prevents it from ACK'ing a packet within max_ack_delay.
  2. An implementation that literally sets a max_ack_delay timer on receipt of a packet, then when the timer fires, queues a send packet work item. When that work item is processed, the ack_delay is calculated and it ends up being just over max_ack_delay.
ianswett commented 5 years ago

@nibanks In both cases, I believe the current algorithm of limiting ack_delay to max_ack_delay makes perfect sense. In the case of CPU load, RTT should show up in SRTT, RTTVar, or both.

In the second case, the text also advises including any expected timer delay in your advertised max_ack_delay.

This issue is about what happens if the largest acked is an ACK-only packet(not ack-eliciting). In this case, we do ignore the RTT sample, and I do think that could turn out to be problematic, particularly in very asymmetric flows.

nibanks commented 5 years ago

@ianswett my comments were in reference to @janaiyengar's last sentence:

Basically, I propose that we use the RTT sample if ack_delay < max_ack_delay.

I am arguing that we shouldn't throw away an RTT sample all together if ack_delay is greater than max_ack_delay.

ianswett commented 5 years ago

Even if it is not ack-eliciting? Doing that could subject calculations to huge and unpredictable RTT spikes.

nibanks commented 5 years ago

No. I thought Jana's comment was even for ack-eliciting packets. It wasn't clear.

janaiyengar commented 5 years ago

Apologies for any confusion. @nibanks I was referring to non-ack-eliciting packets in that second half of my description, where I made that proposal.

janaiyengar commented 5 years ago

I've been doing more thinking, and here's where I've landed. (Thanks @kazuho and @ianswett for conversations.)

The higher order principle is this: max_ack_delay suggests a contract between this endpoint and its peer. The peer is supposed to send ACKs for ack-eliciting frames within max_ack_delay amount of time. Effectively, the peer wants to wait for no more than max_ack_delay amount of time, but if the actual received ack_delay is larger (due to ack loss or scheduler latency), the RTT should be increased to compensate for this additional delay beyond max_ack_delay, because it's not within the peer's control and we might consider it part of "path" delay.

With this principle in place, I think it still makes sense to make the first change I proposed: that an ACK should be used for RTT measurement if any packet it newly acks is ack-eliciting. The RTT measurement is still on the largest_acked (since ack-delay is reported for the largest_acked).

It is difficult to reason about ACKs that only ack non-ack-eliciting packets, since the max_ack_delay contract does not hold. It is specifically difficult to reason about the reported ack_delays, since the peer can wait arbitrarily long and report arbitrarily long ack_delays. It's incorrect to add any additional delays (beyond max_ack_delay) into the RTT since the increased delay was not due to uncontrollable circumstances. Simply using the ack_delay, however large it may be, allows for the peer to simply game the RTT. Using only the samples that have ack_delay < max_ack_delay might introduce a bias. It gets complicated quickly.

So, I conclude that we should not use ACKs of non-ack-eliciting packets for RTT measurement. This is what we do right now, so that's good. I'll add some text explaining why.

We should also recommend that an endpoint that is only sending non-ack-eliciting packets should piggyback a PING frame along with an ACK roughly once an RTT so that its RTT estimate is reasonably current.

mikkelfj commented 5 years ago

We should also recommend that an endpoint that is only sending non-ack-eliciting packets should piggyback a PING frame along with an ACK roughly once an RTT so that its RTT estimate is reasonably current.

The sounds like a complication - and extra timer check and keeping track of whether ACK eliciting packets were sent. Also, it could lead to a lot of unnecessary PONG traffic. Is it really that important to update RTT that often, and if so, wouldn't it be better to revisit the idea of measuring RTT for non-ack eliciting packets dispite the potential downfalls?

ianswett commented 5 years ago

+1 to @janaiyengar 's last comment. I look forward to a PR.

In regards to the last paragraph, we already have some text in transport about this, and I'm not sure how specific I want to be, given it's only really important as a mechanism for the sender to drop state if they're waiting for acks of acks.

"Because ACK frames are not sent in response to ACK-only packets, a receiver that is only sending ACK frames will only receive acknowledgements for its packets if the sender includes them in packets with non-ACK frames. "

kazuho commented 5 years ago

My +1 as well.

I think it's reasonable to recommend endpoints to send PINGs, if it wants to calculate RTT when it is not sending data.

First of all, this is not a departure from the design we have in TCP; in TCP, an endpoint that is not sending data has no way of inferring RTT from ACKs.

Second, requiring the peer to send ACKs for ACK-only packets promptly makes the specification complex. Because such a mechanism cannot be solely based on a timer (because ACK-of-ACK cannot be sent without piggybacking), even though it needs to be based on a timer (otherwise the endpoint cannot receive an ACK in a timely manner).

Considering these two aspects, I think using PINGs is simpler. Also, endpoints can choose different strategies on how frequent they would send PING; some might send a PING every one second (or 1 RTT), or a download-only application might not send PINGs at all.

marten-seemann commented 5 years ago

With this principle in place, I think it still makes sense to make the first change I proposed: that an ACK should be used for RTT measurement if any packet it newly acks is ack-eliciting. The RTT measurement is still on the largest_acked (since ack-delay is reported for the largest_acked).

The problem with this change is that it requires a sender to keep state for non-ack-eliciting. Currently, in my implementation, I only save a sent packet (including its metadata like the time sent and the packet size) if that packet is ack-eliciting.

ianswett commented 5 years ago

@marten-seemann That seems like a sensible point. Possibly this should be a SHOULD, not a MUST when the largest acked is not ack-eliciting?

marten-seemann commented 5 years ago

We could also require (or at least recommend?) the peer to not send acknowledgements for non-ack-eliciting packets.

nibanks commented 5 years ago

I disagree that we shouldn't be acknowledging non-ack-eliciting packets as the largest_ack. We should always be sending the most up to date information available? @marten-seemann why don't you track the non-ack-eliciting packets? They shouldn't be very high in number or really structurally different from other packets? It seems like an unnecessary optimization that would just cause you more problems in this scenario.

ianswett commented 5 years ago

I agree with Nick that we should keep sending the most up-to-date information.

mikkelfj commented 5 years ago

It's been a while since I worked actively on a QUIC impl., but in my implementation, buffers would be retained until all pending packets in a block have been acked or lost, then that block is flushed. Having an unnecessary complication to this system would be painful. If it is only the leading block, i.e. one of a few recently sent non ack eliciting packets, it won't impact the overall design as much.

janaiyengar commented 5 years ago

@marten-seemann: I agree with @nibanks that tracking non-ack-eliciting packets shouldn't be much additional load, but I don't think that's necessary.

If an ACK acknowledges packets that you don't know about, they've been acked or they're non-ack-eliciting. Either ways, this ACK should not generate a valid RTT sample.

I'll also note that you are not required to use every ACK that acks any ack-eliciting packets for an RTT sample. If you're not tracking a packet, you can't use it's ack for an RTT measurement. Ignoring these ACKs is the only way forward, and the text will allow that.

marten-seemann commented 5 years ago

It might not be a lot of additional load, but a lot of additional complexity. For example, what happens if the peer never acks an ack-eliciting packet (either because it was actually lost, or as part of an attack)? You then have to somehow clear that packet from your history. If you use the normal loss detection mechanism for it, you have to be careful not to take any congestion controller action upon loss of that packet. You also don't want to set any timers based on non-ack-eliciting packets, etc.

I never suggested sending out-of-date information, that would indeed be a bad idea. There's just no need to send an ACK at all if largest_acked is not ack-eliciting, and I'm suggesting that we recommend (mandate?) that peers don't do it. They should just wait until they receive the next ack-eliciting packet.

janaiyengar commented 5 years ago

@marten-seemann : Agreed, which is why I said that it's not necessary to track non-ack-eliciting packets. We already say in the transport draft that an endpoint not send an ACK if all it's received is ACKs. Is that what you meant?

The problem here is that a peer might decide to piggyback an ACK frame with some other stuff it is about to send (MAX_STREAM_DATA perhaps). That could result in an ACK frame that acks only non-ack-eliciting packets. It seems unnecessary to put the burden on a peer to not send this ack. It's just as easy to ignore such acks when received.

janaiyengar commented 5 years ago

@marten-seemann : FWIW, I did suggest a way forward -- that an endpoint ignore ACK frames (for RTT sampling) when the ACK frame doesn't newly acknowledge ack-eliciting packets that the endpoint is tracking. Does this work for you?

marten-seemann commented 5 years ago

I just realized that there's a pathological traffic pattern which would prevent a peer from ever getting a valid RTT sample (e.g. if all ACKs are sent on the max_ack_delay timer, and if after every ack-eliciting packet a non-ack-eliciting packet is received). It might be safe to ignore this edge case and assume that it won't happen a lot in practice.

nibanks commented 5 years ago

I don't want to have a design that has a certain scenario where things break down, but just ignores it. Also, I don't want to have a design where the receiver has to decide if I can send an ACK with my current state or not. Generally, I expect receivers to only keep track of the time they received the largest packet number; and just use that when they frame an ACK.

@marten-seemann as to your issues with keeping track of ACK-only packets I'd argue they are almost entirely the same as normal packets. The ONLY difference is congestion control. As far as the peer trying to attack you by explicitly not ACK'ing the packet, that's no different for any packet. The same logic applies to all packets you send. Eventually, you age out lost packets.

IMO, the simplest and most straight forward design is to always send ACK frames for your complete, current state, independent of packet contents. If the receiver of the ACK frame decides the RTT measurement shouldn't be used, that seems to be trivial enough logic to add in that specific ACK processing code. But I still argue that so long as the ACK wasn't delayed too long (~max_ack_delay) then the RTT measurement should be used.

You don't want the worst case traffic pattern of each flight/burst of packets ends with an ACK-only packet to cause you never to accept RTT measurements from the correspond ACK you get back.

janaiyengar commented 5 years ago

@marten-seemann : This pathological case is exactly why I'm proposing a change to "any acked packet was ack-eliciting".

@nibanks: I wasn't proposing new text. To be clear, if you do not have any state for packets that are being acked, you have no choice but to ignore the ack. This is not a new rule, it just falls out of standard ack processing as we have it now.

mikkelfj commented 5 years ago

The problem here is that a peer might decide to piggyback an ACK frame with some other stuff it is about to send (MAX_STREAM_DATA perhaps). That could result in an ACK frame that acks only non-ack-eliciting packets. It seems unnecessary to put the burden on a peer to not send this ack. It's just as easy to ignore such acks when received.

There is the interesting derived effect that if you do not ACK non-ack-eliciting packets, this could be used as optimistic ack attack mitigation. This means that you can avoid gaps in the packet number space. You could periodically coalesce non-ack-eliciting packets with other packets as necessary.

The problem is of course that the receiver needs to track those gaps, but I'm not sure that is much different than what it already needs to track.

ianswett commented 5 years ago

I believe this was closed by #2592

@janaiyengar I'll let you confirm that and close this.

janaiyengar commented 5 years ago

Closed by #2592