Closed by martinthomson 6 years ago.
My primary goal with first suggesting the general design we have today with Retry was to support an independent DoS mitigation device to be put in path in front of the server. This is a requirement we get from Azure. If any change to the Retry design requires some state to be shared between the mitigation device (which we own) and the backend QUIC server (which we don't own), that will severely hamper the Azure efforts. It would require a deployment story, would make things much more complex for users, and would then likely go unused.
You state above that unauthenticated Retry packets are a source of some concern, show that the initial CID can be modified, and then go on to say it isn't a big deal (which I agree with). I'm still not really sure what folks are worried about with this supposed attack vector. The initial CID and Initial packet encryption are essentially only used to prevent version ossification. If the attacker understands the version, it's essentially cleartext.
So why don't we add a nonce to the Initial packet, pass it along through the relevant stages of the handshake, and validate it before committing to a handshake? The nonce would replace the initial connection ID, which doesn't really make any sense anyway.
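A minimal sketch of the nonce proposal, assuming the server echoes the nonce inside an integrity-protected handshake message (a cleartext echo would be just as rewritable as the connection ID it replaces). `NONCE_LEN`, `ClientState`, `start_connection`, and `on_server_handshake` are hypothetical names; nothing here is a QUIC API.

```python
import secrets
from dataclasses import dataclass

NONCE_LEN = 16  # hypothetical length, not taken from any draft


@dataclass
class ClientState:
    nonce: bytes
    committed: bool = False


def start_connection() -> ClientState:
    # The client picks a fresh random nonce for its first Initial.
    return ClientState(nonce=secrets.token_bytes(NONCE_LEN))


def on_server_handshake(state: ClientState, echoed_nonce: bytes) -> bool:
    """Commit to the handshake only if the server echoed the original nonce."""
    # Assumes echoed_nonce was carried in an authenticated handshake message.
    if secrets.compare_digest(state.nonce, echoed_nonce):
        state.committed = True
    return state.committed
```

Any Retry an attacker injects in between cannot change what the server finally echoes, so the client notices the mismatch before committing.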
My primary goal with first suggesting the general design we have today with Retry was to support an independent DoS mitigation device to be put in path in front of the server.
An "independent DoS mitigation device" is fundamentally indistinguishable from a middlebox. In all other parts of the protocol we're very careful to design things such that middleboxes can't interfere with the protocol, while at the same time inviting them to do exactly this when it comes to Retries.
I'm still not convinced that this is a reasonable deployment model for QUIC. QUIC Initials and TCP SYNs are not really comparable, since we have the 1200-byte minimum size for the client's Initial packets, which will make SYN-flood-like attacks orders of magnitude more expensive.
"[A]ll other parts of the protocol" is after the handshake. We've acknowledged from the beginning that a middlebox can meddle in the handshake; in a few cases, we've taken steps to detect such meddling and abort the connection, but that's not the norm.
I certainly wouldn't object to adding a mechanism that allows the server to detect such a middlebox and kill the connection if it's unexpected. While I agree that being able to do this without close coordination (shared keys, for example) is valuable, I don't think it's an unreasonable level of coordination to say "I expect such a box" / "I don't expect such a box".
Even a transport parameter that indicates how many Retry packets have been seen prior to the one it's responding to would suffice. However, that does require the client to (partially or fully) regenerate the ClientHello.
If a server varies the opaque Retry token on each Retry attempt, it can already detect which of the tokens caused the response and count how many were generated. That also allows it to know if there is a gap, without a new transport parameter. If that's the expected behavior, a client seeing a replayed Retry token would close the connection, and a server seeing a token it did not generate would close the connection.
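The per-attempt token idea can be sketched as follows, assuming the server keeps a per-connection count of the Retries it sent and authenticates tokens with an HMAC under a key only it holds. The token layout (a 2-byte attempt counter, the original destination connection ID, a truncated tag) and all class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 16  # hypothetical truncated HMAC length


class RetryTokenServer:
    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)
        self._issued: dict[bytes, int] = {}  # original DCID -> Retries sent

    def _tag(self, body: bytes) -> bytes:
        return hmac.new(self._key, body, hashlib.sha256).digest()[:TAG_LEN]

    def issue(self, odcid: bytes) -> bytes:
        # Each Retry for the same connection carries a new attempt number.
        attempt = self._issued.get(odcid, 0) + 1
        self._issued[odcid] = attempt
        body = attempt.to_bytes(2, "big") + odcid
        return body + self._tag(body)

    def validate(self, token: bytes) -> bool:
        """False means close: a forged token, or a gap in the Retry sequence."""
        if len(token) < 2 + TAG_LEN:
            return False
        body, tag = token[:-TAG_LEN], token[-TAG_LEN:]
        if not hmac.compare_digest(tag, self._tag(body)):
            return False  # this server never generated the token
        attempt = int.from_bytes(body[:2], "big")
        # Echoing an older attempt means a later Retry never reached the client.
        return self._issued.get(body[2:]) == attempt


class RetryTokenClient:
    def __init__(self) -> None:
        self._seen: set[bytes] = set()

    def accept(self, token: bytes) -> bool:
        """False means close: this Retry token was already seen (a replay)."""
        if token in self._seen:
            return False
        self._seen.add(token)
        return True
```

Keeping `_issued` makes the server stateful per connection; a stateless variant could only check the tag, not detect gaps.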
Presuming that the token is passed with integrity protection at some point, is that enough?
Here I was thinking of enabling what Mike suggested. Perhaps the client can copy the Retry (or Retries, or a hash thereof) into its transport parameters, and the server can check them at its discretion.
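One way to read the transport-parameter suggestion, as a sketch: the client hashes the Retry packets it accepted and sends the digest in a transport parameter, which the TLS handshake authenticates. The parameter name `retry_hash`, the length-prefixed SHA-256 construction, and the assumption that the server kept copies of the Retries it sent are all illustrative, not taken from any draft.

```python
import hashlib


def retry_hash(retry_packets: list[bytes]) -> bytes:
    h = hashlib.sha256()
    for pkt in retry_packets:
        # Length-prefix each packet so [b"ab", b"c"] and [b"a", b"bc"] differ.
        h.update(len(pkt).to_bytes(4, "big"))
        h.update(pkt)
    return h.digest()


def client_transport_params(retries_seen: list[bytes]) -> dict[str, bytes]:
    # Hypothetical parameter; carried inside the authenticated handshake.
    return {"retry_hash": retry_hash(retries_seen)}


def server_check(params: dict[str, bytes], retries_sent: list[bytes]) -> bool:
    """Optional server-side check: did the client see exactly our Retries?"""
    received = params.get("retry_hash")
    return received is not None and received == retry_hash(retries_sent)
```

An injected or altered Retry changes the client's digest, so the mismatch surfaces once the handshake transcript is authenticated.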
@hardie, there are two pieces that are relevant here: the connection ID and the token. Both would need some form of checking/protection that is covered by something more strongly authenticated than the Initial keys. Using Handshake keys would probably work if we needed that, but it's probably easier all around to put something into the cryptographic handshake transcript. One thing I'm concerned about is the erasure scenario that @kazuho described where a MitM can force a Retry without the server ever being aware of it having happened.
I'm somewhat surprised by the anti-middlebox attitudes here. Don't you realize that practically every single device in the Internet is behind a middlebox?
For QUIC to work, a middlebox needs to allocate state for the UDP connection. Thus, every single UDP packet potentially creates state, requiring memory. If there are lots of UDP packets (think of the QUIC equivalent of a TCP SYN flood), the middlebox's memory will quickly be exhausted.
The only way to save the middlebox from an out-of-memory condition is for it to authenticate the client's willingness to fully open the connection before allocating any memory. Thus, the middlebox needs to send a RETRY packet to the client. A middlebox sending a RETRY packet is exactly what you're planning to prevent here by authenticating Retry.
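A sketch of how a middlebox could validate a client's address before committing memory, in the spirit of SYN cookies: the Retry token binds a timestamp to the client's address with an HMAC, so nothing is stored until the client echoes a fresh, valid token. The token layout, the 10-second lifetime, and the names `StatelessValidator` and `on_initial` are assumptions.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

TOKEN_LIFETIME_S = 10  # assumed token validity window, in seconds

Addr = tuple[str, int]  # client IP address and UDP port


class StatelessValidator:
    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)
        self.flows: dict[Addr, bool] = {}  # per-flow memory, allocated only here

    def _mac(self, addr: Addr, ts: int) -> bytes:
        msg = f"{addr[0]}|{addr[1]}|{ts}".encode()
        return hmac.new(self._key, msg, hashlib.sha256).digest()

    def on_initial(self, addr: Addr, token: Optional[bytes],
                   now: Optional[int] = None) -> Optional[bytes]:
        """Return a token to send in a Retry, or None once the flow is admitted."""
        now = int(time.time()) if now is None else now
        if token is not None and len(token) == 8 + 32:
            ts = int.from_bytes(token[:8], "big")
            fresh = 0 <= now - ts <= TOKEN_LIFETIME_S
            if fresh and hmac.compare_digest(token[8:], self._mac(addr, ts)):
                self.flows[addr] = True  # client proved reachability: commit memory
                return None
        # Nothing is stored for this client: the token carries all the state.
        return now.to_bytes(8, "big") + self._mac(addr, now)
```

A flood of spoofed Initials only costs the box one HMAC per packet; the flow table grows only for clients that can receive the Retry.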
It's not only the endpoint that needs to be protected from flooding attacks. The middlebox needs to be protected also.
If a middlebox sends a RETRY for every new connection, that will kill 0-RTT, no? Because it means that we're back to adding an extra roundtrip all the time.
Not really; this is probably a rare scenario, e.g. when the middlebox is already in some DDoS protection state, or if the box would usually only expect outgoing connections.
Yes, a RETRY sent by a middlebox will add an extra round trip. However, as @mirjak noted, the DDoS protection need not always be on. It is possible to enable RETRY-based authentication only once some percentage (such as 50%) of the memory available for UDP state machines has been taken by a DDoS attack. I consider an extra round trip under attack far better than refusing new connections because of an out-of-memory condition.
The same is true for SYN cookies: they are the fallback, and the usual rule is that the server keeps the connection parameters in memory. SYN cookies take over only when memory runs out.
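The threshold policy can be sketched as a flow table that answers Initials directly while it has room and demands Retry-based validation only under pressure. The 50% threshold mirrors the example given; the capacity, the string return values, and the `validated` flag (standing in for a check such as a valid Retry token) are hypothetical.

```python
class FlowTable:
    def __init__(self, capacity: int, retry_threshold: float = 0.5) -> None:
        self.capacity = capacity
        self.retry_threshold = retry_threshold
        self.flows: set[tuple[str, int]] = set()

    def under_pressure(self) -> bool:
        return len(self.flows) >= self.retry_threshold * self.capacity

    def on_initial(self, addr: tuple[str, int], validated: bool) -> str:
        if addr in self.flows:
            return "forward"
        if len(self.flows) >= self.capacity:
            return "drop"  # out of memory: new connections fail visibly
        if self.under_pressure() and not validated:
            return "retry"  # extra round trip, but no state allocated
        self.flows.add(addr)
        return "forward"
```

Below the threshold, clients keep their normal handshake (including 0-RTT); above it, only address-validated clients consume table entries.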
I worry that such a feature would effectively always become turned on, either by misconfiguration ("it sounds more secure"), or because the capacity of the path is increased without the box being upgraded, or other such reasons. Sure, in a perfect world, such boxes would be carefully monitored and carefully operated, but that's not really the world we live in. So a highly visible failure (e.g., not accepting new connections) is IMO much preferable to a silent degrading of user experience.
I agree with @larseggert that middleboxes owned by third parties should not be allowed to generate Retries. Allowing such behavior would ossify the QUIC protocol, because it would mean that the protocol cannot be upgraded just by modifying the endpoints.
We should forbid such middlebox behavior. If there is a chance of such middleboxes appearing (that is, if we ship QUIC with unauthenticated Retries), I would argue for mandating Retry authentication.
But it could be done such that some trusted middleboxes could be authorized to do it via key sharing, possibly tied to protocol version.
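A minimal sketch of the key-sharing idea, assuming the shared secret lives with the server operator and the boxes it authorizes, and that the server (not the client) checks the token echoed in the client's next Initial. Deriving one key per QUIC version means a box provisioned for one version cannot mint tokens for another. The names `version_key`, `mint_token`, and `server_accepts`, the derivation label, and the token layout are all hypothetical.

```python
import hashlib
import hmac


def version_key(master: bytes, version: int) -> bytes:
    # Hypothetical per-version derivation from the shared master secret.
    label = b"retry-token" + version.to_bytes(4, "big")
    return hmac.new(master, label, hashlib.sha256).digest()


def mint_token(master: bytes, version: int, odcid: bytes) -> bytes:
    """Run by an authorized middlebox when it sends a Retry."""
    tag = hmac.new(version_key(master, version), odcid, hashlib.sha256).digest()
    return odcid + tag


def server_accepts(master: bytes, version: int, token: bytes) -> bool:
    """Run by the server on the token echoed in the client's next Initial."""
    if len(token) < 32:
        return False
    odcid, tag = token[:-32], token[-32:]
    key = version_key(master, version)
    expected = hmac.new(key, odcid, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```

A box without the secret, or with a key for a different version, cannot produce tokens the server will honor.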
@mikkelfj trusted by what? Trusted by the client? Trusted by the server? There are middleboxes in the client side as well, and if such a client side middlebox happens to have even one open port, a DDoS attack could fill the state tables, thus preventing outgoing connections from succeeding if incoming and outgoing connections share the same memory (like they usually do).
I don't agree with @larseggert that a DDoS should cause a visible failure. Usually, the goal is NOT to let a DDoS attack cause a visible failure, but rather to try as hard as possible to limit it to a silent degradation of user experience. Of course, the ultimate goal would be no degradation of the user experience at all, but DDoS attacks are hard to mitigate.
@jmtilli I understand your concern, but an authorized middlebox could only act in collaboration with the endpoint that transmits the Retry, i.e. the server. It's hardly practical to have that happen in domestic routers, and such routers would have a hard time doing anything more than dropping new connections: where would they redirect traffic?
If you allow unauthorized Retries, a set of hacked domestic routers could force legitimate connections to gravitate towards a single target server, but that can, to an extent, be guarded against by building protection into tokens.
The connection ID is visible in outbound packets. The initial CID is random; the next one is chosen by the server (but can be absent and replaced by the 5-tuple). A client-side router need not track the initial CID and can instead track connection setup over a short timespan. If the CID survives that, it is likely a valid connection; if not, a Retry would not improve the situation.
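A client-side router that tracks only connection setup might look like this sketch: an outbound packet opens a short-lived entry keyed by the address/port 4-tuple, and the entry is promoted to a longer-lived one only if the peer answers in time, so unsolicited inbound packets never create state. The timeouts and the names `ClientSideRouter`, `outbound`, and `inbound` are illustrative assumptions.

```python
SETUP_TIMEOUT_S = 5.0  # assumed window for the peer to answer
ESTABLISHED_TIMEOUT_S = 120.0  # assumed idle timeout once the peer has answered

# Always keyed from the client's side: client addr, client port, server addr, server port.
Flow = tuple[str, int, str, int]


class ClientSideRouter:
    def __init__(self) -> None:
        self.entries: dict[Flow, tuple[float, bool]] = {}  # flow -> (expiry, established)

    def _expire(self, now: float) -> None:
        self.entries = {k: v for k, v in self.entries.items() if v[0] > now}

    def outbound(self, flow: Flow, now: float) -> None:
        self._expire(now)
        _, established = self.entries.get(flow, (0.0, False))
        timeout = ESTABLISHED_TIMEOUT_S if established else SETUP_TIMEOUT_S
        self.entries[flow] = (now + timeout, established)

    def inbound(self, flow: Flow, now: float) -> bool:
        """Admit a reply only for a flow the client itself opened recently."""
        self._expire(now)
        if flow not in self.entries:
            return False  # unsolicited: no state is created
        self.entries[flow] = (now + ESTABLISHED_TIMEOUT_S, True)
        return True
```

State grows only with the client's own outbound connection attempts, which is why such a router has no need to send Retry itself.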
A client-side router would never need to hold more than a limited number of connections unless it is proxying for a large organization, and even then its state space is much smaller than that of an endpoint proper. Hence, a large router could afford the RAM to hold that many connections.
So the primary problem is on the server side, where many connections arrive from many sources. For peer-to-peer server networks, client-side middleboxes are also a concern, but I still don't see why a client-side router would issue a redirect.
The multiple steps involved in Retry are fragile (see #1709), and they are not authenticated. That is a source of some concern. As has been observed, an on-the-side attacker gets one shot at rewriting the connection ID as a result of this. We decided that this isn't a big deal, but the interaction between an attacker-chosen connection ID and packet number encryption is a minor concern. Maybe it's time to just button this all down fully.