quicwg / base-drafts

Internet-Drafts that make up the base QUIC specification
https://quicwg.org

Looping with multiple Retry packets #1451

Closed martinthomson closed 6 years ago

martinthomson commented 6 years ago

Allowing multiple Retry packets creates a potential for regression of the address validation tokens.

Say that a client retransmits its first Initial packet without a token. The server responds to both copies with the same token. The second of these responses is delayed.

The client receives the first Retry, sends another Initial packet carrying that token, and receives a second token in response.

Then the Retry that the server sent in response to the retransmission of the first Initial arrives. The client switches to that token as though it were new, but it is back to the first token.

If the server relies on multiple Retry packets and progressive validation of the address using those tokens, this reordering reverts any progress that was made. Because Retry can't be sent indefinitely (there is an arbitrary limit of 3 changes), this might cause the connection to fail.
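
A rough timeline of the race (T1 and T2 are hypothetical token values):

[client] -> Initial (no token)                 -> [server]
[client] -> Initial (no token, retransmission) -> [server]
[client] <- Retry (token T1)                   <- [server]   (the Retry for the retransmission, also T1, is delayed)
[client] -> Initial (token T1)                 -> [server]
[client] <- Retry (token T2)                   <- [server]
[client] <- Retry (token T1, delayed)          <- [server]
[client] -> Initial (token T1)                 -> [server]   (validation progress reverted)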

mikkelfj commented 6 years ago

A client should know the initial CID in use, whether it came from a Retry or not. If an old Retry lands, its destination CID would not match the ongoing session.

The client could choose to opt for the original one based on an informed decision, for example if a handshake rather than a Retry is received. But otherwise I don't see how the client can get confused (except in the extremely unlikely event of a random collision).

If the client does multiple Retry sequences using a received CID, then there is a potential conflict, but I don't think multiple retries are a good idea, nor necessary.

nibanks commented 6 years ago

The number one scenario I personally want to make sure we support (and which possibly requires multiple Retry support) is an independent DDoS protection device/component sitting in front of a QUIC server. This is one of the main reasons for the redesign of the Retry mechanics in the new Stream 0 design. When this device is in path, it is possible that it sends a Retry for the first client Initial packet and then lets the second Initial packet through. When the QUIC server receives the second client Initial, it may (for whatever reason) decide it wants to do a Retry of its own. The client needs to be able to handle this scenario and respond to the second Retry packet with a third client Initial packet.

Now obviously, it would be best if the QUIC server didn't need to send the second Retry packet, but it might not (probably won't) know about the DDoS protection device sitting in front of it. If it does know about it, it could just assume all client Initial packets are validated, but I don't want to depend on this, as it's tied to a particular deployment and infrastructure.
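
Sketched as a packet flow (labels are illustrative):

[client] -> Initial #1 -> [DDoS device]                  [server]
[client] <- Retry #1   <- [DDoS device]                  [server]
[client] -> Initial #2 -> [DDoS device] -> Initial #2 -> [server]
[client]                  [DDoS device] <- Retry #2   <- [server]
[client] <- Retry #2   <- [DDoS device]                  [server]
[client] -> Initial #3 -> [DDoS device] -> Initial #3 -> [server]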

martinthomson commented 6 years ago

@mikkelfj, yes, a connection ID might allow this to be detected. However, add one iteration and the problem remains.

@nibanks, you seem to have described a scenario where this gets even worse. The retransmission has effectively forked the connection. One of those forks is going to die, but you can't be certain which one. We're reduced to relying on chance for the connection to complete.

mikkelfj commented 6 years ago

@martinthomson yes, what if the source connection ID had to be different from the previous source CID when responding to a Retry? The initial SCID could be zero-length, but the first, second, etc. attempts could not. The client can later revert to a zero-length SCID.

This could mess with P2P server load balancers, but then the SCID would not be zero-length anyway.

ianswett commented 6 years ago

@nibanks Your architectural description makes sense, but it doesn't seem that difficult to tell a server not to send RETRY if it's behind such a device. Not only should it be possible, I think it's necessary for ideal performance?

Requiring a different source ID seems to fix the immediate issue here, but we'd also have to limit the number of RETRYs to 2, which may be sensible anyway?

jmtilli commented 6 years ago

I read the QUIC specifications, aiming to add QUIC support to a layer 2 TCP SYN proxy I have implemented (https://github.com/Aalto5G/nmsynproxy). I realized it may very well require supporting multiple RETRY packets (I'm not 100% certain about this yet), which the latest QUIC Internet Draft does not permit.

I'm not at all convinced by the suggestion of @ianswett that servers could be told not to send RETRY if behind a DoS protection device. If you want to deploy such a DoS protection device in a network with hundreds or thousands of server machines, surely you cannot flip the switch on every server to say RETRY is not required at the exact second you deploy the DoS protection device! Worse, some of the servers may not be under your control (think about Amazon AWS), and some QUIC implementations may lack a switch to turn off sending RETRY.

Furthermore, limiting the number of retries to 2 might work badly if such DoS protection devices are nested. It may be the case that there will be a RETRY from the first DoS protection device, from the second DoS protection device, ..., from the Nth DoS protection device, and then eventually from the server. Performance is going to suck, but such a deployment should in my opinion be supported by QUIC. Such a deployment works perfectly well with TCP (you can nest TCP SYN proxies), and I don't believe QUIC should be a step backwards from TCP.

I hope QUIC can support a DoS protection middlebox, like TCP already does. I also hope such DoS protection middleboxes could be nested nearly infinitely (ok, TTL issues might affect you once you have dozens of such middleboxes, if operating on layer 3, but there are solutions such as layer 2 middleboxes that don't decrement TTL).

Remember that firewalls are a very common form of middlebox. Today, a firewall can implement a TCP SYN proxy, meaning it doesn't have to keep state for connections from unvalidated addresses. With QUIC, the firewall could use its normal UDP support without special support for QUIC, but then a UDP flood will fill the state tables of the firewall (and only the endpoint server's state tables will be protected from exhaustion by the RETRY mechanism). The only way to save the firewall from memory exhaustion is for it to send a RETRY packet with a cryptographically validated token that the client must repeat, and then, after validating the client's address, add the connection to the state table. Many enterprises have nested firewalls, and some of them may be under the control of different entities (the operator's firewall, a common company firewall, some department's firewall).
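
To make the stateless part concrete, here is a minimal sketch (in Python) of the kind of token such a SYN-proxy-style device could issue and later verify without keeping per-connection state. The field layout, the 16-byte MAC truncation, and the 30-second lifetime are my own illustrative assumptions, not anything from the draft.

import hmac, hashlib, os, struct, time

KEY = os.urandom(32)  # known only to the device issuing the token

def make_token(client_ip: bytes, client_port: int) -> bytes:
    # token = client address + expiry + truncated HMAC over both
    expiry = int(time.time()) + 30
    body = client_ip + struct.pack("!HQ", client_port, expiry)
    return body + hmac.new(KEY, body, hashlib.sha256).digest()[:16]

def check_token(token: bytes, client_ip: bytes, client_port: int) -> bool:
    # reject if the MAC is wrong, the address differs, or the token has expired
    body, mac = token[:-16], token[-16:]
    if not hmac.compare_digest(mac, hmac.new(KEY, body, hashlib.sha256).digest()[:16]):
        return False
    ip = body[:-10]
    port, expiry = struct.unpack("!HQ", body[-10:])
    return ip == client_ip and port == client_port and expiry >= int(time.time())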

I'm not going to implement a full firewall, but I'm definitely looking forward to being able to add QUIC support to my TCP SYN proxy, which lacks the rulesets and other features of a firewall but has fully functional SYN proxy features. I'm also looking forward to being able to nest the QUIC version of the SYN proxy semi-infinitely.

Now, the following kind of architecture might (or might not) work; I have to investigate it further:

[client] -> ClientHello -> [middlebox]                   [server]
[client] <- RETRY       <- [middlebox]                   [server]
[client] -> ClientHello -> [middlebox]                   [server]
[client]                   [middlebox] -> ClientHello -> [server]
[client]                   [middlebox] <- RETRY       <- [server]
[client]                   [middlebox] -> ClientHello -> [server]
...

This architecture would work without having to fully propagate the RETRY back to the client. I'm not certain whether some of the cryptographic mechanisms would make this architecture invalid.

marten-seemann commented 6 years ago

In an end-to-end protocol, we generally don't want "helpful" middleboxes to interfere with handshakes by performing a Retry (and thereby introducing latency). Obviously, this statement depends on the definition of "middlebox", which I use here to mean any node on the path that belongs to neither the server's nor the client's architecture. We should design QUIC's retry mechanism such that a retry can only be performed by devices that are under the control of the server operator.

Limiting the number of retries on the client side is an insufficient solution here, since under normal conditions the server won't perform a retry, so the client would accept any retry performed by the middlebox, and only fail in the rare case when both the middlebox and the server perform a retry.

I agree with @ianswett here that any architecture that performs multiple retries for a single connection is so suboptimal for performance that you probably wouldn't want to deploy it, and I think QUIC should only support it if we can at the same time prevent retries by "helpful" middleboxes. In fact, if we only allow a single retry, the solution is straightforward: a server (or whatever device is responsible for the DoS protection) would simply reject all Initial packets that contain a token that it didn't issue itself.

nibanks commented 6 years ago

I agree that we don't want to introduce unnecessary latency but I disagree with your suggestion to only allow the end server to perform the Retry for the reasons I previously stated, and that @jmtilli built upon. I don't think we should necessarily allow an infinite number of Retry packets, but more than 2 does seem necessary to me.

ianswett commented 6 years ago

If we're going to allow more than 2, I'm not at all clear on how we're going to guarantee forward progress.

I'll point out that if a server is under DoS and thinks a QUIC connection is a potential attack, it can close the connection and force the client to fall back to TCP. This isn't optimal, but it avoids a lot of edge cases involving multiple retries and rollout issues where the server expects to be behind a DoS prevention device, but isn't temporarily.

I'll remind everyone that QUIC's requirement of a full-sized INITIAL should make it much less prone to flooding issues than TCP is to SYN attacks, so one layer of defense really should be enough.

MikeBishop commented 6 years ago

Would it make sense to carry a sequence number on the token? The server increments every time it sends a Retry; the client ignores Retries which don't exactly increment the sequence number of its most recent Initial. If the Initial gets duplicated / retransmitted, the client will latch onto whichever Retry it receives first and ignore the subsequent one; if the Retry gets duplicated/delayed, the client will ignore it.

nibanks commented 6 years ago

@MikeBishop for that to work, the QUIC server would need to read the sequence number in the Initial packet it gets from the client (the one sent in response to the DoS protection device's Retry) and then increment from there. But what happens if there is a spurious retransmission? For instance, the DoS protection device ends up sending two Retry packets, both are eventually responded to by the client, and both responses end up getting to the QUIC server. The server would send seq_num=2 for its first Retry, but the client would ignore it.

nibanks commented 6 years ago

Maybe, instead of the server incrementing the sequence counter, the client does it and the server just echoes it back?

nibanks commented 6 years ago

I'm afraid this could easily get ossified, though, and I really don't want to encrypt yet another thing just to prevent ossification.

MikeBishop commented 6 years ago

@nibanks, if the DoS device sends two Retry packets, the client would ignore the second.

Ossification is definitely a concern; someone could come to expect that this field is always zero on the client's first packet. Encryption would be preferable, but the point of this is to avoid doing decryption work on the server until it's reasonably confident that the client is genuine, so that's problematic.

However, GREASE is an option. Nothing requires this to start at 0/1 -- the client could randomize the value on the first Initial and simply check that the server incremented it. If we permit wrapping when the server increments, all initial values are acceptable.
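
A minimal sketch of the client-side check this implies, assuming a 32-bit field that wraps; the field width and the names are my own assumptions:

import random

class RetrySeqTracker:
    def __init__(self):
        # random starting value, sent in the first Initial (GREASE)
        self.seq = random.getrandbits(32)

    def accept_retry(self, retry_seq: int) -> bool:
        # only honor a Retry that incremented our most recent value by exactly one
        expected = (self.seq + 1) % (1 << 32)
        if retry_seq != expected:
            return False          # stale, duplicated, or injected Retry
        self.seq = expected       # carried in the next Initial we send
        return True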

martinthomson commented 6 years ago

An alternative is to include the packet number from the Initial in the Retry packet. That makes Retry much more version-specific than it was, but it would allow the client to learn which Initial triggered the Retry, so that it can avoid going backwards.

kazuho commented 6 years ago

@marten-seemann

In an end-to-end protocol, we generally don't want "helpful" middleboxes to interfere with handshakes by performing a Retry (and thereby introducing latency). Obviously, this statement depends on the definition of "middlebox", which I use here to mean any node on the path that belongs to neither the server's nor the client's architecture. We should design QUIC's retry mechanism such that a retry can only be performed by devices that are under the control of the server operator.

+1

I agree with what Marten says.

Allowing middleboxes that send retries without sharing secrets with the servers has two issues.

Considering that, I'd prefer having a retry mechanism that enforces sharing a secret between the middlebox (that sends a retry) and the server, which in turn would mean that multiple retries are unnecessary.

mikkelfj commented 6 years ago

An alternative is to include the packet number from the Initial in the Retry packet. That makes Retry much more version-specific than it was, but it would allow the client to learn which Initial triggered the Retry, so that it can avoid going backwards.

Isn't this always 0? Following a Retry, a new connection is made, so the PN starts over, and we did drop the random PN offset at the start, I suppose. Of course the encrypted PN would vary, but not in any secret way, and the initial CID would vary too.

MikeBishop commented 6 years ago

This was discussed on the editors' call, and then in a follow-up conversation with @janaiyengar and a whiteboard. When there are multiple Retry sources on the path, if the server's CID does not change between Retries, you are highly likely, but not guaranteed, to make forward progress and eventually get a response from the server. The client gets confused about two things in this scenario.

If the server does change CIDs on a Retry, however, you ratchet forward at each layer and ignore any extra Retry packets from the previous layer, which fixes both problems. The proposal on the call was to say that not only may each Retry change the CID, each Retry MUST change the CID unless the sender knows for sure that it's the last layer.

ekr commented 6 years ago

Let me see if I understand correctly. Are you saying it looks like this?

Initial [SCID=X, DCID=Y] ->
<- Retry [SCID=A, DCID=X]
Initial [SCID=X, DCID=A] ->
<- Retry [SCID=B, DCID=X]
Initial [SCID=X, DCID=B] ->

Is that what you are proposing?

If so, it has a problem: if the server sends a Retry with a low-entropy SCID (i.e., A is short), then the attacker can send his own Retry with a random SCID (because he can guess the DCID in the second Initial), with the result that the connection likely fails (e.g., if the SCID is != 8 bytes). There are ways around this, but it will require other changes.

martinthomson commented 6 years ago

I pointed this out, but it was lost in transcription. We also need to require that the SCID in a Retry is >N octets (the same restrictions as on the client-generated DCID would work, but it's probably OK to use the 4-octet minimum).

ekr commented 6 years ago

Actually, I think it would be better to abandon the use of the SCID as a nonce here.

Instead, I propose we say that Retry has to contain the last 8 octets of the Initial packet it's responding to. That's high entropy by definition (it's a hash of CH), and then we won't need any other rule.

[OK, not a hash, actually the auth tag...]
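
A sketch of the client-side bookkeeping this would require (class and method names are mine, purely illustrative):

class InitialTailTracker:
    def __init__(self):
        self.last_tail = b""

    def on_initial_sent(self, initial_packet: bytes) -> None:
        # remember the last 8 octets of the Initial we just sent
        # (under this proposal, the tail of the AEAD auth tag)
        self.last_tail = initial_packet[-8:]

    def retry_matches(self, echoed_octets: bytes) -> bool:
        # only honor a Retry that echoes the tail of our most recent Initial
        return len(echoed_octets) == 8 and echoed_octets == self.last_tail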

martinthomson commented 6 years ago

That's fine with me. It means that senders of Initial packets need to remember those octets, but it simplifies things (to the point that we don't even need a length field).

mikkelfj commented 6 years ago

The sender might not have access to the encrypted packet if it delegates that part.

martinthomson commented 6 years ago

Yes, but don't do that then.

ekr commented 6 years ago

Or delegate the processing of the Retry.

mikkelfj commented 6 years ago

I appreciate the simplicity, but consider an IoT device with crypto hw radio - or a transputer with many cores and few hw crypto units.

ekr commented 6 years ago

Said device can stuff the last bytes somewhere and pass them to the CPU.

martinthomson commented 6 years ago

Worst case, you can ask for the packet to be delivered back to yourself so that you can copy the octets and send it out again.

mikkelfj commented 6 years ago

Worst case, you can ask for the packet to be delivered back to yourself so that you can copy the octets and send it out again.

This can profoundly change the architecture of a QUIC deployment. It is far from easy to synchronize and send back data in a distributed pipeline. For the same reason I'm not too keen on PNE in its current form, but as long as the encrypted packet number has no semantic meaning, it is less of an issue because it can be delegated.

ekr commented 6 years ago

@mikkelfj: I'm not finding this objection particularly compelling, but feel free to propose an alternate design.

mikkelfj commented 6 years ago

One could include a retry serial number (or retry TTL) in the long header. That TTL could also be the starting packet number (sort of what @martinthomson suggested earlier, but in explicit form): a client Initial packet starts with packet number zero, any Retry MUST increase the packet number by one and start a new Initial packet with the peer-provided destination CID, and a Retry must reflect the source packet number. The client can choose how many retries it is willing to pursue, but it MUST NOT respond to a retry of a given serial number more than once, or to a number older than the latest Initial request transmitted.

There is still a chance of receiving a valid handshake on an older serial number racing a Retry packet. A client may choose to accept that handshake and discard all other connection attempts on the same line of serial numbers. This is to protect against man-on-the-side attacks.

There is a chance that a client receives multiple retries with the same serial number, either through network packet duplication, attacks, or competing infrastructure. In that case only one Retry must be responded to, but this is already covered above.

The serial number alone is insufficient to distinguish between different connections. The client must therefore issue an SCID that it can associate with each Retry or handshake response, but it can be the same, different, or zero-length, depending on the client's configuration. (I think this would work, but it needs more thought.)

ekr commented 6 years ago

I don't see how this protects against an off-path attacker injecting a fake Retry with serial number = 1. To get that protection, you need entropy from the client.

mikkelfj commented 6 years ago

Yes that is correct. I suppose that problem also exists today with a single retry?

ekr commented 6 years ago

No, it doesn't, because the server has to echo the client's original DCID, and that is high entropy.

mikkelfj commented 6 years ago

OK, two things then. The original case: if the client has a random initial DCID and an SCID, the Retry must respond with a new CID used for server routing, the SCID to route to the client (if any), AND an additional copy of the original random DCID for entropy. That is three CIDs in a Retry response if that is to work.

If the above works, it also works for retries with serial numbers, but it does require three CIDs in a Retry packet.

EDIT: no, sorry: the second Initial cannot carry the random DCID.

ekr commented 6 years ago

The server is stateless and so cannot send the original DCID in the second retry.

mikkelfj commented 6 years ago

Yes, I realize that. It is necessary to have an extra nonce field in the Initial packet, at least on secondary attempts, unless you derive it from the encrypted packet. I see the complexity of adding an extra field here, and also the elegance of using the (non-secret) encrypted packet sample as a nonce instead. I still prefer an extra nonce field, to allow encryption to be pipelined, but I see your point.

kazuho commented 6 years ago

Instead, I propose we say that Retry has to contain the last 8 octets of the Initial packet it's responding to. That's high entropy by definition (it's a hash of CH), and then we won't need any other rule.

[OK, not a hash, actually the auth tag...]

@ekr Instead, can we require Retry packets to always contain an SCID longer than eight octets? I'd assume that the server can switch to the final SCID when it sends an Initial packet.

I think that approach might be simpler than requiring the client to remember the last 8 octets of the Initial packet that it has sent.

ekr commented 6 years ago

@kazuho: I think that might work, yes, but I'd need to noodle on it a bit.

mikkelfj commented 6 years ago

Another observation: when the initial random DCID is used for routing, it can be used to attack a specific subset of the infrastructure. If the initial DCID is zero-length and a separate nonce is provided, then the peer's LB is responsible for the distribution of traffic and the nonce becomes the same in both cases. Additionally, on secondary attempts the LB can verify the DCID now present from the Retry packet.

That nonce could be part of the early SCID as @kazuho suggests, but that is a matter of framing.

ianswett commented 6 years ago

@kazuho why longer than 8 octets and not >=8 octets?

I would like the simple RETRY case to work as well, which is where the server doesn't want to change the CID at all and is the terminal node, so I don't want senders of RETRY to be required to change the CID.

Related question: What CID is the client deriving the key for INITIAL packets from? I was thinking all INITIAL packets used the same key, but if the terminal server may not have seen the client's first DCID, that doesn't work unless the original DCID is put somewhere the final server can read it, such as in the token.
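
For context, the drafts derive the Initial packet protection keys from a connection ID via HKDF, roughly as below. This is only a sketch: the salt and labels shown are the ones that ended up in the final RFC 9001 and are illustrative for the draft under discussion here.

import hashlib, hmac

INITIAL_SALT = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")  # QUIC v1 value

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand_label(secret: bytes, label: bytes, length: int) -> bytes:
    # TLS 1.3 HKDF-Expand-Label with an empty context; length <= 32, so one HMAC block suffices
    info = length.to_bytes(2, "big") + bytes([6 + len(label)]) + b"tls13 " + label + b"\x00"
    return hmac.new(secret, info + b"\x01", hashlib.sha256).digest()[:length]

def initial_secrets(dcid: bytes):
    # whichever DCID the endpoints agree on is the only keying input here,
    # which is why the question above matters
    initial_secret = hkdf_extract(INITIAL_SALT, dcid)
    return (hkdf_expand_label(initial_secret, b"client in", 32),
            hkdf_expand_label(initial_secret, b"server in", 32))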

nibanks commented 6 years ago

I agree with @mikkelfj here that relying on the encrypted packet makes a hardware offload solution much more difficult, and I would prefer not to do that. The Initial packet is the one packet where we generally have extra room, so I don't think we should be afraid to add an additional nonce.

MikeBishop commented 6 years ago

I'm fine with either an explicit nonce or continuing to use the ODCID as the PR currently does, including a minimum SCID length along with the "MUST change unless" text. The Initial packet does have extra room, in theory, but coalesced packets mean we can use that extra room for 0-RTT data in the common case, so I'd rather not burn too many bytes that are only needed on this exceptional path. That makes an explicit nonce less preferred in my view.

I'm sympathetic to the offload issue -- this is worse than PNE because it doesn't just take the encrypted bytes as input, it requires remembering them.

kazuho commented 6 years ago

@ianswett:

@kazuho why longer than 8 octets and not >=8 octets?

I would like the simple RETRY case to work as well, which is where the server doesn't want to change the CID at all and is the terminal node, so I don't want senders of RETRY to be required to change the CID.

I agree. My intent was to say >= 8. Sorry for the confusion.

Related question: What CID is the client deriving the key for INITIAL packets from? I was thinking all INITIAL packets used the same key, but if the terminal server may not have seen the client's first DCID, that doesn't work unless the original DCID is put somewhere the final server can read it, such as in the token.

I agree with your observation, and think it would make sense to state that the DCID field of the Initial packet that carries the first Client Hello (i.e. the one that also gets padded up to 1280 octets) will be used as the key material.

EDIT: retracting my comment in the 2nd paragraph.

kazuho commented 6 years ago

@ianswett:

Related question: What CID is the client deriving the key for INITIAL packets from? I was thinking all INITIAL packets used the same key, but if the terminal server may not have seen the client's first DCID, that doesn't work unless the original DCID is put somewhere the final server can read it, such as in the token.

This is an interesting question.

If we are to permit running DoS mitigation devices that do not share secrets with the servers, we would be required to state that the Initial keys are derived from the DCID field of the Initial packet that carries the first ClientHello.

However, I am not sure if that is the right path forward, because defining such behavior would mean that somebody else can set up a middlebox that sets the Initial key to the same value for all connections that it sees (e.g., by sending a Retry with an SCID field of 0000000000000000 for all connections), effectively nullifying the merit of having obfuscation.

Admittedly, this kind of attack can only be applied to server deployments that do not validate the token field generated by an on-path device (e.g., by sharing a secret between the DoS mitigation device and the server).

But my question is: do we want to even allow such server deployments to be set up?

nibanks commented 6 years ago

It is our (Microsoft's) goal to use Retry for DDoS mitigation and possibly load balancing. In both cases, the middlebox operating on behalf of the QUIC server would likely have little or no shared state with the QUIC server. This is a very high priority for our Azure scenarios. In order to support these scenarios, we would need the Retry packet to not rely on some shared secret.

kazuho commented 6 years ago

It is our (Microsoft's) goal to use Retry for DDoS mitigation and possibly load balancing.

That's understandable. I'd anticipate that others will do the same thing.

In both cases, the middlebox operating on behalf of the QUIC server would likely have little or no shared state with the QUIC server.

May I ask why you think that it is not a good idea to require every one of us to share a small secret (e.g., an encryption key) between the middleboxes and the servers that we would be running?

As stated in my previous comment, not sharing opens an attack vector. There could be other attack vectors as well, since the lack of shared state allows anyone to run such a middlebox. Considering that, I think it makes sense to require sharing state, under the assumption that doing so is not hard.

mikkelfj commented 6 years ago

I believe the Initial packet should always contain 1280 bytes, also after RETRY; otherwise an attacker might learn how to construct Retry SCIDs and use a second-flight INITIAL as a DDoS vector.

The Initial key should also be derived from the NONCE in the current INITIAL packet. So, as I suggested before, it makes sense to carry the NONCE separately and make the initial DCID empty: it cannot be used for safe routing anyway, and it simplifies routing logic because LBs don't have to know whether the INITIAL DCID is random or not; they just have to make a decision based on whether it is empty.

The source SCID cannot be used as that NONCE without adding complexity to the client's LB infrastructure in a P2P server configuration. So I believe the NONCE should be a separate field. And it will not waste any space if the premise holds that all Initial packets must be 1280 bytes to prevent attacks.

mikkelfj commented 6 years ago

With the iteration-specific NONCE reflected in RETRY, a client can easily drop secondary Retry responses to older iterations of retries. But the NONCE does not have to be reflected, because the response would use a key derived from that NONCE and would thus fail verification if it doesn't match.

Using packet numbers to count iterations would simplify operations, but it is not strictly necessary. In this form, the INITIAL packet must have a packet number that identifies the retry attempt number, starting at 0, and the Retry response must reflect that packet number.

Late handshake responses to older iterations may be preferable to new Retry responses because of man-on-the-side races. This can be detected via trial decryption with older NONCE-derived keys, but it is simpler if the first server handshake packet number reflects the client's INITIAL packet number.

A NONCE could be generated randomly, or it could be generated using counter-mode encryption such that the INITIAL packet number is an IV encrypting a base NONCE. Using counter mode avoids having to store several NONCEs in case an old handshake is received.

mikkelfj commented 6 years ago

May I ask why you think that it is not a good idea to require every one of us to share a small secret (e.g., an encryption key) between the middleboxes and the servers that we would be running?

If you are in a cloud-hosted setup (like Azure or DigitalOcean), the servers are operated by customers while the load balancers and DDoS mitigation devices might be operated by the cloud provider. Requiring shared secrets here is messy. Sometimes this is done with TLS termination today, but it is still messy; some are moving towards auto-configuration via Let's Encrypt, which of course would be an option for QUIC as well, but there are many moving parts, making automation difficult.