quicwg / base-drafts

Internet-Drafts that make up the base QUIC specification
https://quicwg.org
1.63k stars 204 forks source link

Disallow reuse of stateless reset tokens #2785

Closed DavidSchinazi closed 4 years ago

DavidSchinazi commented 5 years ago

As part of the discussions on #2645 / #2769, we've found that reusing stateless reset tokens (SRTs) can allow attackers that delay packets to close connections. Even though the text in #2733 does a good job explaining why you shouldn't reuse SRTs, I don't see any reason why one would want to reuse SRTs across connection IDs in the same connection. Therefore I would prefer we ban reusing them, and make it a PROTOCOL_VIOLATION to do so. Apologies for not raising this on #2732, I had not thought about the problem at that time.

kazuho commented 5 years ago

Apologies for not raising this on #2732, I had not thought about the problem at that time.

I do not think you missed anything. Instead, you spotted the issue at the exact moment; it's my understanding that #2769 is introducing this issue. There is no problem in the current design, because receiving a packet with an outdated CID is not expected to cause a stateless reset.

Regarding why you'd want to reuse a SRT, the simple reason is because you do not need to create different SRTs for every Connection ID.

A QUIC stack that generates a CID by a construction like AES_ECB(cid_key, conn_id || path_index)1 can use HMAC(srt_key, conn_id) as a construction for generating SRT.

For QUIC stacks that aren't interested in retiring CIDs (which seems to be the majority for me), this construction makes sense. I think my preference would be to retain the design; it's just about adding a warning when you want to retire a CID.

1: path_index is a sequence number of path for each connection

DavidSchinazi commented 5 years ago

I don't think #2769 introduces this issue. It does make it worse but the issue was already present. It's reasonable for a server implementation to drop a CID from their state when the client retires it.

There are two options to guarantee safety:

  1. make sure each CID has a distinct SRT
  2. make sure once a CID is sent in NEW_CONNECTION_ID it always routes back to this connection

I think (1) is simpler and giving implementations two options here not only makes the spec more complicated, but can also lead to accidental vulnerabilities.

I think it would be safer for the implementations you mention to use HMAC(srt_key, conn_id || path_index) to generate their SRT. That also makes it simpler to generate the SRT from the cleartext CID.

In general, it's been this spec philosophy to add constraints that can be verified by the peer whenever possible. Forcing distinct CIDs ensures verifiable security whereas requiring that CIDs get routed appropriately for the duration of the connection is not enforceable and therefore easier to get wrong.

MikeBishop commented 5 years ago

Essentially, the current text says that an issuer can't actually forget a retired CID until all CIDs with the same SRT have been retired. At that point and not before, you can forget all of them at once. #2769 doesn't change this at all -- doesn't even really make it worse.

That PR provides a mechanism for asking a client to retire a certain set of CIDs; if the set only partially overlaps with the set of CIDs covered by an SRT, that mechanism is not terribly useful because you can't actually forget all the CIDs you asked the peer to retire. It could be very useful to ask the peer to retire the last CID in one of these groups because you've got 200 of them you'd really like to clear out.

But that doesn't change the core fact: An issuer can't actually forget a retired CID until all CIDs with the same SRT have been retired. If each SRT maps uniquely to a CID, this is a simple requirement to satisfy. If you choose to do something else, that's your problem.

kazuho commented 5 years ago

I think what @MikeBishop says is correct.

OTOH, I now think that forbidding reuse of SRT might help the receiver of the stateless resets.

Without the prohibition, when retiring a CID, an endpoint needs to consult other SRTs that have been issued for the same connection to see if it can unregister the SRT corresponding to the CID being retired, or use a ref-counted hashmap for maintaining the mapping from SRTs to connections.

We might argue that this requirement is easy to miss, and hard to test.

Having the prohibition removes this requirement and therefore can be considered as a simplification.

Contrary to that, we might argue that allowing the reuse of SRT has marginal benefit. It is trivial to construct different SRT for each CID, as pointed out by @DavidSchinazi.

martinthomson commented 5 years ago

@MikeBishop's analysis is entirely correct.

I don't think that we should put in rules to prevent self-harm through idiocy. And this is definitely a case of that. An endpoint that forgets a CID when the associated SRT is active only hurts themselves.

The text is sufficient as it stands. A prohibition won't prevent implementations from doing dangerous things. As it stands, you need to take extraordinary steps to put yourself in position to mess this up. The construction @kazuho describes is a totally sensible one, but it is very hard to forget connection IDs with that scheme. More relevant to this, the SRT construction process described in the spec won't have this problem.

What exactly do you think this prohibition will achieve?

kazuho commented 5 years ago

What exactly do you think this prohibition will achieve?

If there is a prohibition, an endpoint can remove the SRT of the CID being retired, without consulting the SRTs of other CIDs belonging to the same connection.

While incorrect in terms of current spec, such design works flawlessly with endpoints that issue different SRT for each CID. It seems that most implementations would issue different SRT for each CID. Then, there's chance that we might see stacks implementing this incorrect but simpler approach.

Having the prohibition eliminates the chance of seeing these "incorrectness."

marten-seemann commented 5 years ago

I agree that we should remove the text that allows the reuse of SRTs from the spec (In fact, last night I was planning to open exactly this issue, and woke up this morning to see that @DavidSchinazi already did so while I was asleep).

I don’t see any reason why you’d want to reuse a SRT in the first place - computing one HMAC shouldn’t be more expensive than applying the packe protection to send a single packet. With regards to how you store connection IDS, this design is only simpler if you do the wrong thing and don‘t keep track of expired connection IDs that still have active SRTs. Doing it the right way will most likely result in a more complex design than associating each CID with its own SRT.

In short: it’s easy to mess up, and there’s little benefit, so let’s remove it.

mikkelfj commented 5 years ago

I'm not totally against a unique SRT per CID, but I am concerned about the static memory use of otherwise passive connections.

If a CID and SRT can be 255 bytes each (depending on where discussions go) that is half a kilobyte per CID, and there might be 8 of those, possibly more. That is 4K per connection even if you retire all active buffers because there isn't currently any traffic.

DavidSchinazi commented 5 years ago

@mikkelfj SRTs are 16 bytes long

mikkelfj commented 5 years ago

@DavidSchinazi I'm not sure about that - isn't there text about hiding SRT among ordinary traffic. If so it cannot be shorter than the CID + overhead, well the packet that is. But I'm probably mixing up packet size with storage requirements.

ianswett commented 5 years ago

The SRT is only 16 bytes, but the packet it's sent in is supposed to look like a typical QUIC short header packet, and therefore needs to be of a minimum size.

MikeBishop commented 5 years ago

So essentially, we're saying that we maybe didn't really have consensus on the answer to #2732 being "yes." But since there was a consensus call that said we did, now the question is whether we have consensus to reverse that decision. Let me suggest that be taken to the list.

larseggert commented 5 years ago

Discussed in Montreal, next step is for @martinthomson to do a PR and follow normal process

martinthomson commented 5 years ago

Conclusion: prohibit the use of duplicate tokens. Allow, but not require the use of PROTOCOL_VIOLATION if a duplicate is detected.

martinthomson commented 4 years ago

2968 covered this.