quicwg / base-drafts

Internet-Drafts that make up the base QUIC specification
https://quicwg.org
1.63k stars 205 forks source link

Path Challenge Padding and Amplification Protection #4257

Closed nibanks closed 3 years ago

nibanks commented 4 years ago

I've implemented the new requirements to pad path challenges in MsQuic and I immediately startled hitting asserts in my code during automated testing because the server ended up trying to pad beyond the 3x amplification limit for the new path.

What am I supposed to do here? Pad only up to the 3x limit? Wait until I have received enough on the path to be able to send a fully padded packet? Don't pad, but then track some state to additionally do PMTUD sometime later, after I've validated the source address?

I might have missed it, but I think this scenario needs explicit text in the spec.

martinthomson commented 4 years ago

Can you walk through the scenario? The server observes a NAT rebinding or migration and decides that the new client address is not validated. It sends PATH_CHALLENGE and pads, but the packets it has received thus far are too small and therefore it immediately exceeds the 3x limit?

This seems like it might be worth fixing. Like you, I don't have an easy answer. If the client probes the path, there is no issue. However, a small non-probing packet can happen and we need an answer for the server in that case.

Curiously, you could send PATH_CHALLENGE in a small packet too, but not regard any response as path validation. The client should send PATH_RESPONSE padded and open up the server sending quota, allowing the server to properly probe.

nibanks commented 4 years ago

The simplest scenario is NAT rebinding, and the first packet received on the path is small. Per-spec, we're still supposed to send a path challenge, and per my understanding, we're always supposed to pad those.

One option, at least for this scenario would be to make an exception, with the assumption that rebinding shouldn't change the MTU. But I don't care for assumptions that could really bite you if you get it wrong. Additionally, it doesn't fix the bigger problem if it wasn't NAT rebinding.

ianswett commented 4 years ago

To repeat what @martinthomson has said before, every time we change something, we introduce some issue.

If a connection is immediately migrating prior to probing the new path with PATH_CHALLENGE, I believe it either means it's unintentional or the new path is the only one available. In either case, if the MTU is too small, you're sunk, so making an exception to padding for cases of immediate migration seems ok to me. The only cost is it may take a bit longer to realize the new path has a smaller MTU?

kazuho commented 4 years ago

I might be dumb, but do we require amplification limit to be applied to path challenges?

I am not sure if there's such a requirement. I do not think we have to have that.

Unlike a DDoS attack using Initial packets, an off-path attacker cannot mount a DDoS attack using spoofed packets to elicit path challenge; only an MOTS attacker can do that. I would also point out that because the server has per-connection state, it will rate-limit (or bail out from) the number of path challenges it sends.

tatsuhiro-t commented 4 years ago

If I understand the spec correctly, server enforces amplification protection before it receives first client Handshake packet. So the intended scenario is that server sends PATH_CHALLENGE in its 0.5 RTT flight. But do we need to send them? Because receiving client Handshake validates client address, server does not need to send PATH_CHALLENGE during handshake.

nibanks commented 4 years ago

I went back and looked at the specs, @kazuho you're right that we don't seems to require amplification protection in the spec explicitly for migration, but is that really correct? I figured it was an amplification attack vector just like the handshake (though a bit less impactful since you have to handshake first). Couldn't you connect to a server that has a larger initial window and then send a request for a large response, spoofing the target IP address? Rinse and repeat?

If we really don't need to have amplification protection for migration, then I'm fine to close this issue (and go remove some code).

martinthomson commented 4 years ago

Hmm, @kazuho is right, I can't find any restriction on use of paths after migration. Our limits for anti-amplification are specifically: handshake, closing, and stateless reset. No other mention exists.

That doesn't mean that Nick is wrong to have this code though. Without limits, an attacker could connect to a server then spoof a bunch of small packets to cause an amplification attack toward a chosen target.

We can turn that into a low-grade amplification attack if we allow probes to be exempt from that limit. It's low-grade then for several reasons. It is stateful at the server and the connection establishment is expensive. A single connection can be used to attack multiple targets, but a server won't send many packets to each. In my half-written migration code, this results in just three packets, which isn't that useful.

After digging in more here, I think that we can make two changes, one the direct response to this issue, but the first is in response to the unstated issue that we don't have anti-amplification protection for migration.

  1. Add the 3x rule for any unvalidated path.

  2. Allow path probes to ignore the 3x rule.

The alternative is to leave this alone and focus on clarification. We have a requirement that the server reduce its congestion window, so the concrete amount of amplification that can be gained without the 3x rule being in place is not that much relative to the cost of setting up the attack.

I'm ambivalent on this. I can see the appeal of the principled fix, but getting it right could be tricky.

ianswett commented 4 years ago

I think leaving this alone and focus on clarification SGTM.

Amplification limits are tricky, as evidenced by the repeated tweaks we've needed to enforce the 3x limit during the handshake, so adding one at this stage sounds risky.

Given there is some 'protection' by requiring the handshake to complete prior to migration, I think this protection is sufficient as long as one can't repeatedly abuse migration by immediately migrating over and over. I'm not sure if the existing text provides enough protection or not from that situation or not?

kazuho commented 4 years ago

I think that there is an existing MUST regarding this problem. It states that until a peer's address is deemed valid, an endpoint MUST limit the rate at which it sends data to this address. The endpoint MUST NOT send more than a minimum congestion window's worth of data per estimated round-trip time (kMinimumWindow) (section 9.3.1; transport draft).

So to answer to @nibanks, current limit is not INITCWND, but 2 MTU.

I think that this limit might be a bit on the loose side, but I wonder if there is a need to change.

PS. If we are to change the existing rule, I like what @martinthomson has written above.

martinthomson commented 4 years ago

Thanks for finding that kMinimumWindow thing @kazuho, that makes me much happier. Relative to the cost of mounting the attack, I don't expect an attacker to find that attractive.

nibanks commented 4 years ago

Ok, so we're saying that instead of the 3x limit, we're going to use 2 MTU per Initial RTT (path challenges aren't retransmitted on loss, right?)? I'm just trying to reason about how I should update my logic. It used to be:

void OnPathDataReceived(path, length) {
  if (!path->Valid) {
    path->allowance += 3 * length;
    TrySendPathChallenge(path)
  }
}

Now, it should be something like this?

void OnPathDataReceived(path, length) {
  if (!path->Valid) {
    if (!path->HasSentChallenge || path->TimeSinceLastChallengeReset() > INIT_RTT) {
      path->allowance += 2 * MTU;
      path->lastChallengeReset = now()
    }
    TrySendPathChallenge(path)
  }
}

And, just so my understanding is clear, if I receive a minimum size packet (10ish byte UDP payload?) from an unvalidated path on average 1/2 x INIT_RTT, then I will be amplifying that traffic by ~100x, right?

And a man-on-the-side attacker could just take every small packet destined to my server that it sees, and race a copy, with changed source address to some target address, and I'd start doing this amplification for all those connections' packets?

kazuho commented 4 years ago

And, just so my understanding is clear, if I receive a minimum size packet (10ish byte UDP payload?) from an unvalidated path on average 1/2 x INIT_RTT, then I will be amplifying that traffic by ~100x, right?

For one unvalidated path, an endpoint would send no more than 2 MTU worth of data per every Initial RTT (333ms). Therefore, while the amplification factor will be large, the amount of traffic being sent is going to be small.

For all the unvalidated paths from which an endpoint might receive data, the upper bound would be 2 MTU multiplied by the number of paths that an endpoint can validate concurrently. From what I understand, each path validation attempt is abandoned by a timeout, that is greater than 2 seconds (section 8.2.4).

nibanks commented 4 years ago

Ok. I did some excel math to get a more accurate number, and it seems at least for IPv4, the amplification ratio is closer to 25x. To make use of this though, as you indicated Kazuho, the attacker would be limited by the 2 MTU, per unvalidated path.

Assuming a server tracks only 4 paths max, per connection, and an attacker opens 1000 connections to the server and uses them as an amplification vector, the attacker can send packets at less than 10Mbps to get almost 250 Mbps send to the target. Do this with a couple different servers (which ever you find to track the most parallel paths) and I'd say you have a pretty good attack vehicle.

And as I mentioned above, a MoTS can copy/modify/race valid connection traffic to achieve similar results.

Obviously getting a connection to the handshake confirmed state before it can be used as an amplification vector is a hurdle, but I wouldn't say it a very large one. The attacker can open connection to many different servers to mask it's attack origin somewhat. IMO this still seem like a very valid threat we at the very least should mention in the spec, if not actually fix.

martinthomson commented 4 years ago

I think that I'd like to see the working for that, because I have a larger number, depending on how many probes are involved (at 3 probes per path, the number is 160-ish for a server that uses one byte connection IDs, or 42 for a more reasonable 8 byte CID), at last for those few paths that a server might choose to validate.

That is, if you take the handshake cost out, which is fair if you consider them amortized, but a server will only probe so many paths.

If you take the handshake into account and leave aside the possibility that you can amortize the handshake costs across attacks on different servers, then you get a lower value.

nibanks commented 4 years ago

My math:

Incoming Rate = (IPv4_Header_Size + UDP_Header_Size + 20) * 6 PKTS/SEC * PATHS_PER_CONN * CONNS Outgoing Rate = (IPv4_Header_Size + UDP_Header_Size + 1200) * 6 PKTS/SEC * PATHS_PER_CONN * CONNS

The amplification ratio is (IPv4_Header_Size + UDP_Header_Size + 1200) / (IPv4_Header_Size + UDP_Header_Size + 20), which is 1228 / 48 = 25.58333. For incoming size of 28 bytes (8 byte CID) the number comes down to 21.93.

martinthomson commented 4 years ago

OK, that helps. I didn't factor in packet headers in my calculations, but I also allow for multiple probes in response to a single non-probing packet.

What I'm concerned about then is considering the cost of the handshake. To complete the handshake, you need to send one 1200 byte packet and one packet of about 78 bytes. For an attack on a single host, this changes the dynamic considerably: you send 1220 + (98 + CID) + (41 + CID) to get 3*1220, assuming that you send three probes on new paths. That's lower than our 3x target even if the connection ID is zero length (2.66).

Even if you amortize that setup cost across multiple targets, the amplification factor isn't as bad as I first thought. If a server probes up to 10 paths (which seems high), then you get 3*1200 / ((1318 + CID) / n + 41 + CID) which for a CID of 8 bytes and 10 paths is 19.8. That's much worse, but not so bad as it is spread over multiple paths.

We might limit the number of concurrent path validations, but this is not really going to change anything if they are just strung out. Only the total amount matters if the attacker is even slightly patient.

I think that there might be some clarifications we can add:

  1. Point more clearly to the requirement to limit the sending rate on unvalidated paths after migration.

  2. Note that this rate limit applies to each path and that it might be necessary to limit the number of paths that are concurrently validated in this way, either per connection or across all connections (perhaps per destination if you are able).

  3. Note that multiple path validations in sequence can still be used to use a single connection to attack multiple targets and that it might be necessary to limit the number of total failed path validations.

On this last point, the total exposure is fixed in total to the number of probes in each failed validation attempt (e.g., 3) times the number of failed validations that the endpoint permits (e.g., 10). Using 3 and 10 gives the ~20x figure above, which is pretty bad, but not so far out of proportion that it couldn't be managed. Keep in mind that the attacker needs to do all the crypto of a handshake in order to access that, plus all the protocol logic and timers over two round trips. All that to get that 20x factor.

kazuho commented 4 years ago

Thank you for the calculations. Something around ~20x is what I have expected too.

I tend to agree that having some cautionary text is sufficient.

I'd also point out that we have a sentence that states: _In response to an apparent migration, endpoints MUST validate the previously active path using a PATHCHALLENGE frame. (section 9.3.3). Therefore, in addition to sending bytes to the victim, servers used for the reflection attack will be sending about 1/2 of that to the initiator of the attack. I'm not sure how critical that is to the attacker, though.

janaiyengar commented 4 years ago

This is a response to the the discussion in #4264 on the potential performance cost of applying this limit on quiescence.

This roundtrip isn't the only perf issue on NAT rebinding:

Despite the fact that we want to have continuity across NAT rebinding, we did not design for it to be seamless in terms of performance. I don't think we should start now.

Let's remember that NAT rebinding is the uncommon case, and typically occurs after quiescence of 30 or 60 seconds. A client app that wants to improve performance because this quiescence is a common pattern for it can always use one of many methods, including sending PINGs.

I'd strongly argue against making any recommendations. This is a question where I believe more experience is necessary to learn best practices.

martinthomson commented 4 years ago

I would also point out that we suggest that path validation can be performed when using a path that hasn't been used for some time. This isn't three datagrams, but it's now a fully padded datagram, which will allow for three packets. That's pretty good relative to a full handshake, if an implementation chooses to do that.

nibanks commented 4 years ago

I agree with the sentiment that NAT rebinding is an edge case and it likely to have all sorts of performance related issues, and we don't really need to try to optimize for them. I personally think mentioning the known problems an possible solutions would be nice though.

The only real problem I left is that is seems there is some confusion (as seen from my discussion with Kazuho on the PR) about amplification protection and its relation to port-only NAT rebinding. Is it an exception to the rule or not? I didn't see any specific text on this one way or the other in the latest PR last I look, and I think it might be worth while to have some.

MikeBishop commented 4 years ago

If the client IP address has changed, the server MUST adhere to the anti-amplification limit; see {{address-validation}}. Note that in the presence of NAT, this requirement might be insufficient to protect other hosts that share the NAT from amplification attack.

My reading of this text is that for a port-only change, you're not mandated to, but a cautious implementation could certainly choose to apply the limit to that case as well.

janaiyengar commented 4 years ago

One point that seems to have been lost is the addition of an error code -- see discussion here. I think adding an error code for NO_VIABLE_PATH is a good idea.

martinthomson commented 4 years ago

Let's do that as a separate PR, which I will write up now, so that I don't forget...

huitema commented 3 years ago

I am seeing interop failures. Picoquic avoids sending full size datagrams in response to short natted packets; it just send short challenges to both client addresses, and will perform PMTUD later. Some implementations see "unpadded challenges" as some kind of protocol violation and just ignore them, leading to connection failures.

This is a difficult situation, because implementations end up having to choose between sending long challenge packets in response to NAT and risking participating in reflection DDOS, or sending short challenge packets and risking connection failures.

I think this is a case in which the behavior of clients and servers vary. When clients send PATH CHALLENGE, they are either deliberately attempting to probe a new path, or maybe verifying a NAT response from the server. In both cases, padding is appropriate. For servers, padding is appropriate in response to a padded client message, but it is mostly inappropriate in response to a short natted packet.

larseggert commented 3 years ago

So at least my implementation requires padded PATH_CHALLENGES, because Section 8.2.1 says:

An endpoint MUST expand datagrams that contain a PATH_CHALLENGE frame to at least the smallest allowed maximum datagram size of 1200 bytes.

Is there an inconsistency in the spec here?

ianswett commented 3 years ago

So at least my implementation requires padded PATH_CHALLENGES, because Section 8.2.1 says:

An endpoint MUST expand datagrams that contain a PATH_CHALLENGE frame to at least the smallest allowed maximum datagram size of 1200 bytes.

Is there an inconsistency in the spec here?

The PR adds ", unless the anti-amplification limit for the path does not permit sending a datagram of this size."

MikeBishop commented 3 years ago

Given that you can't validate the "unless" criteria for the peer, it seems like the client isn't able to enforce this MUST.

huitema commented 3 years ago

What Mike says. "MUST ... unless" is not standard per RFC 2119 and 8174, and that leaves implementers with fuzzy guidance, leading to interop failures. The proper text would be something like:

Client endpoints MUST expand datagrams that contain a PATH_CHALLENGE frame
to at least the smallest allowed maximum datagram size of 1200 bytes. Server endpoints
SHOULD do the same, unless the anti-amplification limit for the path does not permit
sending a datagram of this size.
DavidSchinazi commented 3 years ago

I agree with @huitema here, client MUST + server SHOULD sounds like the easiest way to resolve this issue.

kazuho commented 3 years ago

Given that you can't validate the "unless" criteria for the peer, it seems like the client isn't able to enforce this MUST.

I'm not sure if I follow. IIUC, we do have MUSTs that are not enforceable by the peer. To give an example, a server cannot close a connection when it receives a half-sized datagram containing an Initial packet. Is the argument that we have to change the existing MUST pad rule to SHOULD?

IMO "MUST unless" is better than "SHOULD unless", because it is clear that the sender is required to behave as specified whenever the exception is not being met. SHOULD sounds like there are other cases when a sender can send a smaller sized datagrams.

janaiyengar commented 3 years ago

FTR, this is now resolved, see discussion: https://github.com/quicwg/base-drafts/pull/4264/files#r538962489. The editorial fix is in this commit: https://github.com/quicwg/base-drafts/pull/4264/commits/8324d829dbf4add57887e10ce7cdb829b6afde3f

martinthomson commented 3 years ago

Reopening as requested by chairs.

LPardue commented 3 years ago

Closing this now that the IESG review of draft 33 has concluded.