Add packet transform that allows adding padding

tfpauly commented 7 months ago

In addition to scramble, we should define a "scramble-padded" variant. This would be the same crypto as scramble but could either:

Always add one byte at the end that describes the number of padding bytes
Optionally add one byte at the end that describes the padding length, which must be preceded by at least N (4?) bytes of zero
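
A minimal sketch of the framing in the first option. It assumes the padding consists of zero bytes appended to the plaintext before the scramble step, and that the final byte counts the padding bytes that precede it. The names `add_padding` and `strip_padding` are hypothetical, and the scramble crypto itself is omitted.

```python
def add_padding(payload: bytes, pad_len: int) -> bytes:
    """Append pad_len zero bytes, then one byte recording pad_len."""
    if not 0 <= pad_len <= 255:
        raise ValueError("pad_len must fit in one byte")
    return payload + bytes(pad_len) + bytes([pad_len])


def strip_padding(padded: bytes) -> bytes:
    """Read the trailing length byte and remove it along with the padding."""
    if not padded:
        raise ValueError("missing padding length byte")
    pad_len = padded[-1]
    if pad_len + 1 > len(padded):
        raise ValueError("padding length exceeds packet size")
    return padded[: len(padded) - pad_len - 1]


packet = b"quic-packet-bytes"
padded = add_padding(packet, 5)
assert len(padded) == len(packet) + 6  # 5 padding bytes plus the length byte
assert strip_padding(padded) == packet
```

With this layout every packet grows by at least one byte, even when `pad_len` is 0. The second option would let unpadded packets skip the length byte and is not sketched here.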

bemasc commented 6 months ago

After thinking about this for a bit, I'm not convinced that this is a good idea.

Let's consider a few possible passive attackers:

1. Double-sided attacks on Connection ID mapping. The attacker sees all traffic into and out of the proxy, and is attempting to identify which Virtual CIDs correspond to which real CIDs.
   1a. Attacker must present a small number of packets that provide absolute proof of a given VCID-CID pair.
   1b. Attacker can collect and analyze an unbounded amount of traffic to generate strong evidence/suspicion of a mapping.
2. Single-sided attacks on content identity. The attacker is attempting to identify packet transmission patterns indicative of using a particular service or downloading particular content.
   2a. Attacker on the client-proxy link.
   2b. Attacker on the proxy-target link.

The "scramble" transform is designed to defeat threat model 1a. We don't need padding in this threat model.

The "scramble-padding" transform would not be sufficient to address threat 1b. It is still a "packet for packet" proxy, so the attacker can use the timing of individual packets and the large-scale patterns in traffic flows to identify CID mappings. Note that per-packet padding is much less effective here than in DNS (RFC 8467), because QUIC packets form long-term flows that are visible on both sides from their fixed CIDs. Padding in this way would also be less efficient than padding with ordinary "connect-udp", for two reasons:

"connect-udp" does not reveal the distinct flows on the client-proxy link.
"connect-udp" permits packet aggregation and splitting, obscuring the precise number of packets sent and allowing padding to be amortized across a flight of packets.

Given that QUIC Forwarding Mode is motivated (at least in part) by a desire to reduce the space overhead of "connect-udp", it seems paradoxical to define a mode that would require more overhead to reach the desired privacy level.
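
The amortization point can be illustrated with some rough arithmetic. The sketch below assumes a hypothetical policy that rounds sizes up to a multiple of a block size, and compares padding each packet individually (as a packet-for-packet transform must) with padding one aggregated flight. The packet sizes and the 256-byte block are made-up values, and tunnel framing overhead is ignored.

```python
def pad_to_block(size: int, block: int) -> int:
    """Round size up to the next multiple of block."""
    return -(-size // block) * block


def per_packet_overhead(sizes: list[int], block: int) -> int:
    """Padding bytes needed when every packet is padded on its own."""
    return sum(pad_to_block(s, block) - s for s in sizes)


def aggregated_overhead(sizes: list[int], block: int) -> int:
    """Padding bytes needed when the whole flight is padded as one unit."""
    total = sum(sizes)
    return pad_to_block(total, block) - total


flight = [1200, 1200, 1200, 310, 45]  # illustrative packet sizes in bytes
block = 256                           # illustrative block size

print("per-packet padding:", per_packet_overhead(flight, block))
print("aggregated padding:", aggregated_overhead(flight, block))
```

Under these assumptions the aggregated case needs less padding, because only the flight's total length is rounded up rather than each packet's.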

A "scramble-padding" mode could be targeted at threat 2a. However, this mode competes with the alternative of applying end-to-end padding on the client-target connection. In that comparison, "scramble-padding" has several deficiencies:

It does not defend against attacker 2b. More generally, it is dependent on use of the proxy.
The proxy does not know anything about the content, such as its privacy relevance or its delay sensitivity, so it cannot select a sensible padding policy. The client would need a new protocol for communicating this kind of policy information to the proxy.
As above, it is less efficient than end-to-end padding due to the inability to coalesce packets.

I do think these threat models are worth addressing, but I suspect that the right solutions do not involve QUIC Forwarding Mode. Threat 1b will require a deeper design that hides short-term correlation and also breaks long-term flow linkability. Threat 2a/2b is best addressed by improving support for end-to-end padding in QUIC.

DavidSchinazi commented 6 months ago

I don't think "just use regular CONNECT-UDP / tunneled mode" is a great answer here: when we have multiple nested CONNECT-UDP tunnels, that padding would be on the client-proxyA, client-proxyB, client-proxyC, etc. connections, as opposed to client-proxyA, proxyA-proxyB, proxyB-proxyC. That means that you'd increase padding the closer you get to the client. A forwarding mode transform with padding would allow hop-by-hop padding, and I think that has value.

bemasc commented 6 months ago

No, forwarding mode transforms are client to proxy, and proxy A is not the client to proxy B -- it's just a transport proxy. Proxy A doesn't even know that Proxy B is a proxy.

I don't think hop-by-hop padding is desirable, because it conflicts with the privacy goal of limiting the damage that each proxy can do. It seems more logical to pad once, at the innermost possible layer.

DavidSchinazi commented 6 months ago

If the goal is to decorrelate traffic on both sides of a proxy, independent hop-by-hop padding will be much more effective than only innermost padding.

bemasc commented 6 months ago

It sounds like you're describing threat model 1b (defined above). That attacker is essentially too powerful for a padding packet transform to do any good. We will need a different approach.

DavidSchinazi commented 6 months ago

The dichotomy above between 1a and 1b lacks nuance. There is value in reducing the amount of signal that an attacker has. The attacker I have in mind can see traffic on both sides of the proxy but only has the ability to save limited amounts of data. When the cost of building a solution that supports padding is low, I'd rather build it and not necessarily use it than need it and not have it. This is an instance where "we'll build it when we need it" means that receivers won't support receiving the padding.

bemasc commented 6 months ago

An attacker just has to store the sequence (not even timestamps!) of CIDs and VCIDs crossing the proxy. With any reasonable compression scheme this will be less than 4 bytes per packet. I think an attacker will be able to deanonymize most flows long before they hit any practical storage limit.
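
As a toy illustration of how little an order-only log needs to contain, the sketch below credits each ingress VCID with the CID of the next egress packet and picks the most frequent pairing. The log format, the identifier strings, and the `guess_mapping` heuristic are all hypothetical; a real attacker would likely use something more robust. Note that per-packet padding changes none of the inputs to this heuristic.

```python
from collections import Counter, defaultdict


def guess_mapping(log):
    """Guess VCID -> CID pairs from an ordered log of observations.

    log is a list of ("in", vcid) or ("out", cid) tuples in the order the
    packets crossed the proxy. Each egress packet is credited to the most
    recent unmatched ingress packet.
    """
    votes = defaultdict(Counter)
    pending = None
    for side, ident in log:
        if side == "in":
            pending = ident
        elif pending is not None:
            votes[pending][ident] += 1
            pending = None
    return {vcid: counts.most_common(1)[0][0] for vcid, counts in votes.items()}


# Two interleaved flows; one burst is reordered enough to produce a wrong vote.
log = [
    ("in", "v1"), ("out", "cA"),
    ("in", "v2"), ("out", "cB"),
    ("in", "v1"), ("in", "v2"), ("out", "cA"), ("out", "cB"),
    ("in", "v1"), ("out", "cA"),
    ("in", "v2"), ("out", "cB"),
]
print(guess_mapping(log))
```

Even with one misleading burst in the sample log, the majority vote still recovers both pairings, since only identifiers and their order are required.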

Packet transforms are negotiated explicitly, so defining it early doesn't especially increase the likelihood that it will be broadly supported or used.

ietf-wg-masque / draft-ietf-masque-quic-proxy

Add packet transform that allows adding padding #92