Split mode correlation attacks

pkelsey commented 2 years ago

(Originating at #509)

There appears to be nothing in the document concerning an attacker with a presence on both sides of the client-facing server in split mode defeating ECH privacy by correlating connections across the client-facing server.

Consider an attacker present on both sides of a client-facing server whose goal is to uncover which backend server(s) specific ECH-using clients are accessing. By virtue of its position in the network, this attacker:

1. Can, through passive observation of the traffic on the client side of the client-facing server, maintain a list of ECH connections initiated by each client identity of interest to the client-facing server (client identity based on the available information: transport addressing, ClientHelloOuter parameters that are in the plain, protocol timing...).
2. Can, through passive observation of the traffic on the backend-server side of the client-facing server, maintain a list of ECH connections initiated from the client-facing server to the backend servers along with their corresponding SNI values, as at this point ClientHello is ClientHelloInner in the plain.
3. Can uncover the SNI value being used by a particular client connection in list (1) if it can match it to its corresponding connection in list (2).
4. Does not necessarily have to perform this matching in any particular temporal relation to the evolution of the connection in order to achieve its goal.

Attacks:

1. Passive flow tagging using time dilation: At one position, the attacker chooses a target ECH connection and introduces a time dilation between two events (by delaying packets) whose relative timing otherwise has statistics stable enough to reliably detect the injected outlier at the attacker's other position. The attacker annotates the connection in the list on the side the dilation is inserted with the time of the operation, and annotates the connection list at its other position with the time that outlier timing for that event was detected. Later, connections are matched when recorded time values for a pair of entries (one from each connection list) are suitably related (the arithmetic left as an exercise for the reader). One event pair might be observed ClientHello -> ServerHello timing. Or it could be something more opportunistic, such as the arrival time of two successive records in a large backend server initial flight. The attacker may be able to enhance the capacity of this attack by using multiple distinguishable dilation values, which would allow it to be applied to multiple connections within the same time window.
2. Non-destructive active flow tagging I: On the client side, the attacker picks a target ECH connection from the client and inserts a change_cipher_spec record consisting of the single byte value 0x1 following ClientHello, and annotates the connection in the client-side list with the time at which the operation was performed. On the backend-server side, the attacker annotates the entry in its list for the connection on which such a change_cipher_spec is observed with the time at which it is observed. The attacker controls the interval at which it employs the attack such that given the path delay characteristics and clock synchronization limits, it can successfully match the annotated list entries. The attacker can enhance the capacity of this attack by employing multiple differentiable time delay values between the ClientHello and the change_cipher_spec it inserts, enabling its use on multiple connections within the same time window. Such change_cipher_spec records may also be employed by the client for their intended purpose, but the attacker can likely filter them out to protect the fidelity of its attack (and if necessary, unconditionally reinsert them at its position on the other side), as it is unlikely that the client-facing server and the infrastructure behind it require them for correct operation. Deployed TLS 1.3 implementations that tolerate multiple such change_cipher_spec records would provide an expanded capacity to the attacker as being able to insert a variable number of records would further expand the concurrency space. This attack could also be run in the other direction. This attack only requires an active position on one side.
3. Non-destructive active flow tagging II: On any given ECH connection on the client side, after the client-facing server enters forwarding mode, the attacker inserts a fake record whose payload contains a tag identifying it to its other position as such a record as well as indicating an attacker-assigned connection ID. The client-facing server dutifully forwards the fake record, after which the attacker receives it at its backend-server side position, identifies the tag in the payload, removes the record from the flow, and annotates the associated entry in its connection list with the connection ID. This could also be implemented as a length extension of otherwise legitimate records, with the additional payload removed at the second position. This attack could also be run in the other direction. This attack requires an active position on both sides.
4. Destructive active flow tagging: Once the client-facing server has entered forwarding mode for a given ECH connection, the attacker can overwrite a portion of the payload of an existing record with the tag described in (3), with the same processing of the tag content at the attacker's other position. As this will cause the connection to be aborted, it would only be useful on a sampling basis. The attacker may be able to reduce the observability of this attack by opportunistically applying it to alerts it detects through traffic analysis. This attack could be applied in either direction. This attack only requires an active position on one side.
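For illustration only, the matching step shared by the timing-based attacks above might be sketched as follows. This is a minimal sketch, not a description of any real tool: the `Tagged` record, the `match_tags` function, and the delay window bounds are all hypothetical names and values chosen for the example, and it assumes the attacker's two clocks are synchronized well enough that a single path-delay window is meaningful.

```python
from dataclasses import dataclass


@dataclass
class Tagged:
    conn_id: str  # attacker's local label for a connection (hypothetical)
    t: float      # seconds: when the tag was injected or detected


def match_tags(injected, detected, min_delay, max_delay):
    """Pair each injected tag with the single detection inside its delay window.

    Entries with zero or several candidate detections are left unmatched,
    which is why the attacker has to space out its tagging operations.
    """
    pairs = {}
    for inj in injected:
        candidates = [d for d in detected
                      if min_delay <= d.t - inj.t <= max_delay]
        if len(candidates) == 1:
            pairs[inj.conn_id] = candidates[0].conn_id
    return pairs


# Client-side list: times at which a tag was applied to each connection.
client_side = [Tagged("c1", 10.000), Tagged("c2", 12.000)]
# Backend-side list: times at which a tag was observed on each connection.
backend_side = [Tagged("b7", 12.030), Tagged("b3", 10.025)]

matches = match_tags(client_side, backend_side, min_delay=0.0, max_delay=0.1)
print(matches)
```

Tags applied closer together than the window width produce ambiguous candidates and are dropped, which corresponds to the point above about controlling the interval at which the attack is employed.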

davidben commented 2 years ago

I don't think (2-4) need to be quite so complicated. If you're assuming that the client-facing server <-> split mode channel is both unencrypted and visible to the attacker, they can just use the fact that every unmodified bit of ciphertext will identify the connection.

I think, for purposes of split mode, we have to assume that the attacker cannot observe traffic between the client-facing and backend server, either due to visibility (in the shared mode cases, this "traffic" doesn't even go over the network) or due to them having their own encrypted channel. Although the latter is still a little fuzzy due to timing, depending on how much traffic goes into the client-facing server.
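A rough sketch of the ciphertext-matching shortcut, for readers following along. It assumes the scenario under discussion (the client-facing server forwards records unmodified and the attacker can read both links) and that the sampled bytes come from the same position in the forwarded portion of each connection. The function names, the 32-byte sample size, and the sample data are illustrative assumptions.

```python
import hashlib


def fingerprint(records, sample_bytes=32):
    """Hash a short prefix of the forwarded ciphertext of one connection."""
    stream = b"".join(records)
    return hashlib.sha256(stream[:sample_bytes]).hexdigest()


def correlate(client_side, backend_side):
    """Map client-side connection labels to backend-side labels by fingerprint."""
    by_fp = {fingerprint(recs): conn for conn, recs in backend_side.items()}
    linked = {}
    for conn, recs in client_side.items():
        fp = fingerprint(recs)
        if fp in by_fp:
            linked[conn] = by_fp[fp]
    return linked


# Made-up forwarded records as seen at each of the attacker's positions.
client_side = {
    "c1": [b"\x17\x03\x03\x00\x04" + b"\xaa\xbb\xcc\xdd"],
    "c2": [b"\x17\x03\x03\x00\x04" + b"\x11\x22\x33\x44"],
}
backend_side = {
    "b9": [b"\x17\x03\x03\x00\x04" + b"\x11\x22\x33\x44"],
    "b4": [b"\x17\x03\x03\x00\x04" + b"\xaa\xbb\xcc\xdd"],
}

linked = correlate(client_side, backend_side)
print(linked)
```

No tagging or timing is needed in this case, since the forwarded ciphertext itself acts as the connection identifier.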

sftcd commented 2 years ago

On 06/08/2021 19:55, David Benjamin wrote:

> I don't think (2-4) need to be quite so complicated. If you're assuming that the client-facing server <-> split mode channel is both unencrypted and visible to the attacker, they can just use the fact that every unmodified bit of ciphertext will identify the connection.

> I think, for purposes of split mode, we have to assume that the attacker cannot observe traffic between the client-facing and backend server, either due to visibility (in the shared mode cases, this "traffic" doesn't even go over the network) or due to them having their own encrypted channel. Although the latter is still a little fuzzy due to timing, depending on how much traffic goes into the client-facing server.

I agree. I forget if we already note the basic vulnerability or not (we should I guess) but HOWTO secure client-facing to backend traffic is properly work for another day I think. It does need a pile of work but better that we get started on client to client-facing server experiments first.

S.

pkelsey commented 2 years ago

> I don't think (2-4) need to be quite so complicated. If you're assuming that the client-facing server <-> split mode channel is both unencrypted and visible to the attacker, they can just use the fact that every unmodified bit of ciphertext will identify the connection.

> I think, for purposes of split mode, we have to assume that the attacker cannot observe traffic between the client-facing and backend server, either due to visibility (in the shared mode cases, this "traffic" doesn't even go over the network) or due to them having their own encrypted channel. Although the latter is still a little fuzzy due to timing, depending on how much traffic goes into the client-facing server.

The assumption currently is that the backend channel in split mode is both unencrypted and visible to the attacker, as that's how the document currently reads, and it was expressed to me in #509 that the security model includes the attacker being located there. (2-4) are the product of going down a rabbit hole of obtaining purely deterministic results. I agree that correlation via sampling not-overly-many bytes of ciphertext at the same semantic position in the exchanges seen on both sides would probably be considered by most interested parties to be deterministic-enough.

I do think it would be helpful to update the document to note that it currently assumes the attacker has no visibility to the client-facing server <-> backend server traffic in split mode.

Regarding your comment above about dependence of security properties in split mode on traffic load at the client-facing server, for the benefit of those evaluating whether they can rely on ECH in their circumstances, I think the document should be clear as to whether there are scenarios where the assurance of privacy depends on being part of a big enough school of fish.

chris-wood commented 2 years ago

> The assumption currently is that the backend channel in split mode is both unencrypted and visible to the attacker, as that's how the document currently reads, and it was expressed to me in #509 that the security model includes the attacker being located there.

Sorry, to clarify, the assumption is that the attacker is present on that link, but cannot read data on that link, perhaps because it's encrypted. How that boundary is maintained is a deployment consideration specific to split mode. (As @davidben points out, split doesn't really work at all if the attacker has plaintext access to this link.)

> I do think it would be helpful to update the document to note that it currently assumes the attacker has no visibility to the client-facing server <-> backend server traffic in split mode.

This would be fine, but it is different from saying the attacker is not present there. Any deployment of split mode can't just assume the attacker isn't there. Rather, it should assume the attacker is there, and make sure the client-facing<>backend communication is protected accordingly. (I hope that clarifies my mental model.)

pkelsey commented 2 years ago

> Sorry, to clarify, the assumption is that the attacker is present on that link, but cannot read data on that link, perhaps because it's encrypted. How that boundary is maintained is a deployment consideration specific to split mode. (As @davidben points out, split doesn't really work at all if the attacker has plaintext access to this link.)

> > I do think it would be helpful to update the document to note that it currently assumes the attacker has no visibility to the client-facing server <-> backend server traffic in split mode.

> This would be fine, but it is different from saying the attacker is not present there. Any deployment of split mode can't just assume the attacker isn't there. Rather, it should assume the attacker is there, and make sure the client-facing<>backend communication is protected accordingly. (I hope that clarifies my mental model.)

Appreciate the clarification on the thinking here. I agree there is an important difference between not-present and no-visibility, and I'm not advocating for an assumption that the attacker is not on the backend link - #509 was language to highlight the state of the model as currently written.

chris-wood commented 2 years ago

> Appreciate the clarification on the thinking here. I agree there is an important difference between not-present and no-visibility, and I'm not advocating for an assumption that the attacker is not on the backend link - #509 was language to highlight the state of the model as currently written.

Understood! We could reopen #509 and rephrase it slightly to highlight this difference, and then perhaps say that it can be realized either (1) with encryption, assuming the attacker is everywhere, or (2) by changing network topology such that the attacker is really only present on the client<>client-facing path, as noted by @ekr.

@pkelsey, what do you think? Would you be willing to refactor that PR to match?

pkelsey commented 1 year ago

@chris-wood I'd be happy to reopen #509 with new tweaks to the language, but I don't think it's yet clear what those tweaks should be. If we pursue (1), it seems something more would need to be said than simply that the link between the client-facing server and the backend server achieves "no-visibility" via encryption, at the very least to head off interpretation of "via encryption" to mean direct forwarding of the ciphertexts. Is (2) really on the table? I'm not sure what the context of the reference "as noted by @ekr" is, as I'm not aware of where that note was made, so I'd need some help filling in the blank if there was more to it.

Aside from figuring out the above, I think at this point there is one clear residue of this discussion, which is that client-facing servers operating in split mode should be required to drop all RFC 8446 middlebox compatibility change_cipher_spec messages received from the client in order to deprive active attackers of this easy-access connection labeling tool. Such messages should have no reason to exist on the link between a client-facing server and a backend server. I haven't yet attempted a full survey of extant implementations, but it does appear OpenSSL, in a TLS 1.3 handshake, will ignore up to 32 of them appearing consecutively between handshake records, which looks like quite a bit of leeway for exploitation.
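To make the proposed filtering concrete, here is a minimal sketch of dropping change_cipher_spec records from a buffered run of TLS records before forwarding. It relies only on the TLS record header layout (1-byte content type, 2-byte legacy version, 2-byte length) and the change_cipher_spec content type value of 20. The `drop_ccs` helper is a hypothetical name, and the sketch ignores everything else a real client-facing server would need to handle.

```python
import struct

CHANGE_CIPHER_SPEC = 20  # TLS ContentType value for change_cipher_spec
HEADER_LEN = 5           # content type (1) + legacy version (2) + length (2)


def drop_ccs(stream: bytes):
    """Return (filtered, remainder) for a buffer of TLS records.

    filtered:  all complete records except change_cipher_spec ones
    remainder: trailing bytes of an incomplete record, to be buffered
    """
    out = bytearray()
    pos = 0
    while pos + HEADER_LEN <= len(stream):
        ctype, _version, length = struct.unpack_from("!BHH", stream, pos)
        end = pos + HEADER_LEN + length
        if end > len(stream):
            break  # record not fully received yet
        if ctype != CHANGE_CIPHER_SPEC:
            out += stream[pos:end]
        pos = end
    return bytes(out), stream[pos:]


# A compatibility-mode CCS record: type 20, version 0x0303, length 1, body 0x01.
ccs = bytes([20, 3, 3, 0, 1, 1])
# An arbitrary non-CCS record with a 3-byte body (illustrative data).
other = bytes([23, 3, 3, 0, 3]) + b"abc"

filtered, remainder = drop_ccs(ccs + other + ccs + ccs)
print(filtered == other, remainder)
```

Filtering on the record type alone, rather than counting how many such records a peer tolerates, removes both the inserted records and any size difference they would cause on the backend link.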

davidben commented 1 year ago

Can you please elaborate on this CCS-based attack? Are you envisioning that the communication channel between frontend and backend would somehow be protected but still reveal the record type?

pkelsey commented 1 year ago

I don't think the record type necessarily has to be revealed on the backend link for this to be a concern, as this is a tool for an active attacker to adjust the size of data forwarded between frontend and backend in a connection-specific way that otherwise has no effect on the evolution of the targeted connection(s).

tlswg / draft-ietf-tls-esni

Split mode correlation attacks #513