klzgrad / naiveproxy

Make a fortune quietly
BSD 3-Clause "New" or "Revised" License

Traffic analysis of HTTP/2 CONNECT tunnels #1

Open klzgrad opened 6 years ago

klzgrad commented 6 years ago

As of version 64.x.y.z.

Negative means traffic from the client; positive means traffic from the server.

An example of tunneled TLS data:

20: IP
20: TCP
10: TCP Timestamps
 5: TLS header
24: TLS GCM mode overhead (Nonce + MAC)
---- (Encrypted data below)
 9: HTTP/2 frame header
 5: TLS header
24: TLS GCM mode overhead (Nonce + MAC)
---- (Tunneled payload below)

The lengths counted here are the lengths of the "Encrypted data" in the above diagram, because these lengths are visible in cleartext and are independent of TCP segmentation. Cleartext TLS handshakes are not counted in the lengths.
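Each layer adds a fixed number of bytes, so the counted length is the tunneled payload plus a constant. This is a minimal sketch. It assumes one inner TLS record per h2 DATA frame and uses the AES-GCM overheads listed above. The name `counted_length` and the constants are hypothetical:

```python
# Per-layer overheads from the diagram above, in bytes.
H2_FRAME_HEADER = 9    # h2 DATA frame header inside the outer TLS record
TLS_HEADER = 5         # inner TLS record header
TLS_GCM_OVERHEAD = 24  # inner record's explicit nonce (8) + GCM tag (16)


def counted_length(inner_plaintext: int) -> int:
    """Counted "Encrypted data" length for one tunneled, encrypted inner record.

    Assumes the DATA frame carries exactly one inner TLS record whose
    plaintext is `inner_plaintext` bytes long.
    """
    return H2_FRAME_HEADER + TLS_HEADER + TLS_GCM_OVERHEAD + inner_plaintext


# A tunneled h2 PING frame is 17 bytes (9-byte header + 8-byte payload).
print(counted_length(17))  # 55
```

Every encrypted tunneled record therefore appears 38 bytes above its inner length. A 17-byte PING shows up as 55, which matches the tunneled-stream list later in this thread.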

Payload lengths from -2000 to 2000:

[Figure: payload length distribution, -2000 to 2000]

The largest spikes from the server side are at 1024, 1179 (Google servers), 1389 (Cloudflare?) and 1427/1429 (TCP MSS?). These are probably various self-imposed MTU/MSS-related optimizations.

Large spikes from the client are mostly TLS handshakes being tunneled in h2 DATA frames:

[Figure: handshake]

-526: padded ClientHello with session resumption.
-267: some ECDH (pubkey len: 32) ClientKeyExchange + ChangeCipherSpec + 2x Encrypted Handshake Message.
? ~ -193: the bell curve covers unpadded ClientHellos with SNIs of various sizes. (-193 is the lower bound, with an empty SNI.)
-225: ChangeCipherSpec + 2x Encrypted Handshake Message.
-135: some ECDH ClientKeyExchange (pubkey len: 65) + Encrypted Handshake Message.
-102: some ECDH ClientKeyExchange (pubkey len: 32) + Encrypted Handshake Message.
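Several of these client-side spikes can be reproduced from TLS 1.2 record sizes. This sketch rests on three assumptions:

- The ClientHello handshake message is padded to 512 bytes.
- The "Encrypted Handshake Message" is an AES-GCM Finished (5 + 40 bytes).
- The -102 and -135 flights also carry a 6-byte ChangeCipherSpec record. The entries above do not list it; it is assumed here because it makes the totals match.

The helper names are hypothetical. The -267, -225 and -193 entries are not modeled.

```python
DATA_FRAME_HEADER = 9   # h2 DATA frame header around the tunneled bytes
RECORD_HEADER = 5       # TLS record header
HANDSHAKE_HEADER = 4    # handshake type (1) + length (3)


def record(body_len: int) -> int:
    """Size of one TLS record carrying `body_len` bytes."""
    return RECORD_HEADER + body_len


def client_key_exchange(pubkey_len: int) -> int:
    """ECDHE ClientKeyExchange record: 1-byte point length + point."""
    return record(HANDSHAKE_HEADER + 1 + pubkey_len)


CHANGE_CIPHER_SPEC = record(1)  # 6 bytes
# Finished under AES-GCM: nonce (8) + handshake header (4) + verify_data (12) + tag (16)
ENCRYPTED_FINISHED = record(8 + HANDSHAKE_HEADER + 12 + 16)  # 45 bytes


def client_second_flight(pubkey_len: int) -> int:
    """ClientKeyExchange + ChangeCipherSpec + Finished in one DATA frame (assumed)."""
    return (DATA_FRAME_HEADER + client_key_exchange(pubkey_len)
            + CHANGE_CIPHER_SPEC + ENCRYPTED_FINISHED)


print(DATA_FRAME_HEADER + record(512))  # 526: padded ClientHello (assumed 512 bytes)
print(client_second_flight(32))         # 102: X25519-sized public key
print(client_second_flight(65))         # 135: uncompressed P-256 point
```

The same 6 + 45 pair also accounts for the -60 entry in the tunneled-stream list later in this thread (9 + 6 + 45).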

klzgrad commented 6 years ago

Short lengths indicate h2 control frames.

[Figure: close]

Real h2:
48+: 200 HEADERS
40: Initial WINDOW_UPDATE + SETTINGS
17: PING
13: WINDOW_UPDATE or RST_STREAM
9: empty DATA/SETTINGS
-9: empty DATA/SETTINGS
-13: WINDOW_UPDATE or RST_STREAM
-17: PING
-64: Magic + SETTINGS + WINDOW_UPDATE
-68: (some GET HEADERS of specific sites)

Tunnel (the tunnel itself):
78: 6x WINDOW_UPDATE
52: 4x WINDOW_UPDATE (HAProxy quirk)
39: 3x WINDOW_UPDATE (HAProxy quirk)
26: 2x WINDOW_UPDATE (HAProxy quirk)
18: 2x SETTINGS
9: SETTINGS
-13: WINDOW_UPDATE

Tunnel (stream controls):
18: 2x END_STREAM DATA
10: 200 HEADERS (replying to CONNECT)
9: END_STREAM DATA
-13: RST_STREAM
-31: HEADERS (CONNECT (15) + authority (1) + proxy-authorization (1))
-55 ~ -31: CONNECT HEADERS with authority of various sizes

Tunnel (tunneled streams):
78: DATA header (9) + TLS overhead (5 + 24) + Initial WINDOW_UPDATE + SETTINGS (40)
55: DATA header (9) + TLS overhead (5 + 24) + PING (17)
47: DATA header (9) + TLS overhead (5 + 24) + empty DATA/SETTINGS (9)
-47: DATA header (9) + TLS overhead (5 + 24) + empty DATA/SETTINGS (9)
-55: DATA header (9) + TLS overhead (5 + 24) + PING (17)
-60: DATA header (9) + ChangeCipherSpec (5 + 1) + Encrypted Handshake Message (5 + 40)
-102: DATA header (9) + TLS overhead (5 + 24) + Magic + SETTINGS + WINDOW_UPDATE (64)
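The short control-frame sizes in these lists follow directly from the HTTP/2 wire format (RFC 7540). Each frame is a 9-byte header plus a fixed payload. This sketch uses only `struct`. The helper `frame` and the sample field values are hypothetical:

```python
import struct

# HTTP/2 frame types (RFC 7540, Section 6) and flags used below.
DATA, HEADERS, RST_STREAM, SETTINGS, PING, WINDOW_UPDATE = 0x0, 0x1, 0x3, 0x4, 0x6, 0x8
END_STREAM, END_HEADERS = 0x1, 0x4
CLIENT_MAGIC = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"  # 24-byte connection preface


def frame(ftype, flags, stream_id, payload=b""):
    """9-byte header (24-bit length, type, flags, 31-bit stream id) + payload."""
    header = struct.pack(">I", len(payload))[1:] + struct.pack(
        ">BBI", ftype, flags, stream_id & 0x7FFFFFFF)
    return header + payload


window_update = frame(WINDOW_UPDATE, 0, 0, struct.pack(">I", 1 << 20))
sizes = {
    "PING": len(frame(PING, 0, 0, bytes(8))),
    "WINDOW_UPDATE": len(window_update),
    "RST_STREAM": len(frame(RST_STREAM, 0, 1, struct.pack(">I", 0x8))),  # CANCEL
    "empty SETTINGS": len(frame(SETTINGS, 0, 0)),
    "END_STREAM DATA": len(frame(DATA, END_STREAM, 1)),
    # HPACK indexed field 0x80 | 8 refers to the static-table entry ":status: 200"
    "200 HEADERS": len(frame(HEADERS, END_HEADERS, 1, bytes([0x80 | 8]))),
    "6x WINDOW_UPDATE": len(window_update * 6),
    "magic": len(CLIENT_MAGIC),
}
print(sizes)
```

Take the 24-byte magic and the 13-byte WINDOW_UPDATE out of the client's -64. That leaves 27 bytes for its SETTINGS frame: a 9-byte header and three 6-byte entries. The server's 40 (13 + 27) breaks down the same way. The CONNECT HEADERS sizes depend on HPACK encoding choices and are not modeled here.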

klzgrad commented 6 years ago

Real h2 inference

-64: The first packet from the client side has a fixed size (Magic + SETTINGS + WINDOW_UPDATE).
40: The first packet from the server side has a fixed size (WINDOW_UPDATE + SETTINGS).
17: PING frames: browser idleness timeout setting, presence of an h2 connection.
+/- 9: Empty DATA, correlated to the number of streams.
+/- 13: WINDOW_UPDATE/RST_STREAM: a mix of the WINDOW_UPDATE sending setting and occurrences of RSTs.

Tunnel inference

13*n: many WINDOW_UPDATEs, a dynamic signature of HAProxy (1.8.3).
9, 18: 1x, 2x empty DATA/SETTINGS, correlated to the number of tunneled connections (fix: add paddings).
-13: WINDOW_UPDATE/RST_STREAM. Sending too many RSTs is a dynamic signature of the Naive client (fix: try to close connections with END_STREAM in the Naive client; add paddings for a part of it).

10: 200 HEADERS, too short (fix: add paddings).
-31, -51 ~ -31: CONNECT HEADERS (fix: add paddings).
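The tunnel-specific patterns above can be turned into a crude counter over a trace of signed lengths. This is only a heuristic sketch. The function name and labels are hypothetical, and the matched lengths come from the lists in this thread. Some lengths are ambiguous: 78 is both 6x WINDOW_UPDATE and a tunneled WINDOW_UPDATE + SETTINGS. The counts are therefore hints, not proof.

```python
from collections import Counter


def tunnel_hints(lengths):
    """Count lengths matching the tunnel signatures listed above.

    `lengths` are signed: negative from the client, positive from the server.
    """
    hints = Counter()
    for n in lengths:
        if 13 < n <= 78 and n % 13 == 0:
            hints["13*n WINDOW_UPDATE batch"] += 1   # HAProxy-style batching
        elif n in (9, 18):
            hints["empty DATA/SETTINGS"] += 1
        elif n == 10:
            hints["bare 200 HEADERS"] += 1
        elif n == -13:
            hints["client WINDOW_UPDATE/RST_STREAM"] += 1
    return hints


trace = [-13, 9, 26, 52, 10, -13, 1024]  # hypothetical trace
print(tunnel_hints(trace))
```

The suggested fixes target exactly these buckets. Padding the frames behind 9, 10, 18 and -13 would leave this counter with nothing to match.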

The overhead of TLS inside h2 DATA frames can't really be hidden.
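This is one way to see why. The overhead is a constant 38 bytes (9 + 5 + 24), so subtracting it from a tunneled length recovers the inner h2 length and its real-h2 meaning. The sketch assumes one encrypted inner record per DATA frame. The names are hypothetical, and the table is the real-h2 list above.

```python
TUNNEL_OVERHEAD = 9 + 5 + 24  # DATA header + inner TLS header + GCM nonce/tag

REAL_H2 = {  # signed lengths of real h2 control traffic, from the list above
    -64: "client preface (Magic + SETTINGS + WINDOW_UPDATE)",
    40: "server WINDOW_UPDATE + SETTINGS",
    17: "PING", -17: "PING",
    13: "WINDOW_UPDATE/RST_STREAM", -13: "WINDOW_UPDATE/RST_STREAM",
    9: "empty DATA/SETTINGS", -9: "empty DATA/SETTINGS",
}


def unwrap(length):
    """Map a tunneled DATA-frame length to the inner h2 length, keeping the sign."""
    sign = -1 if length < 0 else 1
    return sign * (abs(length) - TUNNEL_OVERHEAD)


for observed in (78, 55, 47, -47, -55, -102):
    inner = unwrap(observed)
    print(observed, inner, REAL_H2.get(inner, "?"))
```

Each tunneled-stream entry in the earlier list maps back onto a real-h2 signature this way.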

klzgrad commented 6 years ago

Some h2 traffic samples from accessing the Alexa top 100 sites:

[Figure: CDF]
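An empirical CDF is enough to compare one's own captures against samples like these. This is a minimal sketch using `bisect`. The function names and the sample values are hypothetical, not data from this thread.

```python
from bisect import bisect_left, bisect_right


def ecdf(samples):
    """Return F with F(x) = fraction of samples <= x."""
    xs = sorted(samples)
    return lambda x: bisect_right(xs, x) / len(xs)


def share_between(samples, lo, hi):
    """Fraction of samples with lo <= s <= hi."""
    xs = sorted(samples)
    return (bisect_right(xs, hi) - bisect_left(xs, lo)) / len(xs)


lengths = [17, 40, 220, 512, 1024, 1389]  # hypothetical absolute lengths
F = ecdf(lengths)
print(F(512), share_between(lengths, 200, 800))
```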

Padding towards lengths in [200, 800] is probably ok.
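This sketch pads one frame toward a random target in that range with h2's standard PADDED flag. Under RFC 7540 that means a 1-byte Pad Length field plus up to 255 bytes of padding. It is not necessarily how NaiveProxy pads, and the function name is hypothetical. Because Pad Length is a single byte, one frame can grow by at most 256 bytes. A 10-byte 200 HEADERS therefore tops out at 266 and cannot reach the upper part of [200, 800] this way.

```python
import random
import struct

PADDED = 0x8   # valid on DATA and HEADERS frames
MAX_PAD = 255  # Pad Length is a single byte


def padded_frame(ftype, flags, stream_id, payload, target_len):
    """Build a PADDED frame whose total size is as close to target_len as allowed."""
    base = 9 + 1 + len(payload)  # frame header + Pad Length byte + payload
    pad = max(0, min(MAX_PAD, target_len - base))
    body = bytes([pad]) + payload + bytes(pad)
    header = struct.pack(">I", len(body))[1:] + struct.pack(
        ">BBI", ftype, flags | PADDED, stream_id)
    return header + body


rng = random.Random(0)
target = rng.randint(200, 800)
# HEADERS (type 0x1) with END_HEADERS (0x4) carrying ":status: 200" (HPACK 0x88)
headers_200 = padded_frame(0x1, 0x4, 1, bytes([0x88]), target)
print(target, len(headers_200))
```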

klzgrad commented 4 years ago

Recent evaluations of website fingerprinting against multiplexed HTTP/2:

https://tools.ietf.org/html/draft-wood-pearg-website-fingerprinting-00
https://nikita.ca/papers/h2fp-madweb19.pdf
https://dl.acm.org/doi/pdf/10.1145/3339252.3341478
https://dl.acm.org/doi/pdf/10.1145/3357384.3357993