dpi bypassing idea - Githubissues

dragonbreath2000 commented 1 month ago

A dpi bypassing strategy

This idea came to me some time ago. It works by letting the client's browser do pretty much everything (instead of using a TLS tunnel), so there will be almost no distinguishing characteristics. Explaining it is a little hard for me, but I'll do my best. Imagine we want to access youtube.com, but the SNI is blocked. We usually use TLS tunnels to encrypt the entire connection, but encrypting the entire connection is not needed, since browsers use TLS anyway. Instead of encrypting, we change the SNI of the browser-initiated TLS handshake to trick the GFW.

This is how it goes:

The browser sends a ClientHello with an SNI of blocked.com. The proxy client changes this to whitelisted.com and sends it to the proxy server. The proxy server changes the SNI back to the original one and just forwards the connection to the target website. You might ask how the server will know what the target website's SNI is. I was thinking of a POST request through a CDN (or basically another route or channel outside this TCP connection).

The flow will be like this:

browser ClientHello (SNI=blocked.com) --> proxy client (changes the SNI to whitelisted.com; the original SNI value and an auth password are sent through a POST request via a CDN) --> proxy server (after authentication, changes the SNI back to blocked.com) --> target website

If the browser uses TLS 1.3, changing the SNI is enough, but for TLS 1.2 we have to fake the server certificate as well. In this strategy, the TLS fingerprint will be that of the initiator of the connection (the browser). There is no TLS tunnel, so there are no TLS-in-TLS characteristics. I have some experience with Go and think I can write it some time later, but I am super busy right now. I posted this issue so I could get some feedback or suggestions from experts. Thanks for reading this.
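For illustration, the side-channel registration could look roughly like the Go sketch below. It is only a sketch under assumptions that the proposal does not specify: the JSON field names (`conn_id`, `sni`), the use of a bearer token for the auth password, the `/register` path, and the `Registry` type are all hypothetical. It uses an in-memory request so it runs without a network; a real deployment would serve it over HTTPS behind the CDN.

```go
package main

import (
	"crypto/subtle"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// registration is a hypothetical JSON body for the side-channel POST.
type registration struct {
	ConnID string `json:"conn_id"` // assumed tag identifying the upcoming connection
	SNI    string `json:"sni"`     // original (blocked) server name
}

// Registry is a sketch of the proxy server's side-channel endpoint. It
// remembers which original SNI belongs to which announced connection.
type Registry struct {
	token string // shared auth password (assumption: sent as a bearer token)
	mu    sync.Mutex
	sni   map[string]string
}

func NewRegistry(token string) *Registry {
	return &Registry{token: token, sni: make(map[string]string)}
}

func (r *Registry) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	if req.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	// Compare in constant time so the check does not leak the password.
	want := []byte("Bearer " + r.token)
	got := []byte(req.Header.Get("Authorization"))
	if subtle.ConstantTimeCompare(got, want) != 1 {
		http.Error(w, "unauthorized", http.StatusUnauthorized)
		return
	}
	var reg registration
	body := http.MaxBytesReader(w, req.Body, 4096) // a registration is tiny
	if err := json.NewDecoder(body).Decode(&reg); err != nil || reg.ConnID == "" || reg.SNI == "" {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	r.mu.Lock()
	r.sni[reg.ConnID] = reg.SNI
	r.mu.Unlock()
	w.WriteHeader(http.StatusOK)
}

// Lookup returns the original SNI registered for a connection, if any.
func (r *Registry) Lookup(connID string) (string, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	name, ok := r.sni[connID]
	return name, ok
}

// post sends one in-memory POST to h and returns the HTTP status code.
func post(h http.Handler, token, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/register", strings.NewReader(body))
	req.Header.Set("Authorization", "Bearer "+token)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	reg := NewRegistry("s3cret")
	status := post(reg, "s3cret", `{"conn_id":"abc123","sni":"blocked.com"}`)
	name, ok := reg.Lookup("abc123")
	fmt.Println(status, name, ok)
}
```

The sketch leaves open how the proxy server ties a registration to the TCP connection that arrives later; the connection identifier here is just a placeholder for whatever tag both sides can observe.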

underdog-03 commented 1 month ago

Hey, your idea of bypassing Deep Packet Inspection (DPI) by playing around with the TLS handshake instead of wrapping everything up in an encrypted tunnel is super clever! Definitely a creative approach.

wkrp commented 1 month ago

@underdog-03, it looks like you have given the topic of this thread to a generative text algorithm and asked it to generate a reply for you. Please don't do that.

@dragonbreath2000 I am planning to reply with some pointers to existing research along the lines of what you have proposed, but it takes some time to gather that information. For one idea that is similar to what you are thinking of, see BlindTLS in #86, which "does the TLS handshake—and only the handshake—over an encrypted, unblockable proxy".

wkrp commented 3 weeks ago

Now that I've taken the time to understand it, I think this is an interesting idea. I don't think I have seen the exact same thing proposed before. If I understand you right, this is what you propose:

1. An unmodified TLS client produces a serialized TLS ClientHello (byte string) and sends it to a local proxy client (inside the firewall).
2. The local proxy client parses the ClientHello to find the SNI extension, replaces the contents of the SNI extension with a cover domain, re-serializes the ClientHello, and sends it to a proxy server (outside the firewall).
3. The proxy server undoes the transformation done by the proxy client: it parses the ClientHello, replaces the contents of the SNI extension with the domain of the desired origin server, re-serializes the ClientHello, and sends it to the TLS origin server.
   - The proxy server learns the desired origin server (true SNI value) through an out-of-band channel, presumed to be unblocked.
   - As long as the proxy server's inverse transformation results in the exact same serialized ClientHello (byte string) as was produced by the TLS client, all the hashes will work out.

Functionally, I think this works. You could, of course, do any invertible transformation between the proxy client and proxy server—invert all the bits, for example—and as long as the transformation is perfectly undone by the proxy server, everything will work out. You are proposing the particular transformation of replacing the SNI extension, which is invertible given some out-of-band side information, namely the original SNI value. This transformation has the property that the transformed ClientHello still resembles and parses as a ClientHello to an outside observer.
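To make the invertibility concrete, here is a minimal Go sketch of the SNI replacement and its inverse. It is not an implementation from this thread. It assumes the whole ClientHello arrives in a single TLS record containing exactly one host name, and `ReplaceSNI`, `skip`, `u16`, and `sampleHello` are names made up for the example. The hand-built `sampleHello` record is a toy input, not a realistic browser ClientHello.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

var errHello = errors.New("malformed or unsupported ClientHello")

func u16(b []byte) int { return int(binary.BigEndian.Uint16(b)) }

// skip moves p past a vector whose length prefix is w bytes wide.
func skip(b []byte, p, w int) (int, bool) {
	if p+w > len(b) {
		return 0, false
	}
	n := 0
	for _, x := range b[p : p+w] {
		n = n<<8 | int(x)
	}
	return p + w + n, p+w+n <= len(b)
}

// ReplaceSNI rewrites the host name in the server_name extension of a TLS
// record that holds exactly one complete ClientHello. It returns the new
// record and the old name, the side information needed to undo the change.
func ReplaceSNI(rec []byte, newName string) ([]byte, string, error) {
	// Offsets: record header 5, handshake header 4, version 2, random 32.
	if len(rec) < 43 || len(rec) > 1<<14 || rec[0] != 0x16 || rec[5] != 0x01 ||
		newName == "" || len(newName) > 255 {
		return nil, "", errHello
	}
	hsLen := int(rec[6])<<16 | u16(rec[7:])
	if u16(rec[3:])+5 != len(rec) || hsLen+9 != len(rec) {
		return nil, "", errHello
	}
	p, ok := 43, true
	for _, w := range []int{1, 2, 1} { // session_id, cipher_suites, compression
		if p, ok = skip(rec, p, w); !ok {
			return nil, "", errHello
		}
	}
	extStart := p
	if end, fits := skip(rec, p, 2); !fits || end != len(rec) {
		return nil, "", errHello
	}
	for p += 2; p+4 <= len(rec); {
		typ, n, body := u16(rec[p:]), u16(rec[p+2:]), p+4
		if body+n > len(rec) {
			return nil, "", errHello
		}
		if typ != 0 { // not server_name: move to the next extension
			p = body + n
			continue
		}
		// Extension data: list length 2, name type 1, name length 2, name.
		if n < 5 || rec[body+2] != 0 || u16(rec[body:])+2 != n || u16(rec[body+3:])+5 != n {
			return nil, "", errHello
		}
		old := string(rec[body+5 : body+n])
		delta := len(newName) - len(old)
		out := append([]byte{}, rec[:body+5]...)
		out = append(out, newName...)
		out = append(out, rec[body+n:]...)
		hsLen += delta
		out[6], out[7], out[8] = byte(hsLen>>16), byte(hsLen>>8), byte(hsLen)
		// Fix up the 16-bit lengths: record, extensions block, extension, list, name.
		for _, off := range []int{3, extStart, p + 2, body, body + 3} {
			binary.BigEndian.PutUint16(out[off:], uint16(u16(out[off:])+delta))
		}
		return out, old, nil
	}
	return nil, "", errHello // no server_name extension found
}

// sampleHello hand-builds a toy ClientHello record (all-zero random, one
// cipher suite) for demonstration. It only supports short names.
func sampleHello(name string) []byte {
	n := len(name)
	b := append([]byte{0x16, 3, 1, 0, byte(n + 63), 1, 0, 0, byte(n + 59), 3, 3}, make([]byte, 32)...)
	b = append(b, 0, 0, 2, 0x13, 0x01, 1, 0, 0, byte(n+16))        // session_id, suites, compression, ext len
	b = append(b, 0, 0, 0, byte(n+5), 0, byte(n+3), 0, 0, byte(n)) // server_name headers
	b = append(b, name...)
	return append(b, 0, 0x2b, 0, 3, 2, 3, 4) // supported_versions: TLS 1.3
}

func main() {
	orig := sampleHello("blocked.com")
	cover, old, err := ReplaceSNI(orig, "whitelisted.com") // proxy client
	if err != nil {
		panic(err)
	}
	back, _, _ := ReplaceSNI(cover, old) // proxy server, using the side information
	fmt.Println(old, bytes.Equal(orig, back))
}
```

Because the function only swaps the name bytes and adjusts the enclosing length fields by the same amount in each direction, applying it twice with the right names gives back the original byte string, which is the property the handshake hashes depend on.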

I don't have much else to say, except that there has been much research on how to implement an unblockable side channel, such as is needed for this idea. You could, for example, use in-band signaling, invisibly "tagging" earlier messages to the same proxy server with the SNI information that will be needed for a later connection. Refraction networking research has a lot of examples of tagging schemes; see https://github.com/net4people/bbs/issues/352#issuecomment-2068223757 for some examples. Cirripede uses TCP initial sequence numbers.

You could also use an out-of-band channel. CensorSpoofer suggested email or instant messaging. SiegeBreaker used email as an example; Waterfall also used email as an example. Here's my summary of Waterfall's batch registration from my pre-2018 paper summaries:

Requires pre-registration by each client. Out of band (e.g. by email), the client has to send out a bundle of connection identifiers. A connection identifier is a TCP ISN, a TLS nonce, TLS keys, and a keypair for communication with the decoy router. A client sends perhaps 1000 such connection identifiers at a time; each identifier is good for one session, and the client has to send more when it runs out. The decoy routers look for TCP connections with a previously registered ISN (in the server's SYN/ACK, reflected from the client), then MITM the TLS connection. They can MITM because the client has previously revealed what secrets it's going to use, as well as the nonce, which the decoy router doesn't otherwise get to see because it's upstream-only.

MultiFlow (summary) and BlindTLS have some similarities to your idea, in that they send some data related to the TLS handshake through a side channel.

As in TapDance (https://censorbib.nymity.ch/#Wustrow2014a) and Rebound (https://censorbib.nymity.ch/#Ellard2015a), the MultiFlow client, in the process of doing a TLS handshake with a decoy host, exfiltrates certain downstream-only information to the decoy router, namely the server's cipher_suite and key_share, and Transcript-Hashes of the server's messages. This exfiltration effectively enables the decoy router to see the relevant parts of the downstream (which is how MultiFlow works over asymmetric routes) and enables the decoy router to recover the master secret and decrypt future upstream messages. The client then sends a message (an HTTP request, for example) that somehow encodes the current session ticket. The decoy router can decrypt the message and recover the session ticket, allowing it to perform session resumption as if it were the client.

The idea of BlindTLS is to do the TLS handshake—and only the handshake—over an encrypted, unblockable proxy, letting the connection persist just long enough to acquire a session ticket. Then, disconnect from the proxy, connect directly to the TLS server, and resume the session already started.

The Raceboat paper is a good systematization and overview of circumvention signaling channels.

immartian commented 3 weeks ago

I immediately think of Yggdrasil as a good OOB candidate, if anyone wants to explore this idea further. For example, with yggquic, a proxy client and server can communicate over Yggdrasil as if using standard TCP/UDP sockets, but with end-to-end encryption and decentralized routing. I am not sure about the other parts at the moment, but this can simplify the OOB channel and let you securely transmit the original SNI and authentication data without needing additional setup for an external CDN or HTTP-based OOB requests. I do believe we need an efficient, persistent OOB channel, if I'm not wrong.

dragonbreath2000 commented 3 weeks ago

An unmodified TLS client produces a serialized TLS ClientHello (byte string) and sends it to a local proxy client (inside the firewall).

The local proxy client parses the ClientHello to find the SNI extension, replaces the contents of the SNI extension with a cover domain, re-serializes the ClientHello, and sends it to a proxy server (outside the firewall).

The proxy server undoes the transformation done by the proxy client: it parses the ClientHello, replaces the contents of the SNI extension with the domain of the desired origin server, re-serializes the ClientHello, and sends it to the TLS origin server.

The proxy server learns the desired origin server (true SNI value) through an out-of-band channel, presumed to be unblocked.

As long as the proxy server's inverse transformation results in the exact same serialized ClientHello (byte string) as was produced by the TLS client, all the hashes will work out.

Yes, that is what was on my mind. We could send the original SNI directly to the server, but in a different connection, with an HTTP request (over TLS). Since it is a simple POST request, it will not have many distinguishing characteristics. At first I was thinking it is not the best idea to send the original data directly, but now that I think about it, a simple request is fine as long as it does not have a weird TLS fingerprint.

To make this even better, we do not need to serialize or re-serialize the ClientHello either: we could send the entire ClientHello message through the side channel (the ClientHello is very small, so the size does not really matter).

Updating my original proposal: the proxy client sends the entire ClientHello message, the session ID or random, and an auth password through a side channel. If the auth is OK, the server responds with status 200 and stores this data. Then the client changes the SNI and sends the modified ClientHello directly to the server. The server parses the session ID or random and looks for the corresponding ClientHello in the stored data. If it finds it, it sends that original ClientHello to the target server and continues forwarding everything. This type of strategy (splitting connections) can really make things challenging for the GFW.
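A rough Go sketch of the server-side lookup in the updated proposal might look like this. It assumes the 32-byte ClientHello random is used as the key (the session ID would work the same way), and the `HelloStore` type, its methods, and the `fakeHello` test helper are hypothetical names for the example. `fakeHello` produces a stand-in record with an opaque tail and zeroed length fields, not a valid ClientHello.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"sync"
)

// helloRandom returns the 32-byte ClientHello.random of a TLS record that
// starts with a ClientHello. Changing only the SNI leaves this field intact.
func helloRandom(rec []byte) ([32]byte, error) {
	var r [32]byte
	// 5-byte record header, 4-byte handshake header, 2-byte legacy_version.
	if len(rec) < 43 || rec[0] != 0x16 || rec[5] != 0x01 {
		return r, errors.New("not a ClientHello record")
	}
	copy(r[:], rec[11:43])
	return r, nil
}

// HelloStore holds original ClientHello records announced over the side
// channel until the matching modified ClientHello arrives.
type HelloStore struct {
	mu sync.Mutex
	m  map[[32]byte][]byte
}

func NewHelloStore() *HelloStore { return &HelloStore{m: make(map[[32]byte][]byte)} }

// Register stores an original ClientHello (call only after the auth check).
func (s *HelloStore) Register(original []byte) error {
	key, err := helloRandom(original)
	if err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = append([]byte(nil), original...)
	return nil
}

// Take finds the original ClientHello that matches a modified one and removes
// it, so that each registration is used for at most one connection.
func (s *HelloStore) Take(modified []byte) ([]byte, bool) {
	key, err := helloRandom(modified)
	if err != nil {
		return nil, false
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	orig, ok := s.m[key]
	delete(s.m, key)
	return orig, ok
}

// fakeHello builds a stand-in record: real header bytes and random, opaque tail.
func fakeHello(random byte, tail string) []byte {
	rec := []byte{0x16, 3, 1, 0, 0, 0x01, 0, 0, 0, 3, 3}
	rec = append(rec, bytes.Repeat([]byte{random}, 32)...)
	return append(rec, tail...)
}

func main() {
	store := NewHelloStore()
	original := fakeHello(7, "sni=blocked.com")
	modified := fakeHello(7, "sni=whitelisted.com")
	if err := store.Register(original); err != nil {
		panic(err)
	}
	got, ok := store.Take(modified)
	fmt.Println(ok, bytes.Equal(got, original))
	_, again := store.Take(modified)
	fmt.Println(again)
}
```

A real server would probably also expire entries that are never claimed; that is left out here to keep the sketch short.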

miaomiaosoft commented 2 weeks ago

Looks like ReQrypt. https://reqrypt.org/reqrypt.html

wkrp commented 2 weeks ago

Looks like ReQrypt. https://reqrypt.org/reqrypt.html

We have a thread on ReQrypt: #74.

I don't see the similarity with ReQrypt, myself. ReQrypt is IP-layer and relies on source address spoofing; this is application-layer. ReQrypt uses an indirect, covert proxy for all upstream traffic; this only uses an indirect channel for the initial handshake information. ReQrypt downstream is not tunneled, while this uses the same TLS connection in both directions.

net4people / bbs

dpi bypassing idea #412
