net4people / bbs

Forum for discussing Internet censorship circumvention

Rosen: censorship-resistant proxy tunnel based on encapsulating traffic within cover protocols #57

Open awnumar opened 3 years ago

awnumar commented 3 years ago

As part of my ongoing masters thesis I've been working on Rosen: https://github.com/awnumar/rosen

Related post here: https://spacetime.dev/rosen-censorship-resistant-proxy-tunnel

ewust commented 3 years ago

Thanks for the hard work!

Just want to point out a clarification: uTLS doesn't require you to run/include a headless chrome, it's just a fork of the Go tls library. The idea is to let you specify/mimic the client hello of popular implementations (like Chrome or Firefox), so censors can't block you based on your unique-looking client hello "fingerprint". If you want to see what your tool's fingerprint(s) currently are, you can upload a pcap here: https://tlsfingerprint.io/pcap and it will give you a list and compare those to our dataset of captured TLS fingerprints in live traffic.
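For a quick local look at the kind of fields that feed into such a fingerprint, here is a minimal Go sketch using only the standard library. It is not uTLS, and it is not how tlsfingerprint.io computes its fingerprint IDs. It only runs a throwaway in-memory TLS server and records a few ClientHello fields that crypto/tls exposes through tls.ClientHelloInfo. The names helloFeatures, observeClientHello, and demoConfig are made up for illustration.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"net"
)

// helloFeatures holds a few ClientHello fields that commonly feed into a
// TLS fingerprint. (Hypothetical type, for illustration only.)
type helloFeatures struct {
	ServerName   string
	Versions     []uint16
	CipherSuites []uint16
	Curves       []tls.CurveID
	ALPN         []string
}

// demoConfig is an example client configuration; the values are assumptions.
var demoConfig = &tls.Config{
	ServerName: "example.com",
	NextProtos: []string{"h2", "http/1.1"},
}

// observeClientHello starts a TLS handshake over an in-memory pipe and
// records what the client offers. The server side aborts the handshake as
// soon as it has seen the ClientHello, so no certificate is needed.
func observeClientHello(clientCfg *tls.Config) (helloFeatures, error) {
	clientSide, serverSide := net.Pipe()
	defer clientSide.Close()
	defer serverSide.Close()

	var seen *helloFeatures
	serverCfg := &tls.Config{
		GetConfigForClient: func(chi *tls.ClientHelloInfo) (*tls.Config, error) {
			seen = &helloFeatures{
				ServerName:   chi.ServerName,
				Versions:     chi.SupportedVersions,
				CipherSuites: chi.CipherSuites,
				Curves:       chi.SupportedCurves,
				ALPN:         chi.SupportedProtos,
			}
			return nil, errors.New("observation only, aborting handshake")
		},
	}

	go func() {
		// The client handshake is expected to fail; closing the pipe
		// afterwards unblocks the server side in every case.
		_ = tls.Client(clientSide, clientCfg).Handshake()
		clientSide.Close()
	}()

	_ = tls.Server(serverSide, serverCfg).Handshake() // always returns an error
	if seen == nil {
		return helloFeatures{}, errors.New("no ClientHello received")
	}
	return *seen, nil
}

func main() {
	f, err := observeClientHello(demoConfig)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", f)
}
```

tls.ClientHelloInfo exposes only a subset of the ClientHello (it does not give the extension order, for example), so a pcap is still the way to get the exact fingerprint a censor would see.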

Also in case you are not aware, Sergey Frolov and I published a paper at FOCI this year that had a similar design to yours, and also supported Websockets (primarily to cut down on tunneling overhead): https://www.usenix.org/conference/foci20/presentation/frolov

A prototype of the HTTPT tool is available here: https://github.com/sergeyfrolov/httpt

However, there remains a lot of work to be done in figuring out what counts as "cover" traffic for a server. How do we avoid a censor being able to tell that serving an assets folder is a tell-tale sign of these kinds of proxies? We had some preliminary ideas in the HTTPT paper, but many were not implemented or tested against real censors. It would be interesting to have a suite of these (500s, 404s, asset folders, mimicking popular sites, etc) and see which ones censors are able to block.

ghost commented 3 years ago

How do we avoid a censor being able to tell that serving an assets folder is a tell-tale sign of these kinds of proxies?

v2ray doesn't do anything about it; it lets the user choose what to serve by putting the proxy behind a real web server. That way, there are no auto-generated assets and no common pattern to detect. AFAIK, it works pretty well against real censors.

wkrp commented 3 years ago

This is great, thanks for working on Rosen.

  • Currently only supports HTTPS with automated certificate issuance and renewal, but written with a modular architecture to make it easy to add support for other cover protocols. This kind of flexible approach is suggested in this paper.

I see you already know about HTTPT, which has a similar design and goals. Here are a few other projects to look at for comparison or inspiration:

A modular architecture has been thought of and implemented many times. V2Fly may be the leader here. Pluggable transports can be thought of as adding a layer of modularity over dedicated transport programs. obfs4proxy is internally modular and supports a number of transports; currently in Tor Browser both obfs4 and meek are done by obfs4proxy.

  • Based on this paper, the proxy is strictly tunneling-based. Currently there is no protection against TLS fingerprinting so the client and server will be detected as belonging to the Go standard library. However I was made aware of this library for lower level access to the ClientHello structure, but I'm not sure what the implications of using this are on unobservability; will a fingerprint that does not match some other characteristics of the traffic then flag to a censor that something atypical is going on? I'm also not sure if interacting with headless chrome instead of a Go standard library client is worth it.

This may just be a matter of terminology, but I wouldn't call what Rosen is doing "tunneling" except insofar as it resembles Go crypto/tls. For tunneling, you would have to use an actual instance of some common HTTPS implementation; a headless browser would be one way of doing that. But it could be even more specific; e.g., using a browser through a specific web service.

uTLS is a good option; as @ewust notes, it's a modification of crypto/tls and no browser is involved. For what it's worth, the meek that's deployed in Tor Browser has used uTLS since October 2019 and in that time it has not been blocked by its TLS fingerprint, as far as I know. Before switching to uTLS, the meek deployment in Tor Browser used a headless Firefox, which worked okay but was logistically hard to work with and keep up to date; see Section V of "The use of TLS in censorship circumvention" for more discussion.

You need to think about the server-side TLS fingerprint as well, and uTLS does not help there. One option is to use a frontend web server such as Apache or Nginx, with its own TLS certificate, that forwards everything to a local Rosen server. HTTPT and V2Fly are designed with this kind of deployment in mind.

  • In this paper, the researchers detected meek based on its timing patterns. There's no protection against this currently, but I am considering attempting to upgrade the HTTPS connection to a WebSocket connection to avoid the very atypical request-response timing pattern. Also a future release will add random-length padding to the payload.

This is a good idea. To my knowledge, there is still scant evidence for censors actually using attacks based on timing and packet sizes—the closest thing is probably the GFW's considering the length of the first data packet (Section 4) in detecting possible Shadowsocks connections—but it is a good thing to prepare for in the future. We have talked about padding schemes before at https://github.com/net4people/bbs/issues/9#issuecomment-524095186. Harder than designing a padding scheme, though, is deciding how to apply it—characterization of "normal" traffic patterns is IMO still an open and vital research question. Snowflake uses this protocol for padding and dnstt uses this one, but they are currently unused (inserting either no or a fixed amount of padding).

  • Despite the shortcomings, I have tested the tunnel against nDPI and a commercial DPI engine: both detected it as HTTPS. A few generous people have tested it out for me behind the GFW as well, and they noted that it was extremely fast as compared to other proxies. Unfortunately there's a bug at the moment causing crashes but I am working on a patch.

It's great that you have been able to test against actual DPI engines. Bear in mind that on-path DPI is, empirically, not the favored tool of censors; when possible they prefer to use "setup" features rather than "usage" features (Recommendation 3 on page 11). This means that protecting the IP address or domain name of the proxy is as important as having a good protocol fingerprint. The challenge here is informing censored users of where the proxies are located, without also informing the censor. One way to deal with this is to ask each user to set up their own personal proxy.

alexzhang2015 commented 3 years ago

^_^

awnumar commented 3 years ago

@ewust

Just want to point out a clarification: uTLS doesn't require you to run/include a headless chrome, it's just a fork of the Go tls library. The idea is to let you specify/mimic the client hello of popular implementations (like Chrome or Firefox), so censors can't block you based on your unique-looking client hello "fingerprint". If you want to see what your tool's fingerprint(s) currently are, you can upload a pcap here: https://tlsfingerprint.io/pcap and it will give you a list and compare those to our dataset of captured TLS fingerprints in live traffic.

I'm aware that uTLS is just a fork of crypto/tls. I was wondering whether it is good enough on its own to withstand TLS client fingerprinting attacks, or whether something like headless Chrome is necessary. Using a browser implementation as a tunnel adds a lot of complexity and overhead, so I'm hesitant to add it if it's not needed. I see that uTLS uses random fingerprints too, which makes me question whether they could be detected since they're atypical: a nonexistent fingerprint is also a fingerprint. I also wonder whether the Go standard library's fingerprint itself is rare enough to warrant this level of concern; even a small rate of "collateral damage" from blocking Go's fingerprint may still make it infeasible to block.

Thanks for the link to tlsfingerprint.io, it will definitely be useful.

Also in case you are not aware, Sergey Frolov and I published a paper at FOCI this year that had a similar design to yours, and also supported Websockets (primarily to cut down on tunneling overhead): https://www.usenix.org/conference/foci20/presentation/frolov

A prototype of the HTTPT tool is available here: https://github.com/sergeyfrolov/httpt

Yes I came across HTTPT, it's interesting. The approach you ended up with is similar to the one used in Rosen, except a typical HTTP handler is implemented instead of using nginx or something else as a reverse proxy. This latter approach could easily be implemented in Rosen by starting the server on a different port (without HTTPS) and putting any kind of reverse proxy in front of it. Using Cloudflare is another option.

You say you used WebSockets to cut down on overhead. Did you see a significant improvement in performance? Rosen's performance seems to be mostly network limited for me so the main reason I was looking to use WebSockets is for a less suspicious traffic pattern.

However, there remains a lot of work to be done in figuring out what counts as "cover" traffic for a server. How do we avoid a censor being able to tell that serving an assets folder is a tell-tale sign of these kinds of proxies? We had some preliminary ideas in the HTTPT paper, but many were not implemented or tested against real censors. It would be interesting to have a suite of these (500s, 404s, asset folders, mimicking popular sites, etc) and see which ones censors are able to block.

I've thought about this too. Rosen currently implements the approach that @studentmain mentions: we have a hot-swappable local public folder that is served when the correct password is not supplied in the header. Since any user can replace it with their own, there shouldn't be a blanket way of detecting a proxy server, but the reality is that most users will not do this. For this reason, perhaps an error page or something else could be served instead. In Go, an HTTP request handler has the form:

func staticHandler(w http.ResponseWriter, r *http.Request) {
    staticWebsiteHandler.ServeHTTP(w, r)
}

So this static "default" handler can be swapped out with any function that takes a request object and a response writer.
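To make the swap concrete, here is a rough sketch rather than Rosen's actual code. It assumes the password travels in a header called X-Auth (a made-up name) and that the tunnel and the fallback are both plain http.Handler values. newHandler, probe, demoTunnel, demoDecoy, and errorPage are hypothetical names.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newHandler sends a request to tunnel when it carries the expected secret
// in the given header, and to fallback otherwise. The fallback can be a
// static file server, an error page, or any other http.Handler.
func newHandler(header, secret string, tunnel, fallback http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get(header)
		// Constant-time comparison, and never accept an empty secret.
		if secret != "" && subtle.ConstantTimeCompare([]byte(got), []byte(secret)) == 1 {
			tunnel.ServeHTTP(w, r)
			return
		}
		fallback.ServeHTTP(w, r)
	})
}

// Stand-ins for the real tunnel and decoy site (assumptions for the demo).
// A real decoy could be http.FileServer(http.Dir("public")).
var demoTunnel = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "tunnel")
})

var demoDecoy = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "decoy site")
})

// errorPage is an alternative fallback that answers every request with 404.
var errorPage = http.NotFoundHandler()

// probe sends one in-memory GET request through h and returns the status
// code and body, so the routing can be checked without opening a socket.
func probe(h http.Handler, header, value string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if value != "" {
		req.Header.Set(header, value)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	h := newHandler("X-Auth", "s3cret", demoTunnel, demoDecoy)
	for _, pw := range []string{"s3cret", "wrong", ""} {
		code, body := probe(h, "X-Auth", pw)
		fmt.Printf("password=%q -> %d %q\n", pw, code, body)
	}
}
```

Passing errorPage instead of demoDecoy gives the "error page" variant without touching the tunnel path.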

@wkrp Thanks for the links.

uTLS is a good option; as @ewust notes, it's a modification of crypto/tls and no browser is involved. For what it's worth, the meek that's deployed in Tor Browser has used uTLS since October 2019 and in that time it has not been blocked by its TLS fingerprint, as far as I know. Before switching to uTLS, the meek deployment in Tor Browser used a headless Firefox, which worked okay but was logistically hard to work with and keep up to date; see Section V of "The use of TLS in censorship circumvention" for more discussion.

You need to think about the server-side TLS fingerprint as well, and uTLS does not help there. One option is to use a frontend web server such as Apache or Nginx, with its own TLS certificate, that forwards everything to a local Rosen server. HTTPT and V2Fly are designed with this kind of deployment in mind.

Again, I wonder how rare Go's default fingerprint actually is and whether this is necessary. uTLS does look quite simple and flexible in terms of its API, though, so there don't seem to be many downsides to including it. That meek has used it with success is one more reason to include it. In terms of using another frontend, I would say the same as what I replied to @ewust.

This is a good idea. To my knowledge, there is still scant evidence for censors actually using attacks based on timing and packet sizes—the closest thing is probably the GFW's considering the length of the first data packet (Section 4) in detecting possible Shadowsocks connections—but it is a good thing to prepare for in the future. We have talked about padding schemes before at #9 (comment). Harder than designing a padding scheme, though, is deciding how to apply it—characterization of "normal" traffic patterns is IMO still an open and vital research question. Snowflake uses this protocol for padding and dnstt uses this one, but they are currently unused (inserting either no or a fixed amount of padding).

Censors not using packet sizes and timing patterns is what I expected; it likely requires more computational resources than is feasible in real time. I have been told that the GFW activates a more restrictive mode during sensitive times, so perhaps in these situations it could be detected.

To be honest, shadowsocks was the only software (except for meek) that I looked at deeply while designing and implementing Rosen. I was kind of shocked by its poor cryptographic design and overall simplicity, so since it works well I'm more confident about Rosen and these other projects you mentioned.

In terms of random padding schemes, my goal is just to destroy the characteristics of Rosen's own fingerprint. For example, if there's no communication happening in the HTTP tunnel, there will be small, fixed-size pings. Appending a random 1-4 KB of padding to each would destroy this fingerprint with little overhead.
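A minimal sketch of that padding idea in Go, assuming a made-up frame layout: a 4-byte big-endian payload length, the payload, then 1 to 4 KB of random filler. This is not Rosen's wire format; pad, unpad, minPad, and maxPad are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
	"math/big"
)

const (
	minPad = 1024 // smallest padding length in bytes (assumed 1 KB)
	maxPad = 4096 // largest padding length in bytes (assumed 4 KB)
)

// pad builds a frame: 4-byte big-endian payload length, the payload, and
// then between minPad and maxPad random bytes.
func pad(payload []byte) ([]byte, error) {
	n, err := rand.Int(rand.Reader, big.NewInt(maxPad-minPad+1))
	if err != nil {
		return nil, err
	}
	padLen := minPad + int(n.Int64())

	frame := make([]byte, 4+len(payload)+padLen)
	binary.BigEndian.PutUint32(frame, uint32(len(payload)))
	copy(frame[4:], payload)
	if _, err := rand.Read(frame[4+len(payload):]); err != nil {
		return nil, err
	}
	return frame, nil
}

// unpad recovers the payload from a frame produced by pad and ignores the
// trailing padding.
func unpad(frame []byte) ([]byte, error) {
	if len(frame) < 4 {
		return nil, errors.New("frame too short")
	}
	n := binary.BigEndian.Uint32(frame)
	if uint64(n) > uint64(len(frame)-4) {
		return nil, errors.New("length prefix exceeds frame size")
	}
	return frame[4 : 4+int(n)], nil
}

func main() {
	// The same small ping now produces a different frame size each time.
	for i := 0; i < 3; i++ {
		frame, err := pad([]byte("ping"))
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		payload, _ := unpad(frame)
		fmt.Printf("frame=%d bytes, payload=%q\n", len(frame), payload)
	}
}
```

Uniform lengths in a fixed range are themselves a distribution a censor could learn, so this only hides the fixed ping size; it does not make the traffic look like any particular "normal" service.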

Of course making traffic look like existing services like Netflix or YouTube could be feasible but censors could tell we're not connecting to Netflix's servers for example. Also these services are dominated by downloads, restricting us quite a bit.

It's great that you have been able to test against actual DPI engines. Bear in mind that on-path DPI is, empirically, not the favored tool of censors; when possible they prefer to use "setup" features rather than "usage" features (Recommendation 3 on page 11). This means that protecting the IP address or domain name of the proxy is as important as having a good protocol fingerprint. The challenge here is informing censored users of where the proxies are located, without also informing the censor. One way to deal with this is to ask each user to set up their own personal proxy.

My intuition tells me you are right. Endpoint-fingerprinting resistance is very important, as a failure there could render all of our effort in other areas useless. The distribution problem is hard to solve, yes. My view is that users have to set up their own services or pay someone to do this for them. Putting servers on CDNs like Amazon's or behind Cloudflare, and putting them behind their own domains, seems like the most resilient method.

ghost commented 3 years ago

The early history of Shadowsocks is interesting. At first (2010) it used a substitution cipher. They soon found that a substitution cipher is not secure: frequency analysis is enough to detect it.

Then Shadowsocks introduced the RC4 cipher. This time, they forgot to add an initialization vector.

RC4 itself has no place for an IV, so they generated an IV and mixed it with the password using MD5. This was the first stream cipher of Shadowsocks. Then they finally imported OpenSSL (#include <>, [DllImport], depending on which language the implementation uses...) and added the aes/camellia-xxx-yyy stream ciphers from OpenSSL. You may already know the rest.

You see, it had little to do with cryptography at the very beginning.
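To illustrate why frequency analysis is enough against a byte substitution cipher, here is a small Go sketch. It assumes a random 256-entry substitution table, which is not necessarily how the original Shadowsocks built its table; newSubstitutionTable, substitute, sortedCounts, and sameProfile are made-up names. The point is only that substitution renames bytes without changing how often each one occurs.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// newSubstitutionTable returns a permutation of all 256 byte values,
// derived from seed. (Assumed construction, for illustration only.)
func newSubstitutionTable(seed int64) [256]byte {
	var table [256]byte
	for i, v := range rand.New(rand.NewSource(seed)).Perm(256) {
		table[i] = byte(v)
	}
	return table
}

// substitute replaces every byte of data with its entry in table.
func substitute(table [256]byte, data []byte) []byte {
	out := make([]byte, len(data))
	for i, b := range data {
		out[i] = table[b]
	}
	return out
}

// sortedCounts returns how often each byte value occurs, sorted in
// descending order, with the identity of the bytes thrown away.
func sortedCounts(data []byte) []int {
	var hist [256]int
	for _, b := range data {
		hist[b]++
	}
	counts := hist[:]
	sort.Sort(sort.Reverse(sort.IntSlice(counts)))
	return counts
}

// sameProfile reports whether a and b have identical sorted frequency counts.
func sameProfile(a, b []byte) bool {
	ca, cb := sortedCounts(a), sortedCounts(b)
	for i := range ca {
		if ca[i] != cb[i] {
			return false
		}
	}
	return true
}

func main() {
	plain := []byte("GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")
	enc := substitute(newSubstitutionTable(1), plain)
	fmt.Println("top counts, plaintext: ", sortedCounts(plain)[:5])
	fmt.Println("top counts, ciphertext:", sortedCounts(enc)[:5])
	fmt.Println("same frequency profile:", sameProfile(plain, enc))
}
```

Because the table is a permutation, the skewed byte distribution of typical plaintext such as HTTP headers survives encryption intact, which is exactly what a frequency-based detector looks for.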

ghost commented 3 years ago

And why does it work? Just as DES worked in 1980, Shadowsocks worked in 2010. Every new protocol is "unknown traffic" when it is born, and censors usually decide to allow unknown traffic. Sure, within a few days/months/years it becomes known as Shadowsocks, and the original Shadowsocks does not work now, just as DES did not work in 199x. Then Shadowsocks changed its protocol by replacing its S-box with RC4, which worked for a while, just as 3DES worked for a while, and may then have been detected. Then Shadowsocks added an IV, which worked for a while until it was actively probed in 2014-2016. They chose to add a MAC and experimented with their handcrafted OTA, which did not work well, so they updated the protocol to use a real AEAD cipher. That worked until 2019, when it was actively probed in another way, and they fixed that by improving error handling. Why does Shadowsocks work? Because it is actively maintained. The original Shadowsocks protocol won't work, but the 2020 protocol and implementation will work, at least for a while. If you change fast enough, DPI won't catch you.