net4people / bbs

Forum for discussing Internet censorship circumvention
3.24k stars 76 forks source link

How China Detects and Blocks Shadowsocks #22

Open gfw-report opened 4 years ago

gfw-report commented 4 years ago

How China Detects and Blocks Shadowsocks

Authors: Anonymous, Anonymous, Anonymous, David Fifield, Amir Houmansadr

Date: Sunday, December 29, 2019

中文版:Shadowsocks是如何被检测和封锁的

This report first appeared on GFW Report. We also maintain an up-to-date copy of the report on both net4people and ntc.party.


Shadowsocks is one of the most popular circumvention tools in China. Since May 2019, there have been numerous anecdotal reports of the blocking of Shadowsocks from Chinese users. This report contains preliminary results of research into how the Great Firewall of China (GFW) detects and blocks Shadowsocks and its variants. Using measurement experiments, we find that the GFW passively monitors the network for suspicious connections that may be Shadowsocks, then actively probes the corresponding servers to test whether its guess is correct. The blocking of Shadowsocks is likely controlled by human factors that increase the severity of blocking during politically sensitive times. We suggest a workaround—changing the sizes of network packets during the Shadowsocks handshake—that (for now) effectively mitigates active probing of Shadowsocks servers. We will continue collaborating with developers to make Shadowsocks and related tools more resistant to blocking.

Main Findings

How Do We Know This?

We set up our own Shadowsocks servers and connected to them from inside China, while capturing traffic on both sides for analysis. All experiments were conducted between July 5, 2019 and November 11, 2019. Most of the experiments were conducted since the reported large-scale blocking of Shadowsocks starting September 16, 2019.

In most of the experiments, we used shadowsocks-libev v3.3.1 as both client and server, since it is an actively maintained and representative Shadowsocks implementation. We believe the vulnerabilities we discovered applies to many Shadowsocks implementations and its variants, including OutlineVPN.

Unless explicitly specified, all clients and servers were used without any modification to their network functions, for example firewall rules. Shadowsocks can be configured with different encryption settings. We tested servers running both Stream ciphers and AEAD ciphers.

Details About Active Probes

Shadowsocks is an encrypted protocol, designed not to have any static patterns in packet contents. It has two main operating modes, both keyed by a master password: Stream (deprecated) and AEAD (recommended). Both modes are meant to require the client to know the master password before using the server; however in Stream mode the client is only weakly authenticated. Both modes are susceptible to replay of previously seen authenticated packets, unless separate measures to prevent replay are taken.

Probe payload types and censors' intentions

We have observed 5 types of active probes:

Replay based:

  1. Identical replay (of the first data-carrying packets in a legitimate connection);
  2. Replay with byte 0 changed;
  3. Replay with bytes 0–7 and 62–63 changed;

Seemingly random (not a replay of any genuine connection that we can identify):

  1. Probes of length 7–50 bytes, accounting for around 70% of the random probes;
  2. Probes of length exactly 221 bytes, accounting for around 30% of the random probes.

CDF: Payload Lengths of PSH/ACKs Received by Outline Server

We suspect that the active probing system identifies Shadowsocks servers and its variants by comparing a server’s responses to several of these probes.

Shadowsocks-libev has a replay filter; however most other Shadowsocks implementations do not. The replay filter blocks only exact replay, not replay that has been modified, and is not by itself enough to prevent active probing from comparing the responses to several slightly different probes.

How many connections are required to trigger active probing?

It appears that a certain threshold of genuine simultaneous connections are required to trigger active probing. For example, in one experiment, as few as 13 connections were enough to trigger the active probing. Initial result also shows it may require a slightly more connections for the Shadowsocks servers using AEAD ciphers to get probed.

Relationship between genuine connections and active probings

We let a client make 16 connections to a Shadowsocks server every 5 minutes. Although our connections triggered a large number of active probes, the Shadowsocks server was never blocked, for reasons we do not fully understand.

Number of SYN Received Across Time

The figure above shows that while legitimate clients attempt to connect to the server, it receives active probes; and when they stop trying to connect, the active probing mostly stops. The number of active probes sent per legitimate connection is variable and not 1:1.

Delay of replay attacks

The active probing system may save a genuine connection payload and replay it later, even in response to a separate, future connection. The figure below shows the variability of the delay between legitimate connections and the ensuing replay-based probes. Because one legitimate connection may cause many (up to 47 in one case) replay attacks, we present two different cases: the orange line is samples only the first replay-based probe for a particular legitimate connection; the blue line is samples all replay-based probes.

The result shows that more than 90% of the replayed probes were sent within an hour of the connection from the legitimate client. The minimum observed delay was 0.4 seconds, while the maximum was around 400 hours.

CDF: Delay of Replay-based Probes

Origin of the probes

Throughout all the experiments we conducted so far, we have seen 35,477 active probes sent from 10,547 unique IP addresses which all belong to China.

Origin ASes. The two autonomous systems that account for most of the Shadowsocks probes, AS 4837 (CHINA169-BACKBONE CNCGROUP China169 Backbone,CN) and AS 4134 (CHINANET-BACKBONE No.31,Jin-rong Street, CN), are the same as have been documented in previous work.

ASN of unique probing IPs throughout all experiments

Centralized Structures. Despite coming from thousands of unique IP addresses, it appears that all active probing behavior is centrally managed by only a small number of processes. The evidence for this observation comes from network side channels. The figure below shows the TCP timestamp value that is attached to the SYN segment of each probe. The TCP timestamp is a 32-bit counter that increases at a fixed rate. It is not an absolute timestamp, but is relative to however the TCP implementation was initialized when the operating system last booted. The figure shows that what at first seem to be thousands of independent probers actually share only a small number of linear TCP timestamp sequences. In this case there are at least nine different physical systems or processes, with one of the nine accounting for the great majority of probes. We say “at least” nine process because we can probably not distinguish two or more independent processes sharing a very close interception value. The slopes of the sequences represent a timestamp increment frequency of 250 Hz.

TCP TSval of SYN Segments from Probers

How Can We Circumvent the Blocking?

Detection of Shadowsocks proceeds in two steps:

  1. Passive identification of suspected Shadowsocks connections.
  2. Actively probing the server of suspected connections.

Therefore, to avoid blocking, you can (1) evade the passive detector, or (2) respond to active probes in a way that does not result in blocking. We will show how to do (1) by installing software that alters the sizes of packets.

Brdgrd is software that you can run on a Shadowsocks server that causes the client to break its Shadowsocks handshake into smaller packets. It was originally intended to disrupt the detection of Tor relays by forcing the GFW to do complicated TCP reassembly, but here we take advantage of brdgrd’s shaping of packet sizes from client to server. It seems that the GFW at least partially relies on packet sizes to passively detect Shadowsocks connections. Modifying packet sizes can significantly mitigate active probing by disrupting the first step in classification.

Effectiveness of brdgrd on Server

The figure shows a Shadowsocks server undergoing active probing, and then the probing going to zero within several hours of brdgrd being activated. As soon as we disabled brdgrd, active probing resumed. The second time we enabled brdgrd, the probes completely stopped for around 40 hours, but then a few more probes came.

Another experiment shows that brdgrd may be even more effective if used from the very beginning, before the server has been probed for the first time.

Brdgrd works by rewriting the server’s TCP window size to a rarely small value. Therefore it is likely possible to detect that brdgrd is being used. So while brdgrd can effectively reduce active probing for the time being, it cannot be regarded as a permanent solution to Shadowsocks blocking.

Unresolved Questions

While the fact that active probing happens is clear, it is still unclear to us how active probing affects the blocking of Shadowsocks servers. That is, we have 33 Shadowsocks servers located all over the world. While most of them experienced heavy active probing, only 3 of them were ever blocked. More interestingly, one of the servers that was blocked was used for only a very short period of time, and thus had not received as many probes as some other servers that did not get blocked.

We came up with three hypotheses, attempting to explain this interesting phenomenon:

Thanks

We want to thank these people for research and helpful discussion on this topic:

Contacts

We encourage you to share your questions, comments or evidence on our findings and hypotheses publicly or privately. Our private contact information can be found at the footer of GFW Report.

wkrp commented 4 years ago

If you run a Shadowsocks server, you can check for evidence of active probing in the log. In the shadowsocks-libev implementation, look for log messages like this:

crypto: stream: repeat IV detected
ERROR: failed to handshake with X.X.X.X: invalid address type
crypto: AEAD: repeat salt detected
ERROR: failed to handshake with X.X.X.X: authentication error

The repeat IV and repeat salt lines come from "1. Identical replay" probes. shadowsocks-libev has a filter to prevent identical replay, but other implementations do not. invalid address type and authentication error will result from non-identical replay or random probes. invalid address type is from Stream ciphers (aes-128-ctf, aes-128-cfb, etc.); it means that the Shadowsocks server tried to decrypt the active probe, but after decryption it was not a well-formed proxy request. (Sometimes, by pure chance, a probe does decrypt to a well-formed proxy request, and the log contains a connect to message with an apparently random IP address and port number.) authentication error comes from AEAD ciphers (chacha20-ietf-poly1305, aes-128-gcm, etc.); it happens because the active probers do not know the Shadowsocks server password and cannot accidentally produce a valid ciphertext.

fortuna commented 3 years ago

Since this report, we've made a few improvements to Outline to address some possible issues:

I don't know the impact of those changes on server detection yet. If anyone measures, please let me know! I'd be happy to collaborate.

ghost commented 3 years ago

According to our recent survey, nearly one years past, some popular shadowsocks implementation:

were still vulnerable, they just didn't know this problem. We notified most of them and they've fixed it. They're listed here because we only investigated them and ALL of them has exactly same problem. (See https://github.com/shadowsocks/shadowsocks-rust/issues/292 (in Chinese)).

There's only one true obfs4. But There are just too many shadowsocks. Most of them never know this place. Most of them never known by this place.

gfw-report commented 3 years ago

some popular shadowsocks implementation ... were still vulnerable, they just didn't know this problem.

Thank you for reporting this, @studentmain. We have been preparing a short post to better convey our research findings to both users and developers. And we hope it will help.

Most of them never know this place. Most of them never known by this place.

We agreed. And that's why we sincerely appreciate people like you who have been helping strengthen the communications in our community. We will do our part as well and let's see.

ghost commented 3 years ago

We (Qv2ray project) are planning a user-friendly vulnerability scanner. We think if frightened user can check problem easily, developer will know it.

gfw-report commented 3 years ago

We (Qv2ray project) are planning a user-friendly vulnerability scanner.

That's awesome! FYI, we have released a prober simulator that can simulate replay-based and random probes sent by the GFW.

fortuna commented 3 years ago

For Outline we wrote unit tests that does the probing, which I found very useful: https://github.com/Jigsaw-Code/outline-ss-server/blob/4f3ce4d267289789f2441ac6f93cd5ac765efbf8/service/tcp_test.go#L376