net4people / bbs

Forum for discussing Internet censorship circumvention
Triplet Censors: Demystifying Great Firewall's DNS Censorship Behavior (FOCI 2020) #47

Open wkrp opened 4 years ago

wkrp commented 4 years ago

Triplet Censors: Demystifying Great Firewall's DNS Censorship Behavior
Anonymous, Arian Akhavan Niaki, Nguyen Phong Hoang, Phillipa Gill, Amir Houmansadr
https://censorbib.nymity.ch/#Anonymous2020a
https://www.usenix.org/conference/foci20/presentation/anonymous (video and slides)
https://gfw.report/publications/foci20_dns/en/ (code and data)

The paper is a study of DNS injection by the GFW. While there have been many similar studies, this one goes further in its methodology, finds interesting new behavior, and explains phenomena that past work could not. The most striking observations are that different groups of domains are poisoned by different subsets of the poison IP address pool; and that there are (at least) three different DNS injectors, each with its own network fingerprint, responding to distinct but overlapping subsets of domains.
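To make the grouping idea concrete, here is a minimal sketch of how domains could be bucketed by the set of injected addresses seen for each one. It is an illustration only, not the authors' analysis code: the function name, the input shape, and the use of exact set equality as the grouping rule are all assumptions.

```python
from collections import defaultdict

def group_by_ip_subset(observations):
    """Group domains by the exact set of injected IPs observed for them.

    `observations` is an iterable of (domain, injected_ip) pairs.
    Exact set equality is a simplifying assumption; with few samples,
    two domains drawing from the same subset may show different sets.
    """
    ips_per_domain = defaultdict(set)
    for domain, ip in observations:
        ips_per_domain[domain].add(ip)

    groups = defaultdict(list)
    for domain, ips in ips_per_domain.items():
        groups[frozenset(ips)].append(domain)

    # Sort the domain lists so the result is deterministic.
    return {ip_set: sorted(domains) for ip_set, domains in groups.items()}
```

Calling `group_by_ip_subset` on a list of `(domain, ip)` pairs returns a dictionary keyed by `frozenset` of addresses, so domains that share a poison subset land in the same list.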

The primary experiment is nine months of querying a million domains every two hours. The queries are sent from outside China to a controlled VPS inside China, taking advantage of the bidirectionality of DNS injection. The overall trend is for more domains to be blocked over time, increasing from 24000 in September 2019 to 24600 in May 2020. Examining the hour-by-hour changes reveals certain implementation details, for example an evident pattern change from *youtube.com to the more specific *.youtube.com resulted in the sudden unblocking of about 50 domains. Before 2019-11-23, DNS injections drew from a pool of 1510 phony IP addresses; but on that date, the size of the pool suddenly shrank to 216. Curiously, the selection of a phony IP address is not uniform for every injection; each domain draws from only a subset of the total pool. Domains may be organized into groups, according to which subset of phony IP addresses they use. Group 4, for example, consists of 33 Google-related domains, each of which is poisoned by a subset of only four IP addresses. The IP addresses making up the total pool are not random—most of them belong to US-based organizations like Facebook, Dropbox, and Twitter, though most do not point to a live host.
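The effect of the pattern change can be illustrated with ordinary glob matching. This is only a sketch: the GFW's real rule syntax is not known, Python's `fnmatch` is used as a stand-in, and the sample domains are hypothetical.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "*youtube.com"   # also matches names that merely end in "youtube.com"
NEW_PATTERN = "*.youtube.com"  # requires a literal dot before "youtube.com"

def newly_unblocked(domains, old=OLD_PATTERN, new=NEW_PATTERN):
    """Return domains matched by the old pattern but not by the new one."""
    return [d for d in domains if fnmatchcase(d, old) and not fnmatchcase(d, new)]

# Hypothetical sample; "fooyoutube.com" stands in for an over-blocked name.
sample = ["www.youtube.com", "m.youtube.com", "fooyoutube.com", "example.org"]
print(newly_unblocked(sample))  # ['fooyoutube.com']
```

Names that only happened to end in the string `youtube.com` fall out of the blocklist once the dot is required, which is consistent with a batch of unrelated domains being unblocked at the same moment.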

There is more than one DNS injector. The authors provide robust network fingerprints for three, using features such as the flags in the IP and DNS headers, and trends in the IP ID and IP TTL fields. The injectors handle different (but overlapping) subsets of domains and draw from different (but overlapping) IP address pools, corresponding to the domain groups mentioned earlier. Injector 1 handles the fewest domains, but the most popular. Injector 3's domains are a subset of Injector 2's. Injector 1 uses incrementing TTLs and Injector 2's TTLs are random, but Injector 3 does something weird: it reflects the TTL of the query in the response, meaning that the original TTL must be at least twice the distance to the injector for the injected response to make it back to the sender. Taking this quirk of TTL into account, all three injectors lie at the same hop distance away from the probe host, and timing measurements are consistent with all three being co-located.
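The TTL constraint for Injector 3 follows from simple arithmetic, sketched below. The model is deliberately idealized and is an assumption of this sketch: each hop decrements the TTL by one, the forward and return paths have the same length, and off-by-one effects at the injector and the end host are ignored.

```python
def reflected_response_arrives(initial_ttl: int, hops_to_injector: int) -> bool:
    """Model a TTL-reflecting injector under idealized assumptions.

    The query loses one TTL unit per hop on the way to the injector.
    The injector copies the remaining TTL into its forged response,
    which must then survive the same number of hops on the way back.
    """
    ttl_at_injector = initial_ttl - hops_to_injector
    if ttl_at_injector <= 0:
        return False  # the query expires before it reaches the injector
    return ttl_at_injector >= hops_to_injector

def min_initial_ttl(hops_to_injector: int) -> int:
    """Smallest initial TTL whose reflected response can return."""
    return 2 * hops_to_injector
```

Under this model, a probe 10 hops from the injector needs an initial TTL of at least 20 to see a response from Injector 3, whereas an injector that sets its own TTL could answer a query sent with a much smaller TTL. Accounting for this difference is what allows the hop distance of all three injectors to be compared.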

The authors then do a separate, one-time, multi-path experiment, querying a single blocked domain name against a random IP address in virtually every network prefix announced in China, 36146 addresses and 417 ASes in total. 92% of prefixes are affected by at least one injector, and 62% are affected by all three. 4% are affected by yet a fourth injector, whose fingerprint does not match that of the other three.

Thanks to the authors for reviewing a draft of this summary.

klzgrad commented 4 years ago

The IP addresses making up the total pool are not random—most of them belong to US-based organizations like Facebook, Dropbox, and Twitter

Ah, this is interesting. This could explain more and more reports that access to websites encounters certificate errors instead of timeout. Certificate errors look like legitimate misconfiguration rather than censorship, since certificate misconfiguration is very common in China, so the grievance is no longer directed at the GFW.

There is more than one DNS injector

I think these could be independently developed projects with different design goals. Maybe some of these are outsourced to contractors. Being independent allows them not to fail simultaneously.

gfw-report commented 4 years ago

This could explain more and more reports that access to websites encounters certificate errors instead of timeout.

Thank you for sharing such an interesting hypothesis; however, it seems these certificate errors are probably not mainly caused by clients being directed to Facebook/Dropbox/Twitter servers. This is because clients in China could not even complete a TCP handshake to port 443 of these injected IPs in the first place.

As mentioned in Section 3.2, we tested the reachability of the 216 injected IPs from our VPS in China and the United States by initiating TCP handshakes on port 80 and port 443. Specifically, we performed this experiment daily for 7 days (from April 17, 2020 to April 23, 2020), and each day's results looked similar.

The result, summarized in Figure 3, shows only 0.4% of these IP-port pairs were ever observed to be reachable from China.
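For readers who want to try a similar check, a TCP handshake probe can be sketched as follows. This is an illustrative sketch, not the code used for the paper; the function names, the 3-second timeout, and the result format are assumptions.

```python
import socket

def tcp_reachable(ip: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to (ip, port) completes in time."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False

def probe(ips, ports=(80, 443), timeout: float = 3.0):
    """Map every (ip, port) pair to whether its handshake succeeded."""
    return {
        (ip, port): tcp_reachable(ip, port, timeout)
        for ip in ips
        for port in ports
    }

def reachable_fraction(results) -> float:
    """Fraction of probed (ip, port) pairs that were reachable."""
    return sum(results.values()) / len(results) if results else 0.0
```

Running `probe` over the list of injected addresses from two vantage points and comparing `reachable_fraction` for each gives the kind of per-location comparison described above.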

You may find the following code and data helpful:

klzgrad commented 4 years ago

OK, these reports may come from users with partial circumvention, who can reach the injected IPs but are nonetheless affected by DNS pollution for some reason. As quantified in your data, this should be an uncommon case.

gfw-report commented 4 years ago

There is more than one DNS injector

I think these could be independently developed projects with different design goals. Maybe some of these are outsourced to contractors.

This is a very reasonable hypothesis.

Being independent allows them not to fail simultaneously.

It makes sense that the censor tries to avoid a single point of failure. One piece of evidence that supports your hypothesis is that we did observe some injectors halting for short periods of time, but we never observed all three injectors halted at the same time.

Specifically, as described in the "Halting interval of injectors" paragraph, we found that while Injector 2 has been working continuously, Injector 1 and Injector 3 occasionally stopped working for a few hours. All of these occasional halts lasted less than 6 hours, and most of them happened during working hours in China.

gfw-report commented 4 years ago

OK, these reports may come from users with partial circumvention, who can reach the injected IPs but are nonetheless affected by DNS pollution for some reason.

Oh, it definitely makes sense then! These cases are not uncommon in many circumvention scenarios.

klzgrad commented 4 years ago

we found that while Injector 2 has been working continuously, Injector 1 and Injector 3 occasionally stopped working for a few hours. All of these occasional halts lasted less than 6 hours, and most of them happened during working hours in China.

This is a more vivid picture. I imagine the three injectors are maintained independently by three different contractors, which allows them to rotate shifts and improves reliability at the project management level.

klzgrad commented 4 years ago

Sorry, to add one more point. Certificate errors are very common in this sense: a common setup uses domain-based traffic routing to improve performance, so that domestic traffic goes direct and is not routed through circumvention. Facebook, Twitter, et al. are always in the circumvention routing lists. And whenever a domain (especially a CDN domain) is blocked but has not been added to the routing list, it is resolved directly and incorrectly to one of Facebook's IPs, and the connection then fails with a certificate error via circumvention. This is confusing because users perceive it as an error on the CDN side.
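The failure mode can be sketched as a small routing decision. Everything here is hypothetical: the routing list contents, the address range standing in for a listed service (a documentation-only prefix, not a real Facebook prefix), and the assumption that the client also routes by destination IP.

```python
from ipaddress import ip_address, ip_network

# Hypothetical routing list: domain suffixes sent through circumvention.
PROXY_SUFFIXES = {"facebook.com", "twitter.com"}
# Hypothetical stand-in for an address range owned by a listed service.
PROXY_NETWORKS = [ip_network("203.0.113.0/24")]

def in_routing_list(domain: str) -> bool:
    """True if the domain equals or is a subdomain of a listed suffix."""
    return any(domain == s or domain.endswith("." + s) for s in PROXY_SUFFIXES)

def plan_connection(domain: str, local_dns_answer: str):
    """Return (dns_path, connection_path) for a domain.

    `local_dns_answer` is whatever the local resolver returns, which may
    be an injected address if the domain is blocked.
    """
    if in_routing_list(domain):
        return ("remote", "proxy")  # resolved and fetched via circumvention
    addr = ip_address(local_dns_answer)
    if any(addr in net for net in PROXY_NETWORKS):
        # Poisoned answer points into a proxied range: the TLS handshake
        # reaches the wrong server and the certificate does not match.
        return ("local", "proxy")
    return ("local", "direct")
```

In this sketch, a blocked CDN domain that is missing from the list gets `("local", "proxy")`: the poisoned answer is trusted, the connection is carried through circumvention to an unrelated server, and the user sees a certificate error rather than a timeout.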

gfw-report commented 4 years ago

And whenever a domain (especially a CDN domain) is blocked but has not been added to the routing list,

Yes, we agree this could happen quite often, especially nowadays when one of the most popular routing lists is less actively maintained.