net4people / bbs

Forum for discussing Internet censorship circumvention

Measuring and Evading Turkmenistan's Internet Censorship (WWW 2023) #273

Open wkrp opened 10 months ago

wkrp commented 10 months ago

Measuring and Evading Turkmenistan's Internet Censorship: A Case Study in Large-Scale Measurements of a Low-Penetration Country
Sadia Nourin, Van Tran, Xi Jiang, Kevin Bock, Nick Feamster, Nguyen Phong Hoang, Dave Levin
https://censorbib.nymity.ch/#Nourin2023a
https://github.com/breakerspace/turkmenistan-censorship
Measurement dashboard, Lightning talk (3 min), Presentation slides

This is a study of DNS, HTTP, and TLS censorship in Turkmenistan, notably encompassing every IP address in the country. Turkmenistan poses a challenge for censorship measurement because of its low population and low availability of Internet access. It is difficult to take direct measurements from inside the country. This study uses remote measurement techniques, taking advantage of the bidirectionality of the firewall to do experiments without controlling a vantage point in Turkmenistan. The paper covers data collected in September and October 2022. The team has continued to do tests and made the results available in a dashboard at https://tmc.np-tokumei.net/.

Bidirectionality means the firewall filters incoming packets as well as outgoing ones. Sending a DNS query for a filtered domain name into the country results in an injected DNS response with a false IP address being sent back to the sender, just as if the query had been sent out of the country. Similarly, an HTTP request with a filtered Host header, or a TLS Client Hello with a filtered SNI, elicits an injected TCP RST packet, regardless of direction. In the case of HTTP and TLS, censorship persists for 30 seconds: any packet with the same source–destination 4-tuple within that interval gets another RST. Injected packets are easy to identify because they have a distinctive IP ID and initial TTL. In a change from https://github.com/net4people/bbs/issues/80#issuecomment-906533865 (August 2021), injection happens on all port numbers.
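To make the 30-second residual behavior concrete, here is a toy model of it. This is only a sketch for reasoning about the description above, not the censor's actual logic. The class name, the keyword matching on raw payload bytes, and the choice to restart the timer on every filtered payload are all assumptions of mine.

```python
from dataclasses import dataclass, field

RESIDUAL_SECONDS = 30.0  # interval reported in the summary above

# (src_ip, src_port, dst_ip, dst_port)
FourTuple = tuple[str, int, str, int]


@dataclass
class ResidualCensorModel:
    """Toy model: a filtered payload blocks its 4-tuple for 30 seconds."""

    blocked_keywords: tuple[bytes, ...]
    _blocked_until: dict[FourTuple, float] = field(default_factory=dict)

    def sees_packet(self, flow: FourTuple, payload: bytes, now: float) -> bool:
        """Return True if the model would inject a RST for this packet."""
        if any(k in payload for k in self.blocked_keywords):
            # Assumption: each filtered payload starts (or restarts) the timer.
            self._blocked_until[flow] = now + RESIDUAL_SECONDS
            return True
        # Any other packet on a recently flagged 4-tuple also gets a RST.
        return now < self._blocked_until.get(flow, float("-inf"))
```

In this model, a follow-up packet on the same 4-tuple 10 seconds after a filtered request still draws a RST, while the same packet after 31 seconds, or from a different source port, does not.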

There are two big complications that make straightforward application of the bidirectionality property insufficient for large-scale measurement. The first is that (in what seems to be a first) source IP addresses that send many probes into the country may eventually stop getting injected responses, as if the censor were deliberately trying to frustrate analysis. To deal with this, the measurement system uses a diverse and changing set of source IP addresses from commercial VPSes. The second complication is that not all IP addresses in Turkmenistan are equal, in terms of whether they cause injection when they appear in the destination address of a probe. Different networks, and even neighboring addresses, differ in whether they trigger censorship responses. For this reason, the authors undertook to test every IP address in the country, some 22,700 addresses across 6 ASes.

But this gives rise to another challenge: while DNS probes do not require the probed IP address to be live, the HTTP and TLS tests occur in the context of a TCP connection, which requires a live, responsive host at the destination. To work around this, the authors found a new sequence of probes that can detect TCP-based censorship injection without an established TCP connection: send a PSH+ACK packet containing the probe text (i.e. HTTP request or TLS Client Hello), wait 5 to 29 seconds, then send another packet. If the second packet gets a RST, it means the probe was recognized as one to censor. By combining these techniques, they were able to scan every IP address in Turkmenistan for DNS, HTTP, and TLS censorship.
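The handshake-free probe sequence can be sketched as a small piece of control logic. This is not the authors' tooling. The `send` callable is a hypothetical transport hook that you would have to back with raw packets yourself, and the summary does not say what the second packet looks like, so the empty ACK used here is an assumption.

```python
import time
from typing import Callable

# Hypothetical transport hook: sends one TCP segment with the given flags and
# payload on a fixed 4-tuple, and returns True if a RST came back.
SendFn = Callable[[str, bytes], bool]


def probe_without_handshake(
    send: SendFn,
    payload: bytes,
    wait_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the follow-up packet is answered with a RST."""
    if not 5.0 <= wait_seconds <= 29.0:
        raise ValueError("wait must fall inside the 5 to 29 second window")
    # Step 1: PSH+ACK carrying the HTTP request or TLS Client Hello.
    send("PA", payload)
    # Step 2: wait, staying inside the 30-second residual interval.
    sleep(wait_seconds)
    # Step 3: a second packet on the same 4-tuple (assumed here to be an
    # empty ACK). A RST in reply indicates the first packet was recognized
    # as one to censor.
    return send("A", b"")
```

Passing `sleep` as a parameter makes it possible to exercise the logic against a fake `send` without actually waiting.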

The measurement process began with a pre-scan of all the IP addresses using a small number of domains, to find which ones were susceptible to censorship at all. They filtered out hosts that were found to be responsive during the pre-scan, in order to avoid sending them a lot of traffic in later phases. There were about 7,500 addresses (33%) that could trigger injection. Using the addresses in this smaller set, they probed 15.5 million domain names on DNS, HTTP, and TLS. They found 122,000 blocked domains in total. Blocklists differed by protocol, with HTTP having the most censored domains and DNS having the fewest. From the list of blocked domains and further probing they inferred regular expression blocking rules. Over-broad expressions like .*\.cyou.* and doh\..* cause a high degree of overblocking.
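A quick way to see why those two expressions overblock is to run them against a few names. The rules are the ones quoted above. Matching them against the whole domain name with `fullmatch` is my assumption about how they are applied, and the test domains are made up.

```python
import re

# The two over-broad rules quoted in the summary above.
RULES = [re.compile(r".*\.cyou.*"), re.compile(r"doh\..*")]


def is_blocked(domain: str) -> bool:
    """Return True if any rule matches the whole domain name."""
    return any(rule.fullmatch(domain) for rule in RULES)


for name in ["example.cyou", "shop.cyoung.example", "doh.example.net", "example.com"]:
    print(name, is_blocked(name))
```

Under that assumption, `shop.cyoung.example` is caught by the `.cyou` rule even though it is not under that TLD, and any name starting with `doh.` is caught whether or not it has anything to do with DNS over HTTPS.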

Finally, the authors use Geneva to find new circumvention strategies at the TCP/IP and application layers. These include setting one of the COUNT fields in a DNS query to 25 or greater, breaking the HTTP-version in an HTTP request across TCP segments, and inserting whitespace into the HTTP Host header.
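For illustration, here is roughly what the three application-layer modifications look like as byte manipulations. This sketch only builds payloads and sends nothing. It is not Geneva's strategy syntax. Which COUNT field to inflate (ANCOUNT here), the exact split offset, and where the whitespace goes in the Host header are all assumptions, since the summary does not specify them.

```python
import struct


def dns_query(name: str, ancount: int = 0, txid: int = 0x1234) -> bytes:
    """Build an A-record query; ancount is the COUNT field being inflated."""
    # Header: ID, flags (RD set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, ancount, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def split_in_http_version(request: bytes) -> tuple[bytes, bytes]:
    """Split a request so that the HTTP-version token spans two segments."""
    cut = request.index(b" HTTP/") + 3  # cut after "HT"; the offset is arbitrary
    return request[:cut], request[cut:]


def pad_host_header(request: bytes, padding: bytes = b"  ") -> bytes:
    """Insert extra whitespace between 'Host:' and the host name."""
    return request.replace(b"Host: ", b"Host: " + padding, 1)
```

For example, `dns_query("example.com", ancount=25)` yields a query whose header claims 25 answer records, and `split_in_http_version` returns two byte strings that would each go out in their own TCP segment.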

Thanks to Sadia Nourin and Nguyen Phong Hoang for comments on a draft of this summary.

snourin commented 10 months ago

Hi everyone, I’m Sadia, one of the authors of this paper. To measure Turkmenistan’s censorship, we had to take advantage of bidirectional censorship: a client outside of Turkmenistan sent requests for censored domains to non-responsive IP addresses inside of Turkmenistan in order to trigger the censor. However, one question we frequently asked ourselves is whether our measurements in the outside→inside direction agree with measurements in the inside→outside direction.

It would be great if there were some volunteers within Turkmenistan who could spot-check some of our measurements for us from the inside→outside direction. Please ensure your safety and understand the risks of doing so before proceeding.

You can check whether TMC considers you to be censored by searching for your own IP address here. If you are deemed to be censored, you could test some of the domains that TMC believes to be censored. These domains can be found here and here. To test these domains, you could use the packet sequence described in the paper, which is the one we use for our measurements, or just send simple DNS and HTTP(S) requests.

If you determine that your IP address is NOT considered to be censored by TMC, you could still test some domains to determine whether the IP address is uncensored from the inside→outside direction as well.

Thank you.