Modular circumvention tools testing

hellais commented 6 years ago

The current state of censorship circumvention tool testing in OONI Probe could do with some improvements, especially in light of the recent mobile app developments and the future phasing out of the Python-based probes.

Current state

The Python-based OONI Probe implements the following tests:

Vanilla Tor, which checks if the latest version of tor on the user's system is able to bootstrap without any bridges. This is run automatically as part of the default deck (i.e. ooniprobe on macOS, Linux and Raspberry Pi will run this test).
TCP Connect to bridges. We check if the default Tor Browser bridges are reachable in the sense that we can establish a TCP connection to their IP and port. The actual addresses of bridges can be found here: https://github.com/OpenObservatory/ooni-resources/blob/master/bridge_reachability/tor-bridges-ip-port.csv
Meek fronted requests, which checks if the domain fronts of meek work. This has also been implemented as part of measurement-kit, but we don't currently run it as part of the mobile app.

Moreover, we have more tests that are not run automatically:

Bridge reachability, which checks the reachability of Tor bridges using obfs2, obfs3, obfs4, FTE and ScrambleSuit.
Psiphon, which runs a very old version of Psiphon
Lantern, which runs an old version of Lantern and is tricky to deploy
OpenVPN

All the above tests present similar issues and complexities due to how they have to be deployed. Namely, we need to separately ship all the circumvention tool clients to the user's machine to test them.

Work in progress

We have started some cursory work in understanding how we are going to support running the circumvention tools inside of OONI Probe. The basic idea is that we would somehow build them as part of measurement-kit and link to them.

Master ticket: https://github.com/measurement-kit/measurement-kit/issues/1399

Here are relevant tickets on the topic:

Psiphon: https://github.com/TheTorProject/ooni-probe/issues/744, https://github.com/measurement-kit/measurement-kit/issues/1425
Tor pluggable transports: https://github.com/measurement-kit/measurement-kit/issues/1426
OpenVPN: https://github.com/measurement-kit/measurement-kit/issues/1418
Avoid duplicating go runtime: https://github.com/TheTorProject/ooni-probe/issues/653
Lantern: https://github.com/measurement-kit/measurement-kit/issues/1400
Tor integration: https://github.com/measurement-kit/measurement-kit/issues/86

I think that some of these issues are certainly important to solve and unavoidable, but I think there is perhaps also another way of going about this.

Modular Circumvention tool testing

The approach I would like to propose is to modularise the problem a bit more. By this I mean trying to distil most circumvention tools and strategies into some basic building blocks.

I think the table at the end of this page: http://obfuscation.github.io/, clearly shows how much overlap there in fact is amongst circumvention tools out there.

While it is the case that some of the PTs do some further customisation and tuning on the "vanilla" obfs4 or what have you, they are still more or less the same building blocks.

The macro-categories that affect, I think, all circumvention platforms are the following:

Rendezvous and discovery. This is how the tool discovers new addresses and/or secrets to use to connect to the network (examples: BridgeDB, an S3 bucket with some addresses, DNS over HTTPS, etc.).
Pluggable transport. This is the actual underlying transport that is used to route the bulk of the traffic.
Address rotation and re-discovery. I think every tool out there probably has some specific logic for handling the situation in which a particular address or endpoint stops working, and for recovering from that failure "gracefully".

In light of this, my proposal would be to see if it's possible to fit most, if not every, circumvention tool into a model of this sort. Ideally we would have a single implementation of each PT, but expose enough configuration to make it possible for circumvention tools to "customize it" like they do in their own app. In any case, it seems like most PTs these days are written in golang (e.g. Psiphon, Tor & Lantern), so even if we have to ship many different versions of, say, obfs4, that should not be a problem.

It would be then great if the circumvention tool makers could run a specific instance of their discovery mechanism just for OONI Probe that would give out addresses (and perhaps configuration options) to OONI Probe to make it possible to do automated testing.

At the end of the day, the result of an OONI Probe test would be a logical statement on top of the various observations gathered from measuring the first two building blocks: rendezvous and discovery, and the pluggable transport.
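
As a rough idea of what such a "logical statement" could look like, here is a small Go sketch. The `Observations` struct, the `Verdict` function and the result labels are all hypothetical; they are not OONI's actual data format, and a real test would probably record much more detail than two booleans.

```go
package main

import "fmt"

// Observations is a hypothetical summary of what was measured.
type Observations struct {
	DiscoveryOK bool // rendezvous/discovery returned at least one address
	TransportOK bool // at least one address worked over the transport
}

// Verdict combines the observations into a single result label.
// The labels are made up for this sketch.
func Verdict(o Observations) string {
	switch {
	case !o.DiscoveryOK:
		// Without addresses the transport could not be tested at all.
		return "discovery_failed"
	case !o.TransportOK:
		return "transport_failed"
	default:
		return "working"
	}
}

func main() {
	fmt.Println(Verdict(Observations{DiscoveryOK: true, TransportOK: false}))
}
```

The point of separating the two observations is that a failure can be attributed to a specific building block rather than to the tool as a whole.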

I think this would also be something valuable to circumvention tool authors, because it would give them the ability to try out new ideas without having to roll them out into production with real users, but could just try things out on OONI Probe.

What do you think?

cc @oxtoacart @mirokuratczyk @bassosimone

bassosimone commented 4 years ago

This is priority/low and effort/XL because it looks like a huge project. Also, I am worried by the fact that the strategy implemented by, e.g., Psiphon may move more quickly than we can track. For this reason, I am instead happy with the possibility of embedding Psiphon. This explains why putting this issue in the icebox makes sense. Consider that we already have several issues dealing with adding new circumvention tools to github.com/ooni/probe-engine. We should revisit this issue once we have added some of them and have some hands-on experience (i.e. in ~10 months from now).

oxtoacart commented 3 years ago

@hellais This definitely seems worth thinking about. That said, there are a lot of things that we do outside of the core obfuscation layer (obfs4 or whatever) that affect circumvention and don't necessarily fit neatly into a generic framework. These include, but are not limited to:

Authenticating client connections at the application layer (e.g. password based authentication) and how they respond to authentication failures
Authenticating client connections at the protocol layer (i.e. pre-shared keys)
Hardening servers against active probe attacks (this makes a big difference)
The specific protocol that's used to tell the server where to connect upstream (Lantern uses HTTP CONNECT, some use SOCKS, etc.). These all have certain timing characteristics which may affect blocking resistance.
Split tunneling - Lantern doesn't actually tunnel everything. If censors use active attacks by collaborating with origin sites (for example, having baidu.cn examine all connections from foreign IPs), whether or not the client does split tunneling, and how, can affect blocking resistance.
We've definitely seen cases where clients seem to suffer from guilt by association, so running multiple protocols to multiple proxies can be a bad thing because if one of them exposes that a given client IP is circumventing, it can lead to closer scrutiny of that client's other connections and eventually to blocking of the other proxies in use by that client (even though the protocols used by those proxies may on their own evade detection).

To put it in perspective, Lantern currently relies mostly on standard protocols like HTTPS, just configured in very special ways and with special tricks on the servers. Running probes using vanilla HTTPS against vanilla HTTPS proxies may very well fail in circumstances where Lantern's use of HTTPS works just fine.

As it happens, we're currently engaged in an effort to better package Lantern as a library. If Lantern were available as a go gettable library, would that make it easier for you to integrate?

bassosimone commented 3 years ago

Thanks, @oxtoacart, for your detailed explanation!

[...] As it happens, we're currently engaged in an effort to better package Lantern as a library. If Lantern were available as a go gettable library, would that make it easier for you to integrate?

Yes, definitely, we would love to do that!
