ooni / probe

OONI Probe network measurement tool for detecting internet censorship
https://ooni.org/install
BSD 3-Clause "New" or "Revised" License

1.th.ooni.org: censored access to specific URLs #2418

Open bassosimone opened 1 year ago

bassosimone commented 1 year ago

Today a user reported two Web Connectivity v0.4 measurements where the resulting x_status was 16, meaning that the test helper failed. Both measurements used the same TH: 1.th.ooni.org.

I repeated those measurements manually to confirm the initial diagnosis. Here is the output I have seen:

/*
    % ./oohelper -server https://1.th.ooni.org -target http://xnxx.com
*/
{
    "tcp_connect": {
        "185.88.181.53:443": {
            "status": true,
            "failure": null
        },
        /* snip */
    },
    "tls_handshake": {
        "185.88.181.53:443": {
            "server_name": "xnxx.com",
            "status": true,
            "failure": null
        },
        /* snip */
    },
    "quic_handshake": null,
    "http_request": {
        "body_length": -1,
        "discovered_h3_endpoint": "",
        "failure": "unknown_error",        /* <- here */
        "title": "",
        "headers": {},
        "status_code": -1
    },
    "http3_request": null,
    "dns": {
        "failure": null,
        "addrs": [
            "185.88.181.57",
            "185.88.181.58",
            "185.88.181.59",
            "185.88.181.60",
            "185.88.181.53",
            "185.88.181.54",
            "185.88.181.55",
            "185.88.181.56"
        ]
    },
    "ip_info": {
        "185.88.181.53": {
            "asn": 46652,
            "flags": 11
        },
        /* snip */
    }
}

and:

% ./oohelper -server https://1.th.ooni.org -target http://pornhub.com
{
    "tcp_connect": {
        "66.254.114.41:443": {
            "status": true,
            "failure": null
        },
        "66.254.114.41:80": {
            "status": true,
            "failure": null
        }
    },
    "tls_handshake": {
        "66.254.114.41:443": {
            "server_name": "pornhub.com",
            "status": false,
            "failure": "connection_reset"         /* <- here */
        }
    },
    "quic_handshake": null,
    "http_request": {
        "body_length": 688805,
        "discovered_h3_endpoint": "",
        "failure": null,
        "title": "Free Porn Videos \u0026amp; Sex Movies - Porno, XXX, Porn Tube \u0026#124; Pornhub",
        "headers": {
            /* snip */
        },
        "status_code": 200
    },
    "http3_request": null,
    "dns": {
        "failure": null,
        "addrs": [
            "66.254.114.41"
        ]
    },
    "ip_info": {
        "66.254.114.41": {
            "asn": 29789,
            "flags": 3
        }
    }
}

It's a bummer that the TH reports only a limited set of errors (note the "unknown_error" in the first response above). This makes debugging harder.
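
To spot the failing steps in responses like the two above without reading the whole JSON, one can walk the document and list every non-null "failure" field. The following is a minimal sketch, not part of oohelper: the find_failures helper is hypothetical, and it assumes the /* ... */ annotations shown above have been stripped so that the input is valid JSON.

```python
import json


def find_failures(node, path=""):
    """Recursively collect (path, failure) pairs for every non-null "failure" field."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}" if path else key
            if key == "failure" and value is not None:
                # Report the path of the object that owns the failure.
                found.append((path, value))
            else:
                found.extend(find_failures(value, child))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            found.extend(find_failures(value, f"{path}[{i}]"))
    return found


# Trimmed-down version of the pornhub.com response shown above.
response = json.loads("""
{
  "tcp_connect": {"66.254.114.41:443": {"status": true, "failure": null}},
  "tls_handshake": {
    "66.254.114.41:443": {
      "server_name": "pornhub.com",
      "status": false,
      "failure": "connection_reset"
    }
  },
  "http_request": {"failure": null, "status_code": 200},
  "dns": {"failure": null, "addrs": ["66.254.114.41"]}
}
""")

for where, failure in find_failures(response):
    print(f"{where}: {failure}")
```

On the trimmed response this reports only the TLS handshake towards 66.254.114.41:443, which matches the line marked "here" above.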

Anyway, here's what I see if I use curl on the 1.th.ooni.org box:

% curl https://pornhub.com
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to pornhub.com:443 

% curl -v http://xnxx.com
*   Trying 185.88.181.54:80...
* Connected to xnxx.com (185.88.181.54) port 80 (#0)
> GET / HTTP/1.1
> Host: xnxx.com
> User-Agent: curl/7.74.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 301 Moved Permanently
< Date: Mon, 20 Feb 2023 16:37:24 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 0
< P3p: policyref="/p3p.xml", CP="NOI CURa ADMa DEVa TAIa OUR BUS IND UNI COM NAV INT"
< Vary: Accept-Encoding,User-Agent,Accept-Language,Cookie
< Location: https://www.xnxx.com/
< Server: nginx
< 
* Connection #0 to host xnxx.com left intact

% curl https://www.xnxx.com
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to www.xnxx.com:443 

I think the underlying issue is that the TH, which is deployed on Digital Ocean, is hosted in a country (apparently, India) where there is upstream censorship.

Now, let's see whether iterative network tracing can help us understand where the censorship happens. To this end, I am using miniooni and the new tlsmiddlebox experiment, with the IP address of www.example.com as the test helper. I have edited the output of the experiment to make it easier to read (removing all the "in progress" messages).

% ./miniooni -y tlsmiddlebox -O TestHelper=tlshandshake://93.184.216.34 -i tlstrace://pornhub.com
[...]
[      1.352423] <info> - country: IN
[      1.352448] <info> - network: DigitalOcean, LLC (AS14061)
[      1.352452] <info> - resolver's IP: 139.59.79.154
[      1.352456] <info> - resolver's network: DigitalOcean, LLC (AS14061)
[...]

So, the experiment has started and now we see the control run, which uses 93.184.216.34:443 with the example.com server name. We expect to see successes here:

[      3.746441] <info> Handshake Trace #0 TTL 15 93.184.216.34:443 example.com... ok
[      3.878460] <info> Handshake Trace #0 TTL 16 93.184.216.34:443 example.com... ok
[      3.976288] <info> Handshake Trace #0 TTL 17 93.184.216.34:443 example.com... ok
[      4.088836] <info> Handshake Trace #0 TTL 18 93.184.216.34:443 example.com... ok
[      4.181487] <info> Handshake Trace #0 TTL 19 93.184.216.34:443 example.com... ok
[      4.276177] <info> Handshake Trace #0 TTL 20 93.184.216.34:443 example.com... ok
[     11.862834] <info> Handshake Trace #0 TTL 1 93.184.216.34:443 example.com... generic_timeout_error
[     11.949405] <info> Handshake Trace #0 TTL 2 93.184.216.34:443 example.com... generic_timeout_error
[     12.065993] <info> Handshake Trace #0 TTL 3 93.184.216.34:443 example.com... generic_timeout_error
[     12.165584] <info> Handshake Trace #0 TTL 4 93.184.216.34:443 example.com... generic_timeout_error
[     12.261142] <info> Handshake Trace #0 TTL 5 93.184.216.34:443 example.com... generic_timeout_error
[     12.354636] <info> Handshake Trace #0 TTL 6 93.184.216.34:443 example.com... generic_timeout_error
[     12.451334] <info> Handshake Trace #0 TTL 7 93.184.216.34:443 example.com... generic_timeout_error
[     12.557061] <info> Handshake Trace #0 TTL 8 93.184.216.34:443 example.com... generic_timeout_error
[     12.650676] <info> Handshake Trace #0 TTL 9 93.184.216.34:443 example.com... generic_timeout_error
[     12.764286] <info> Handshake Trace #0 TTL 10 93.184.216.34:443 example.com... generic_timeout_error
[     12.860831] <info> Handshake Trace #0 TTL 11 93.184.216.34:443 example.com... generic_timeout_error
[     12.955474] <info> Handshake Trace #0 TTL 12 93.184.216.34:443 example.com... generic_timeout_error
[     13.054065] <info> Handshake Trace #0 TTL 13 93.184.216.34:443 example.com... generic_timeout_error
[     13.157002] <info> Handshake Trace #0 TTL 14 93.184.216.34:443 example.com... generic_timeout_error

The handshake actually times out for every TTL from 1 to 14 and only succeeds from TTL 15 onwards.

After the control run, comes the experiment run, where we're using the possibly offending SNI:

[     14.054510] <info> Handshake Trace #0 TTL 7 93.184.216.34:443 pornhub.com... connection_reset
[     14.127309] <info> Handshake Trace #0 TTL 8 93.184.216.34:443 pornhub.com... connection_reset
[     14.245186] <info> Handshake Trace #0 TTL 9 93.184.216.34:443 pornhub.com... connection_reset
[     14.357848] <info> Handshake Trace #0 TTL 10 93.184.216.34:443 pornhub.com... connection_reset
[     14.451811] <info> Handshake Trace #0 TTL 11 93.184.216.34:443 pornhub.com... connection_reset
[     14.533643] <info> Handshake Trace #0 TTL 12 93.184.216.34:443 pornhub.com... connection_reset
[     14.647866] <info> Handshake Trace #0 TTL 13 93.184.216.34:443 pornhub.com... connection_reset
[     14.734782] <info> Handshake Trace #0 TTL 14 93.184.216.34:443 pornhub.com... connection_reset
[     14.851329] <info> Handshake Trace #0 TTL 15 93.184.216.34:443 pornhub.com... connection_reset
[     14.948620] <info> Handshake Trace #0 TTL 16 93.184.216.34:443 pornhub.com... connection_reset
[     15.051656] <info> Handshake Trace #0 TTL 17 93.184.216.34:443 pornhub.com... connection_reset
[     15.134821] <info> Handshake Trace #0 TTL 18 93.184.216.34:443 pornhub.com... connection_reset
[     15.251577] <info> Handshake Trace #0 TTL 19 93.184.216.34:443 pornhub.com... connection_reset
[     15.356168] <info> Handshake Trace #0 TTL 20 93.184.216.34:443 pornhub.com... connection_reset
[     23.408558] <info> Handshake Trace #0 TTL 1 93.184.216.34:443 pornhub.com... generic_timeout_error
[     23.507378] <info> Handshake Trace #0 TTL 2 93.184.216.34:443 pornhub.com... generic_timeout_error
[     23.620332] <info> Handshake Trace #0 TTL 3 93.184.216.34:443 pornhub.com... generic_timeout_error
[     23.716098] <info> Handshake Trace #0 TTL 4 93.184.216.34:443 pornhub.com... generic_timeout_error
[     23.816803] <info> Handshake Trace #0 TTL 5 93.184.216.34:443 pornhub.com... generic_timeout_error
[     23.917565] <info> Handshake Trace #0 TTL 6 93.184.216.34:443 pornhub.com... generic_timeout_error

So, for TTLs up to 6 we see a timeout, and from TTL 7 onwards we see a connection reset by peer.
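
The comparison between the control and the experiment run can also be extracted mechanically from the log. The sketch below assumes the log line format shown above and uses a hypothetical helper, first_non_timeout_ttl, that is not part of miniooni. For each SNI it returns the lowest TTL whose handshake did not time out, ignoring "in progress" lines.

```python
import re

# Matches final-result lines such as:
# [ 14.054510] <info> Handshake Trace #0 TTL 7 93.184.216.34:443 pornhub.com... connection_reset
LINE_RE = re.compile(
    r"Handshake Trace #\d+ TTL (?P<ttl>\d+) \S+ (?P<sni>\S+?)\.\.\. (?P<result>.+?)\s*$"
)


def first_non_timeout_ttl(log_text):
    """Return {sni: (ttl, result)} for the lowest TTL whose result is not a timeout."""
    best = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue
        result = m.group("result")
        # Skip intermediate progress lines and timeouts.
        if result in ("in progress", "generic_timeout_error"):
            continue
        ttl = int(m.group("ttl"))
        sni = m.group("sni")
        if sni not in best or ttl < best[sni][0]:
            best[sni] = (ttl, result)
    return best


# A few lines copied from the traces above.
sample = """\
[      3.746441] <info> Handshake Trace #0 TTL 15 93.184.216.34:443 example.com... ok
[     13.157002] <info> Handshake Trace #0 TTL 14 93.184.216.34:443 example.com... generic_timeout_error
[     14.054510] <info> Handshake Trace #0 TTL 7 93.184.216.34:443 pornhub.com... connection_reset
[     14.127309] <info> Handshake Trace #0 TTL 8 93.184.216.34:443 pornhub.com... connection_reset
[     23.917565] <info> Handshake Trace #0 TTL 6 93.184.216.34:443 pornhub.com... generic_timeout_error
"""

for sni, (ttl, result) in first_non_timeout_ttl(sample).items():
    print(f"{sni}: first response at TTL {ttl} ({result})")
```

With the sample lines, the control SNI first gets a response at TTL 15 while the offending SNI is already reset at TTL 7, which suggests that the reset is injected several hops before the real endpoint.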

The experiment now continues and we see its report ID etc:

[     23.918914] <info> submitting measurement to OONI collector; please be patient...
[     24.104920] <info> New reportID: 20230220T164649Z_tlsmiddlebox_IN_14061_n1_dCU9jYuRM3C7Ec4V
[     24.297597] <info> saving measurement to disk
[     24.297832] <info> experiment: recv   0.00  byte, sent   0.00  byte
[     24.298021] <info> whole session: recv   4.70 kbyte, sent  14.18 kbyte

Let's now repeat the tlsmiddlebox experiment for the other possibly offending SNI (I will provide less commentary now):

% ./miniooni -y tlsmiddlebox -O TestHelper=tlshandshake://93.184.216.34 -i tlstrace://www.xnxx.com
[      0.000076] <info> Current time: 2023-02-20 16:54:38 UTC
[      0.000283] <info> miniooni home directory: $HOME/.miniooni
[      0.000535] <info> ooniprobe-engine/v3.18.0-alpha b69759ade1c88c263105e55503d9c687c11d709e dirty=false go1.19.6
[      0.000700] <info> Looking up OONI backends; please be patient...
2023/02/20 16:54:38 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
[      0.084572] <info> sessionresolver: lookup api.ooni.io using http3://cloudflare-dns.com/dns-query... ok
[      0.620675] <info> session: using probe services: {Address:https://api.ooni.io Type:https Front:}
[      0.620706] <info> Looking up your location; please be patient...
[      0.620733] <info> iplookup: using cloudflare
[      0.626257] <info> sessionresolver: lookup www.cloudflare.com using http3://cloudflare-dns.com/dns-query... ok
[      0.901298] <info> - country: IN
[      0.901321] <info> - network: DigitalOcean, LLC (AS14061)
[      0.901325] <info> - resolver's IP: 139.59.79.154
[      0.901329] <info> - resolver's network: DigitalOcean, LLC (AS14061)
[      0.901364] <info> [1/1] running with input: tlstrace://www.xnxx.com
[      0.901434] <info> DNSLookup #0, https://mozilla.cloudflare-dns.com/dns-query, 93.184.216.34... ok
[      1.157290] <info> TCPConnect #0 93.184.216.34:443... ok
[      1.657619] <info> Handshake Trace #0 TTL 1 93.184.216.34:443 example.com... in progress
[      1.758263] <info> Handshake Trace #0 TTL 2 93.184.216.34:443 example.com... in progress
[      1.860758] <info> Handshake Trace #0 TTL 3 93.184.216.34:443 example.com... in progress
[      1.959224] <info> Handshake Trace #0 TTL 4 93.184.216.34:443 example.com... in progress
[      2.058038] <info> Handshake Trace #0 TTL 5 93.184.216.34:443 example.com... in progress
[      2.157878] <info> Handshake Trace #0 TTL 6 93.184.216.34:443 example.com... in progress
[      2.259581] <info> Handshake Trace #0 TTL 7 93.184.216.34:443 example.com... in progress
[      2.359038] <info> Handshake Trace #0 TTL 8 93.184.216.34:443 example.com... in progress
[      2.459010] <info> Handshake Trace #0 TTL 9 93.184.216.34:443 example.com... in progress
[      2.558284] <info> Handshake Trace #0 TTL 10 93.184.216.34:443 example.com... in progress
[      2.658785] <info> Handshake Trace #0 TTL 11 93.184.216.34:443 example.com... in progress
[      2.759092] <info> Handshake Trace #0 TTL 12 93.184.216.34:443 example.com... in progress
[      2.858193] <info> Handshake Trace #0 TTL 13 93.184.216.34:443 example.com... in progress
[      2.958716] <info> Handshake Trace #0 TTL 14 93.184.216.34:443 example.com... in progress
[      3.059002] <info> Handshake Trace #0 TTL 15 93.184.216.34:443 example.com... in progress
[      3.157825] <info> Handshake Trace #0 TTL 16 93.184.216.34:443 example.com... in progress
[      3.258050] <info> Handshake Trace #0 TTL 17 93.184.216.34:443 example.com... in progress
[      3.316977] <info> Handshake Trace #0 TTL 15 93.184.216.34:443 example.com... ok
[      3.359233] <info> Handshake Trace #0 TTL 18 93.184.216.34:443 example.com... in progress
[      3.427097] <info> Handshake Trace #0 TTL 16 93.184.216.34:443 example.com... ok
[      3.458382] <info> Handshake Trace #0 TTL 19 93.184.216.34:443 example.com... in progress
[      3.543942] <info> Handshake Trace #0 TTL 17 93.184.216.34:443 example.com... ok
[      3.558221] <info> Handshake Trace #0 TTL 20 93.184.216.34:443 example.com... in progress
[      3.646636] <info> Handshake Trace #0 TTL 18 93.184.216.34:443 example.com... ok
[      3.736996] <info> Handshake Trace #0 TTL 19 93.184.216.34:443 example.com... ok
[      3.817788] <info> Handshake Trace #0 TTL 20 93.184.216.34:443 example.com... ok
[     11.406486] <info> Handshake Trace #0 TTL 1 93.184.216.34:443 example.com... generic_timeout_error
[     11.510875] <info> Handshake Trace #0 TTL 2 93.184.216.34:443 example.com... generic_timeout_error
[     11.612358] <info> Handshake Trace #0 TTL 3 93.184.216.34:443 example.com... generic_timeout_error
[     11.707872] <info> Handshake Trace #0 TTL 4 93.184.216.34:443 example.com... generic_timeout_error
[     11.805417] <info> Handshake Trace #0 TTL 5 93.184.216.34:443 example.com... generic_timeout_error
[     11.915976] <info> Handshake Trace #0 TTL 6 93.184.216.34:443 example.com... generic_timeout_error
[     12.017846] <info> Handshake Trace #0 TTL 7 93.184.216.34:443 example.com... generic_timeout_error
[     12.114475] <info> Handshake Trace #0 TTL 8 93.184.216.34:443 example.com... generic_timeout_error
[     12.219998] <info> Handshake Trace #0 TTL 9 93.184.216.34:443 example.com... generic_timeout_error
[     12.305735] <info> Handshake Trace #0 TTL 10 93.184.216.34:443 example.com... generic_timeout_error
[     12.410367] <info> Handshake Trace #0 TTL 11 93.184.216.34:443 example.com... generic_timeout_error
[     12.509943] <info> Handshake Trace #0 TTL 12 93.184.216.34:443 example.com... generic_timeout_error
[     12.613478] <info> Handshake Trace #0 TTL 13 93.184.216.34:443 example.com... generic_timeout_error
[     12.705078] <info> Handshake Trace #0 TTL 14 93.184.216.34:443 example.com... generic_timeout_error
[     13.205646] <info> Handshake Trace #0 TTL 1 93.184.216.34:443 www.xnxx.com... in progress
[     13.306992] <info> Handshake Trace #0 TTL 2 93.184.216.34:443 www.xnxx.com... in progress
[     13.407537] <info> Handshake Trace #0 TTL 3 93.184.216.34:443 www.xnxx.com... in progress
[     13.506832] <info> Handshake Trace #0 TTL 4 93.184.216.34:443 www.xnxx.com... in progress
[     13.574611] <info> Handshake Trace #0 TTL 7 93.184.216.34:443 www.xnxx.com... connection_reset
[     13.606047] <info> Handshake Trace #0 TTL 5 93.184.216.34:443 www.xnxx.com... in progress
[     13.702785] <info> Handshake Trace #0 TTL 8 93.184.216.34:443 www.xnxx.com... connection_reset
[     13.706002] <info> Handshake Trace #0 TTL 6 93.184.216.34:443 www.xnxx.com... in progress
[     13.771318] <info> Handshake Trace #0 TTL 9 93.184.216.34:443 www.xnxx.com... connection_reset
[     13.901940] <info> Handshake Trace #0 TTL 10 93.184.216.34:443 www.xnxx.com... connection_reset
[     13.997583] <info> Handshake Trace #0 TTL 11 93.184.216.34:443 www.xnxx.com... connection_reset
[     14.107252] <info> Handshake Trace #0 TTL 12 93.184.216.34:443 www.xnxx.com... connection_reset
[     14.171983] <info> Handshake Trace #0 TTL 13 93.184.216.34:443 www.xnxx.com... connection_reset
[     14.308487] <info> Handshake Trace #0 TTL 14 93.184.216.34:443 www.xnxx.com... connection_reset
[     14.400742] <info> Handshake Trace #0 TTL 15 93.184.216.34:443 www.xnxx.com... connection_reset
[     14.495381] <info> Handshake Trace #0 TTL 16 93.184.216.34:443 www.xnxx.com... connection_reset
[     14.601423] <info> Handshake Trace #0 TTL 17 93.184.216.34:443 www.xnxx.com... connection_reset
[     14.706281] <info> Handshake Trace #0 TTL 18 93.184.216.34:443 www.xnxx.com... connection_reset
[     14.780963] <info> Handshake Trace #0 TTL 19 93.184.216.34:443 www.xnxx.com... connection_reset
[     14.895850] <info> Handshake Trace #0 TTL 20 93.184.216.34:443 www.xnxx.com... connection_reset
[     22.964828] <info> Handshake Trace #0 TTL 1 93.184.216.34:443 www.xnxx.com... generic_timeout_error
[     23.069029] <info> Handshake Trace #0 TTL 2 93.184.216.34:443 www.xnxx.com... generic_timeout_error
[     23.157750] <info> Handshake Trace #0 TTL 3 93.184.216.34:443 www.xnxx.com... generic_timeout_error
[     23.269774] <info> Handshake Trace #0 TTL 4 93.184.216.34:443 www.xnxx.com... generic_timeout_error
[     23.359664] <info> Handshake Trace #0 TTL 5 93.184.216.34:443 www.xnxx.com... generic_timeout_error
[     23.464789] <info> Handshake Trace #0 TTL 6 93.184.216.34:443 www.xnxx.com... generic_timeout_error
[     23.465895] <info> submitting measurement to OONI collector; please be patient...
[     23.639095] <info> New reportID: 20230220T165502Z_tlsmiddlebox_IN_14061_n1_hq8bccpT9R3domIK
[     23.812360] <info> saving measurement to disk
[     23.812697] <info> experiment: recv   0.00  byte, sent   0.00  byte
[     23.812932] <info> whole session: recv   4.70 kbyte, sent  14.15 kbyte

We can now use mtr to try to guess what the seventh hop could be. Because miniooni does not run with root privileges, we cannot perform this analysis inside it. Also, keep in mind that routing may change after the TCP connect for censorship purposes (as well as for legitimate ones), hence we cannot be very confident about identifying the ISP:

% mtr --report-wide -z -P 443 -T example.com
Start: 2023-02-20T16:57:10+0000
HOST: roaming-th-20230215122551                   Loss%   Snt   Last   Avg  Best  Wrst StDev
[...]
  5. AS14061  2a03:b0c0:fffe::182                  0.0%    10    0.7   0.8   0.5   1.2   0.2
        2a03:b0c0:fffe::180                
     AS14061  2a03:b0c0:fffe::180
        2a03:b0c0:fffe::17e                
     AS14061  2a03:b0c0:fffe::17e
  6. AS9498   2404:a800:3a00::e9                   0.0%    10    1.7   2.5   1.7   6.1   1.5
        2404:a800:3a00:1::60d              
     AS9498   2404:a800:3a00:1::60d
  7. AS9498   2404:a800::158                       0.0%    10  258.4 265.5 247.4 279.9  11.7
[...]

So, with all the above-mentioned caveats, AS14061 is DigitalOcean, LLC (where we host 1.th.ooni.org), while the seventh hop, the first TTL at which we saw the reset, is in AS9498 BHARTI Airtel Ltd.
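
Looking up the AS for a given hop can be scripted as well. This is a minimal sketch that assumes the `mtr --report-wide -z` layout shown above; hop_asns is a hypothetical helper, and it keeps only the first address listed for each hop, skipping the indented alternate-path lines.

```python
import re

# Matches numbered hop lines of `mtr --report-wide -z`, e.g.:
#   7. AS9498   2404:a800::158    0.0%    10  258.4 265.5 ...
HOP_RE = re.compile(r"^\s*(?P<hop>\d+)\.\s+(?P<asn>AS\S+)\s+(?P<host>\S+)")


def hop_asns(report):
    """Return {hop_number: (asn, host)} using the first address listed per hop."""
    hops = {}
    for line in report.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops[int(m.group("hop"))] = (m.group("asn"), m.group("host"))
    return hops


# Lines copied from the mtr report above.
report = """\
[...]
  5. AS14061  2a03:b0c0:fffe::182                  0.0%    10    0.7   0.8   0.5   1.2   0.2
        2a03:b0c0:fffe::180
     AS14061  2a03:b0c0:fffe::180
  6. AS9498   2404:a800:3a00::e9                   0.0%    10    1.7   2.5   1.7   6.1   1.5
  7. AS9498   2404:a800::158                       0.0%    10  258.4 265.5 247.4 279.9  11.7
[...]
"""

first_reset_ttl = 7  # lowest TTL at which connection_reset appears in the traces above
asn, host = hop_asns(report)[first_reset_ttl]
print(f"hop {first_reset_ttl}: {asn} ({host})")
```

The same caveats apply here: the path seen by mtr is not necessarily the path taken by the TLS handshake, so the result is only a hint.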

bassosimone commented 1 year ago

To mitigate the immediate issue, we have moved the TH to another data center. There are more activities that we could perform to improve the situation here. I'll leave this issue open a while longer, until I've spelled them all out.