ooni / probe

OONI Probe network measurement tool for detecting internet censorship
https://ooni.org/install
BSD 3-Clause "New" or "Revised" License

webconnectivity,websteps: correctly handle i18n domain names #1925

Closed: bassosimone closed 9 months ago

bassosimone commented 2 years ago

Let's first document what happens with the current codebase if the input is http://ουτοπία.δπθ.gr as well as its punycode version (http://xn--kxae4bafwg.xn--pxaix.gr/).

The rationale of this investigation is to answer https://github.com/ooni/run/pull/81#discussion_r771424651.

This matters because we cannot prevent a user from pasting a punycode URL and running the test from the command line. So, we need to know what the CLI does in this case (which is a proxy for what mobile would do, given that both use the same implementation). A broader issue is how the rest of the ecosystem would handle it.
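
To make the relationship between the two inputs concrete, here is a minimal Python sketch showing that they name the same host. It relies on Python's built-in `idna` codec (IDNA 2003) purely for illustration; it is not the Go code that the probe runs, and the helper names `to_ascii` and `to_unicode` are made up for this example.

```python
# Illustrative only: Python's built-in "idna" codec, not OONI Probe's Go code.
I18N_HOST = "ουτοπία.δπθ.gr"
PUNYCODE_HOST = "xn--kxae4bafwg.xn--pxaix.gr"


def to_ascii(host: str) -> str:
    """Return the ASCII (punycode) form of a hostname."""
    return host.encode("idna").decode("ascii")


def to_unicode(host: str) -> str:
    """Return the Unicode form of a punycode hostname."""
    return host.encode("ascii").decode("idna")


print(to_ascii(I18N_HOST))
print(to_unicode(PUNYCODE_HOST))
```

Since both spellings resolve to the same name, one would expect a measurement tool to behave the same way for either input.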

Executive Summary

URL                                   Experiment         WAI
http://ουτοπία.δπθ.gr                 web_connectivity
http://ουτοπία.δπθ.gr                 websteps
http://xn--kxae4bafwg.xn--pxaix.gr/   web_connectivity
http://xn--kxae4bafwg.xn--pxaix.gr/   websteps           ✔️

With i18n URL

Web Connectivity

> ./ooniprobe run websites --input http://ουτοπία.δπθ.gr
   • Running websites tests
[engine] iplookup: using ubuntu
[engine] sessionresolver: https://dns.google/dns-query... <nil>
[engine] sessionresolver: https://dns.google/dns-query... <nil>
[engine] sessionresolver: https://dns.google/dns-query... <nil>
[engine] session: using probe services: {Address:https://ps1.ooni.io Type:https Front:}
   0.00% processing input: http://ουτοπία.δπθ.gr 
[engine] dnslookup://ουτοπία.δπθ.gr...
[engine] dnslookup://ουτοπία.δπθ.gr... <nil>
[engine] using control: https://0.th.ooni.org
[engine] control for http://%CE%BF%CF%85%CF%84%CE%BF%CF%80%CE%AF%CE%B1.%CE%B4%CF%80%CE%B8.gr...
[engine] sessionresolver: https://mozilla.cloudflare-dns.com/dns-query... <nil>
[engine] control for http://%CE%BF%CF%85%CF%84%CE%BF%CF%80%CE%AF%CE%B1.%CE%B4%CF%80%CE%B8.gr... <nil>
[engine] DNS analysis result: inconsistent
[engine] TCP/TLS endpoints: 1/2 reachable
[engine] GET http://%CE%BF%CF%85%CF%84%CE%BF%CF%80%CE%AF%CE%B1.%CE%B4%CF%80%CE%B8.gr...
[engine] GET http://%CE%BF%CF%85%CF%84%CE%BF%CF%80%CE%AF%CE%B1.%CE%B4%CF%80%CE%B8.gr... unknown_failure: invalid domain in DNSCache
[snip]

This is not good. Regardless of where we apply the URL transformation, we cannot prevent a user from testing an i18n URL from the command line. Therefore, it's quite bad that the measurement fails.
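
As an illustration of the difference, the sketch below reproduces the percent-encoded host that appears in the log above and contrasts it with a URL whose host has been rewritten to punycode. This is a Python sketch under assumptions, not the probe's implementation: `percent_encoded_host` and `with_ascii_host` are hypothetical helpers, and the rewrite drops any userinfo and does not handle IPv6 literals.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encoded_host(host: str) -> str:
    """Percent-encode the UTF-8 bytes of a hostname (the form seen in the logs)."""
    return quote(host)


def with_ascii_host(url: str) -> str:
    """Hypothetical helper: rewrite a URL so that its host is in punycode form.

    Simplification: userinfo is dropped and IPv6 literals are not handled.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(percent_encoded_host("ουτοπία.δπθ.gr"))
print(with_ascii_host("http://ουτοπία.δπθ.gr"))
```

A percent-encoded host is not a valid DNS name, whereas the punycode form is plain ASCII and can be resolved and sent in a `Host` header as-is.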

Websteps

> ./miniooni -i http://ουτοπία.δπθ.gr websteps
[      0.000400] <info> Current time: 2021-12-17 15:45:13 CET
[      0.000492] <info> miniooni home directory: $HOME/.miniooni
[      0.000700] <info> Looking up OONI backends; please be patient...
[      0.050117] <info> sessionresolver: system:///... <nil>
[      1.391023] <info> sessionresolver: system:///... <nil>
[      1.642037] <info> session: using probe services: {Address:https://ps1.ooni.io Type:https Front:}
[      1.642086] <info> Looking up your location; please be patient...
[      1.642172] <info> iplookup: using stun_google
[      1.744619] <info> - country: IT
[      1.744680] <info> - network: Vodafone Italia S.p.A. (AS30722)
[      1.744692] <info> - resolver's IP: 91.80.36.88
[      1.744704] <info> - resolver's network: Vodafone Italia S.p.A. (AS30722)
[      1.744764] <info> [1/1] running with input: http://ουτοπία.δπθ.gr
[      1.745124] <info> MeasureURL url=http://ουτοπία.δπθ.gr
[      1.747613] <info> LookupHost ουτοπία.δπθ.gr with getaddrinfo... ok
[      1.912003] <info> LookupHost ουτοπία.δπθ.gr with 8.8.4.4:53/udp... ok
[      2.033907] <info> sessionresolver: system:///... <nil>
[      2.413384] <info> THClientCall http://%CE%BF%CF%85%CF%84%CE%BF%CF%80%CE%AF%CE%B1.%CE%B4%CF%80%CE%B8.gr... in progress
[      2.558625] <info> THClientCall http://%CE%BF%CF%85%CF%84%CE%BF%CF%80%CE%AF%CE%B1.%CE%B4%CF%80%CE%B8.gr... json: cannot unmarshal object into Go struct field THMeasurement.DNS of type []*measurex.DNSMeasurement
[      2.559315] <info> TCPConnect [2001:648:2e80::44]:80... host_unreachable
[      2.707853] <info> TCPConnect 192.108.114.44:80... ok
[      2.850519] <info> GET http://%CE%BF%CF%85%CF%84%CE%BF%CF%80%CE%AF%CE%B1.%CE%B4%CF%80%CE%B8.gr with 192.108.114.44:80/tcp... ok
[      2.852076] <info> submitting measurement to OONI collector; please be patient...
[      3.024791] <info> New reportID: 20211217T144515Z_websteps_IT_30722_n1_Al0T3eono0kwlQir
[      3.178916] <info> saving measurement to disk
[      3.180059] <info> experiment: recv   0.00  byte, sent   0.00  byte
[      3.181014] <info> sessionresolver: [{"URL":"system:///","Score":1},{"URL":"https://cloudflare-dns.com/dns-query","Score":0.999},{"URL":"https://mozilla.cloudflare-dns.com/dns-query","Score":0.99},{"URL":"https://dns.google/dns-query","Score":9.100000000000009e-17},{"URL":"http3://cloudflare-dns.com/dns-query","Score":0},{"URL":"https://dns.quad9.net/dns-query","Score":0},{"URL":"https://doh.powerdns.org/","Score":0},{"URL":"http3://mozilla.cloudflare-dns.com/dns-query","Score":0}]
[      3.181367] <info> whole session: recv   3.62 kbyte, sent   6.52 kbyte

With punycode URL

Web Connectivity

> ./ooniprobe run websites --input http://xn--kxae4bafwg.xn--pxaix.gr/
   • Running websites tests
[engine] iplookup: using avast
[engine] sessionresolver: https://dns.google/dns-query... <nil>
[engine] sessionresolver: https://dns.google/dns-query... <nil>
[engine] sessionresolver: https://dns.google/dns-query... <nil>
[engine] session: using probe services: {Address:https://ps2.ooni.io Type:https Front:}
   0.00% processing input: http://xn--kxae4bafwg.xn--pxaix.gr/ 
[engine] dnslookup://xn--kxae4bafwg.xn--pxaix.gr...
[engine] dnslookup://xn--kxae4bafwg.xn--pxaix.gr... <nil>
[engine] using control: https://0.th.ooni.org
[engine] control for http://xn--kxae4bafwg.xn--pxaix.gr/...
[engine] sessionresolver: https://mozilla.cloudflare-dns.com/dns-query... <nil>
[engine] control for http://xn--kxae4bafwg.xn--pxaix.gr/... <nil>
[engine] DNS analysis result: consistent
[engine] TCP/TLS endpoints: 1/2 reachable
[engine] GET http://xn--kxae4bafwg.xn--pxaix.gr/...
[engine] GET http://xn--kxae4bafwg.xn--pxaix.gr/... unknown_failure: invalid domain in DNSCache
[engine] BodyLengthMatch: nil
[engine] BodyProportion: 0
[engine] StatusCodeMatch: nil
[engine] HeadersMatch: nil
[engine] TitleMatch: nil
[engine] Blocking: nil
[engine] Accessible: nil
[engine] sessionresolver: [{"URL":"https://dns.google/dns-query","Score":1},{"URL":"https://cloudflare-dns.com/dns-query","Score":1},{"URL":"https://mozilla.cloudflare-dns.com/dns-query","Score":1},{"URL":"http3://cloudflare-dns.com/dns-query","Score":0.2628197959480042},{"URL":"https://dns.quad9.net/dns-query","Score":0.21171153174288534},{"URL":"http3://mozilla.cloudflare-dns.com/dns-query","Score":0.19995447808523037},{"URL":"https://doh.powerdns.org/","Score":0.08215147938910748},{"URL":"system:///","Score":0}]

Websteps

> ./miniooni -i http://xn--kxae4bafwg.xn--pxaix.gr/ websteps
[      0.000389] <info> Current time: 2021-12-17 15:49:24 CET
[      0.000479] <info> miniooni home directory: $HOME/.miniooni
[      0.000705] <info> Looking up OONI backends; please be patient...
[      0.002432] <info> sessionresolver: system:///... <nil>
[      0.258126] <info> sessionresolver: system:///... <nil>
[      0.502821] <info> session: using probe services: {Address:https://ps2.ooni.io Type:https Front:}
[      0.502870] <info> Looking up your location; please be patient...
[      0.502914] <info> iplookup: using ubuntu
[      0.528982] <info> sessionresolver: system:///... <nil>
[      0.663838] <info> - country: IT
[      0.663887] <info> - network: Vodafone Italia S.p.A. (AS30722)
[      0.663897] <info> - resolver's IP: 91.80.36.92
[      0.663906] <info> - resolver's network: Vodafone Italia S.p.A. (AS30722)
[      0.663959] <info> [1/1] running with input: http://xn--kxae4bafwg.xn--pxaix.gr/
[      0.664093] <info> MeasureURL url=http://xn--kxae4bafwg.xn--pxaix.gr/
[      0.665443] <info> LookupHost xn--kxae4bafwg.xn--pxaix.gr with getaddrinfo... ok
[      0.782862] <info> LookupHost xn--kxae4bafwg.xn--pxaix.gr with 8.8.4.4:53/udp... ok
[      0.944811] <info> sessionresolver: https://mozilla.cloudflare-dns.com/dns-query... <nil>
[      1.284338] <info> THClientCall http://xn--kxae4bafwg.xn--pxaix.gr/... in progress
[      4.416555] <info> THClientCall http://xn--kxae4bafwg.xn--pxaix.gr/... ok
[      4.417413] <info> TCPConnect [2001:648:2e80::44]:80... host_unreachable
[      4.551820] <info> TCPConnect 192.108.114.44:80... ok
[      4.686628] <info> GET http://xn--kxae4bafwg.xn--pxaix.gr/ with 192.108.114.44:80/tcp... ok
[      4.688526] <info> submitting measurement to OONI collector; please be patient...
[      4.890897] <info> New reportID: 20211217T144928Z_websteps_IT_30722_n1_aO60u55EwxToSlZo
[      5.242911] <info> saving measurement to disk
[      5.244223] <info> experiment: recv   0.00  byte, sent   0.00  byte
[      5.245357] <info> sessionresolver: [{"URL":"system:///","Score":1},{"URL":"https://mozilla.cloudflare-dns.com/dns-query","Score":0.9999},{"URL":"https://cloudflare-dns.com/dns-query","Score":0.999},{"URL":"https://dns.google/dns-query","Score":9.100000000000009e-17},{"URL":"http3://cloudflare-dns.com/dns-query","Score":0},{"URL":"https://dns.quad9.net/dns-query","Score":0},{"URL":"https://doh.powerdns.org/","Score":0},{"URL":"http3://mozilla.cloudflare-dns.com/dns-query","Score":0}]
[      5.245675] <info> whole session: recv   8.66 kbyte, sent   9.59 kbyte

I inspected the measurement and it seems to be okay.

xhdix commented 2 years ago

OONI Probe Android failed, but it still produced this measurement:

{
  "annotations": {
    "engine_name": "ooniprobe-engine",
    "engine_version": "3.10.0-beta.3",
    "flavor": "stableFull",
    "network_type": "mobile",
    "platform": "android"
  },
  "data_format_version": "0.2.0",
  "input": "http://яндекс.рф/",
  "measurement_start_time": "2021-12-17 15:24:14",
  "probe_asn": "AS24940",
  "probe_cc": "DE",
  "probe_ip": "127.0.0.1",
  "probe_network_name": "Hetzner Online GmbH",
  "report_id": "20211217T152413Z_webconnectivity_DE_24940_n1_8uD59FPZgqD6nL3x",
  "resolver_asn": "AS13335",
  "resolver_ip": "162.158.83.147",
  "resolver_network_name": "Cloudflare, Inc.",
  "software_name": "ooniprobe-android",
  "software_version": "3.4.1",
  "test_helpers": {
    "backend": {
      "address": "https://wcth.ooni.io",
      "type": "https"
    }
  },
  "test_keys": {
    "agent": "redirect",
    "client_resolver": "162.158.83.147",
    "network_events": [
      {
        "address": "[2a02:6b8:a::a]:80",
        "failure": "unknown_failure: dial tcp [scrubbed]: connect: no route to host",
        "operation": "connect",
        "proto": "tcp",
        "t": 1.501672187,
        "tags": [
          "tcptls_experiment"
        ]
      },
      {
        "address": "77.88.55.55:80",
        "operation": "connect",
        "proto": "tcp",
        "t": 1.66578177,
        "tags": [
          "tcptls_experiment"
        ]
      },
      {
        "address": "5.255.255.5:80",
        "operation": "connect",
        "proto": "tcp",
        "t": 1.667621249,
        "tags": [
          "tcptls_experiment"
        ]
      },
      {
        "address": "5.255.255.55:80",
        "operation": "connect",
        "proto": "tcp",
        "t": 1.667557656,
        "tags": [
          "tcptls_experiment"
        ]
      },
      {
        "address": "77.88.55.66:80",
        "operation": "connect",
        "proto": "tcp",
        "t": 1.831798281,
        "tags": [
          "tcptls_experiment"
        ]
      }
    ],
    "queries": [
      {
        "answers": [
          {
            "asn": 13238,
            "as_org_name": "YANDEX LLC",
            "answer_type": "A",
            "ipv4": "77.88.55.55"
          },
          {
            "asn": 13238,
            "as_org_name": "YANDEX LLC",
            "answer_type": "A",
            "ipv4": "5.255.255.55"
          },
          {
            "asn": 13238,
            "as_org_name": "YANDEX LLC",
            "answer_type": "A",
            "ipv4": "5.255.255.5"
          },
          {
            "asn": 13238,
            "as_org_name": "YANDEX LLC",
            "answer_type": "A",
            "ipv4": "77.88.55.66"
          }
        ],
        "engine": "system",
        "hostname": "xn--d1acpjx3f.xn--p1ai",
        "query_type": "A",
        "resolver_address": "",
        "t": 0.539199739
      },
      {
        "answers": [
          {
            "asn": 13238,
            "as_org_name": "YANDEX LLC",
            "answer_type": "AAAA",
            "ipv6": "2a02:6b8:a::a"
          }
        ],
        "engine": "system",
        "hostname": "xn--d1acpjx3f.xn--p1ai",
        "query_type": "AAAA",
        "resolver_address": "",
        "t": 0.539199739
      }
    ],
    "control_failure": "unknown_failure: httpx: request failed: 400 Bad Request",
    "control": {
      "http_request": {
        "body_length": 0,
        "title": "",
        "status_code": 0
      },
      "dns": {}
    },
    "tcp_connect": [
      {
        "ip": "2a02:6b8:a::a",
        "port": 80,
        "status": {
          "failure": "unknown_failure: dial tcp [scrubbed]: connect: no route to host",
          "success": false
        },
        "t": 1.501672187
      },
      {
        "ip": "77.88.55.55",
        "port": 80,
        "status": {
          "success": true
        },
        "t": 1.66578177
      },
      {
        "ip": "5.255.255.5",
        "port": 80,
        "status": {
          "success": true
        },
        "t": 1.667621249
      },
      {
        "ip": "5.255.255.55",
        "port": 80,
        "status": {
          "success": true
        },
        "t": 1.667557656
      },
      {
        "ip": "77.88.55.66",
        "port": 80,
        "status": {
          "success": true
        },
        "t": 1.831798281
      }
    ],
    "http_experiment_failure": "unknown_failure: invalid domain in DNSCache",
    "body_proportion": 0,
    "x_status": 8
  },
  "test_name": "web_connectivity",
  "test_runtime": 1.835072291,
  "test_start_time": "2021-12-17 15:24:13",
  "test_version": "0.4.0"
}

gurshabad commented 1 year ago

Not sure if it helps, but I think I found 8 more examples of this error while analysing measurements classified as failed (specifically in Pakistan). 1, 2, 3, 4, 5, 6, 7, 8.

These were all ooniprobe-android (3.6.0). In these instances, I think the domain name just had an upper-case character, so this may not be only an i18n matter.

bassosimone commented 1 year ago

Thank you @gurshabad, for providing additional test cases for this bug. I have repeated the above measurements as well as additional measurements using mixed case domains for Web Connectivity LTE. Here are the results I've got:

URL                                   Experiment             WAI   Measurement
http://ουτοπία.δπθ.gr                 Web Connectivity LTE         #
http://xn--kxae4bafwg.xn--pxaix.gr/   Web Connectivity LTE   ✔️    #
http://ExAmPlE.com                    Web Connectivity 0.4         #
http://ExAmPlE.com                    Web Connectivity LTE   ✔️    #

So, it seems that Web Connectivity LTE correctly handles punycode URLs and correctly handles mixed case domain names but does not correctly handle i18n domain names.
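
The three cases above (i18n, punycode, mixed case) can all be reduced to one canonical hostname before comparing or caching. The following Python sketch shows one way to do that; it is an assumption-laden illustration rather than what Web Connectivity LTE does, and `normalize_host` and `same_host` are hypothetical names.

```python
def normalize_host(host: str) -> str:
    """Hypothetical normalization: punycode form, lower-cased."""
    # Python's "idna" codec leaves all-ASCII names unchanged,
    # so the result is lower-cased explicitly.
    return host.encode("idna").decode("ascii").lower()


def same_host(a: str, b: str) -> bool:
    """Return True when two spellings normalize to the same hostname."""
    return normalize_host(a) == normalize_host(b)


print(same_host("ExAmPlE.com", "example.com"))
print(same_host("яндекс.рф", "xn--d1acpjx3f.xn--p1ai"))
print(same_host("ουτοπία.δπθ.gr", "xn--kxae4bafwg.xn--pxaix.gr"))
```

With a normalization step like this applied consistently, a lookup keyed on the hostname would not depend on how the user happened to type the domain.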

bassosimone commented 1 year ago

With https://github.com/ooni/probe-cli/pull/1110, we can now correctly handle http://ουτοπία.δπθ.gr with Web Connectivity LTE: 20230221T104449Z_webconnectivity_IT_30722_n1_5w3KClopMgHswxeV.

Because of that, I am going to flag this issue as fixed by Web Connectivity LTE.

bassosimone commented 1 year ago

I tested again today and I saw a tls_handshake_error (see https://explorer.ooni.org/m/20231019045842.719047_IT_webconnectivity_94d123c58ce52483). I see the host does not actually support TLS, which makes this an inconclusive outcome. I think we need to find a better test case here.

bassosimone commented 9 months ago

pRr9-V

We're going to close this issue soon, through an issue that ensures webconnectivitylte is WAI.