ooni / probe

OONI Probe network measurement tool for detecting internet censorship
https://ooni.org/install
BSD 3-Clause "New" or "Revised" License

engine: map more unknown_failure to a specified failure string #2420

Open hellais opened 1 year ago

hellais commented 1 year ago

While doing some data analysis work I enumerated all the unknown failures found in the OONI dataset since Jan 1st 2022 and produced this list: https://gist.github.com/hellais/8b38d360c169e5a2a8ce8856bcf519ff.

I think it would be very useful to have some specific error strings defined for some of these so that we can use them as part of the analysis more easily.

I think the TLS related ones are especially interesting.
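To make the idea concrete, here is a minimal sketch of what such a mapping could look like. The failure names on the right (`tls_oversized_record` and so on) are hypothetical placeholders invented for illustration; they are not strings the engine defines today, and the real names would need to be agreed on.

```python
# Hypothetical mapping from raw Go error prefixes to specific failure strings.
# The names on the right are illustrative assumptions, not existing failures.
UNKNOWN_PREFIX = "unknown_failure: "
HTTP_BROKEN = "net/http: HTTP/1.x transport connection broken: "

SPECIFIC_FAILURES = [
    ("tls: oversized record received with length", "tls_oversized_record"),
    (HTTP_BROKEN + "malformed HTTP response", "http_malformed_response"),
    (HTTP_BROKEN + "malformed HTTP status code", "http_malformed_status_code"),
    (HTTP_BROKEN + "malformed MIME header", "http_malformed_mime_header"),
]


def classify_failure(failure):
    """Return a specific failure string, or the input unchanged if no rule matches."""
    if failure is None or not failure.startswith(UNKNOWN_PREFIX):
        return failure
    detail = failure[len(UNKNOWN_PREFIX):]
    for prefix, name in SPECIFIC_FAILURES:
        if detail.startswith(prefix):
            return name
    return failure
```

Anything that does not match a rule is passed through untouched, so already-specific failures and unmapped unknown failures stay visible in the analysis.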

I consolidated several of these into a single failure string, replacing the variable part with a [token] placeholder to make the list easier to read.

Here is the snippet used for doing that mapping:

import re

# Matches "lookup <hostname>" so the hostname can be replaced with a placeholder.
lookup_re = re.compile(r'(lookup [a-zA-Z0-9\.\-]*)')

def consolidate_failure(f):
    """Collapse failure strings that differ only in a variable part into one string."""
    # Malformed HTTP responses: the trailing bytes vary per measurement.
    if f.startswith("unknown_failure: net/http: HTTP/1.x transport connection broken: malformed HTTP response"):
        return "unknown_failure: net/http: HTTP/1.x transport connection broken: malformed HTTP response [garbage]"
    if f.startswith("unknown_failure: net/http: HTTP/1.x transport connection broken: malformed HTTP status code"):
        return "unknown_failure: net/http: HTTP/1.x transport connection broken: malformed HTTP status code [garbage]"
    # Oversized TLS records: the reported length varies.
    if f.startswith("unknown_failure: tls: oversized record received with length"):
        return "unknown_failure: tls: oversized record received with length [length]"
    if f.startswith("unknown_failure: net/http: HTTP/1.x transport connection broken: malformed MIME header: missing colon:"):
        return "unknown_failure: net/http: HTTP/1.x transport connection broken: malformed MIME header: missing colon: [garbage]"
    # Certificate name mismatches: the listed domains vary.
    if f.startswith("unknown_failure: x509: certificate is valid for "):
        return "unknown_failure: x509: certificate is valid for [domain_1], [domain_2] not [domain_3]"
    # Everything else: only normalize the hostname in "lookup <hostname>".
    return lookup_re.sub('lookup [domain_name]', f)
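As a usage illustration, the sketch below restates the same rules as a prefix table and counts how often each consolidated string occurs. The `RULES` table, the `count_failures` helper and the sample strings are made up for this example; only the prefixes and placeholders come from the snippet above.

```python
import re
from collections import Counter

LOOKUP_RE = re.compile(r"(lookup [a-zA-Z0-9\.\-]*)")
HTTP = "unknown_failure: net/http: HTTP/1.x transport connection broken: "

# (prefix to match, consolidated string) pairs, mirroring the rules above.
RULES = [
    (HTTP + "malformed HTTP response", HTTP + "malformed HTTP response [garbage]"),
    (HTTP + "malformed HTTP status code", HTTP + "malformed HTTP status code [garbage]"),
    ("unknown_failure: tls: oversized record received with length",
     "unknown_failure: tls: oversized record received with length [length]"),
    (HTTP + "malformed MIME header: missing colon:",
     HTTP + "malformed MIME header: missing colon: [garbage]"),
    ("unknown_failure: x509: certificate is valid for ",
     "unknown_failure: x509: certificate is valid for [domain_1], [domain_2] not [domain_3]"),
]


def consolidate(failure):
    """Table-driven equivalent of the consolidation rules."""
    for prefix, consolidated in RULES:
        if failure.startswith(prefix):
            return consolidated
    return LOOKUP_RE.sub("lookup [domain_name]", failure)


def count_failures(failures):
    """Count occurrences of each consolidated failure string."""
    return Counter(consolidate(f) for f in failures)


# Made-up sample input, for illustration only.
sample = [
    "unknown_failure: tls: oversized record received with length 20527",
    "unknown_failure: tls: oversized record received with length 21536",
    "unknown_failure: lookup example.com: some error",
    "unknown_failure: lookup example.org: some error",
]
counts = count_failures(sample)
for failure, n in counts.most_common():
    print(n, failure)
```

With the sample input, the four raw strings collapse into two consolidated strings with a count of two each.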

This list is a superset of what has been reported in these issues: https://github.com/ooni/probe/issues/2412 https://github.com/ooni/probe/issues/2411 https://github.com/ooni/probe/issues/2410

ohnorobo commented 1 year ago

CensoredPlanet also had to do a similar analysis. The exact mapping will depend on the details of the networking library/stack used (in CensoredPlanet's case it's https://pkg.go.dev/net), but you may find the correspondences we ended up with useful, especially for the TLS/HTTP specific ones (some are specific to CensoredPlanet's internal infrastructure and you should ignore them):