ooni / backend

Everything related to OONI backend infrastructure: ooni/api, ooni/pipeline, ooni/sysadmin, collector, bouncers and test-helpers
BSD 3-Clause "New" or "Revised" License

pipeline: correctly flag websites that are down #633

Open bassosimone opened 1 year ago

bassosimone commented 1 year ago

This issue is about improving the pipeline to correctly flag websites that are down. More specifically, this issue is about making sure the ETL pipeline recognizes all the states that are already recognized by OONI Explorer's parsing code.

To illustrate this issue, let us focus on a specific case.

Since https://github.com/ooni/probe-cli/pull/953, webconnectivity LTE correctly detects cases where neither the probe nor the test helper resolved any IP address. We flag those cases as "website down". See, for example, this measurement: https://explorer.ooni.org/measurement/20220912T050804Z_webconnectivity_IT_30722_n1_DWMjAQ9rHm7ho2HT?input=http%3A%2F%2Fwww.hrcr.org%2F.


The ETL pipeline scores the same measurement as follows:

{
  "anomaly": false,
  "category_code": "HUMR",
  "confirmed": false,
  "failure": false,
  "input": "http://www.hrcr.org/",
  "measurement_start_time": "2022-09-12T05:08:04Z",
  "measurement_uid": "20220912050804.879095_IT_webconnectivity_d368de710a966370",
  "probe_asn": 30722,
  "probe_cc": "IT",
  "report_id": "20220912T050804Z_webconnectivity_IT_30722_n1_DWMjAQ9rHm7ho2HT",
  "scores": "{\"blocking_general\":0.0,\"blocking_global\":0.0,\"blocking_country\":0.0,\"blocking_isp\":0.0,\"blocking_local\":0.0}",
  "test_name": "web_connectivity",
  "test_start_time": "2022-09-12T05:08:03Z"
}

(See https://api.ooni.io/api/v1/measurement_meta?report_id=20220912T050804Z_webconnectivity_IT_30722_n1_DWMjAQ9rHm7ho2HT&input=http%3A%2F%2Fwww.hrcr.org%2F)
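Note that the `scores` field is itself a JSON-encoded string, so reading it takes a second decoding step. A minimal Python sketch, working on an abridged copy of the response above rather than calling the API (the helper name `decode_scores` is hypothetical):

```python
import json

# Abridged copy of the measurement_meta response shown above.
# A raw string keeps the \" escapes intact for the JSON parser.
RESPONSE = r'''
{
  "anomaly": false,
  "confirmed": false,
  "failure": false,
  "input": "http://www.hrcr.org/",
  "scores": "{\"blocking_general\":0.0,\"blocking_global\":0.0,\"blocking_country\":0.0,\"blocking_isp\":0.0,\"blocking_local\":0.0}"
}
'''


def decode_scores(meta: dict) -> dict:
    """Decode the nested JSON string stored under "scores" (hypothetical helper)."""
    return json.loads(meta["scores"])


meta = json.loads(RESPONSE)
scores = decode_scores(meta)

# True when every blocking score is exactly zero.
all_zero = all(value == 0.0 for value in scores.values())
# True when any of the top-level flags is set.
flagged = meta["anomaly"] or meta["confirmed"] or meta["failure"]

print("all scores zero:", all_zero)
print("any flag set:", flagged)
```

With all scores at zero and no flag set, nothing in this record distinguishes a website that is down from one that is simply reachable.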

Because of how the ETL pipeline scores this measurement, Explorer's search results have no way to indicate that the measurement actually shows the website is down. The individual measurement page, by contrast, recognizes this case correctly because Explorer parses the measurement results more comprehensively.
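The condition described earlier (neither the probe nor the test helper resolved any IP address) could be checked along the following lines. This is only a sketch under stated assumptions: the function names are hypothetical, the `test_keys` layout used here (`queries[].answers[]` with `ipv4`/`ipv6` entries, `control.dns.addrs`, and `control_failure`) is an assumption about the Web Connectivity measurement format, and it is not a description of how the fastpath or Explorer actually implement the check.

```python
from typing import Any, Dict, List


def probe_addrs(test_keys: Dict[str, Any]) -> List[str]:
    """Collect the IP addresses resolved by the probe (assumed layout)."""
    addrs = []
    for query in test_keys.get("queries") or []:
        for answer in query.get("answers") or []:
            for key in ("ipv4", "ipv6"):
                if answer.get(key):
                    addrs.append(answer[key])
    return addrs


def control_addrs(test_keys: Dict[str, Any]) -> List[str]:
    """Collect the addresses resolved by the test helper (assumed layout)."""
    control = test_keys.get("control") or {}
    dns = control.get("dns") or {}
    return list(dns.get("addrs") or [])


def looks_like_website_down(test_keys: Dict[str, Any]) -> bool:
    """Hypothetical check: nobody resolved any address for the domain."""
    # Assumption: a non-null control_failure means the test helper itself
    # could not be reached, so its empty result proves nothing.
    if test_keys.get("control_failure") is not None:
        return False
    return not probe_addrs(test_keys) and not control_addrs(test_keys)


# Illustrative inputs (made up, not taken from a real measurement).
down = {
    "queries": [{"answers": None, "failure": "dns_nxdomain_error"}],
    "control": {"dns": {"failure": "dns_name_error", "addrs": []}},
    "control_failure": None,
}
reachable = {
    "queries": [{"answers": [{"answer_type": "A", "ipv4": "192.0.2.1"}]}],
    "control": {"dns": {"failure": None, "addrs": ["192.0.2.1"]}},
    "control_failure": None,
}

print(looks_like_website_down(down))
print(looks_like_website_down(reachable))
```

A real implementation would also need to decide how to treat non-IP entries in the test helper's answers and partial DNS failures, which this sketch ignores.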

hellais commented 11 months ago

Not sure whether we should implement this in the current fastpath or just wait for the ooni/data rollout, which already includes a taxonomy for this.