ooni / backend

Everything related to OONI backend infrastructure: ooni/api, ooni/pipeline, ooni/sysadmin, collector, bouncers and test-helpers
BSD 3-Clause "New" or "Revised" License

Experiment blocking classification using machine learning #435

Open FedericoCeratto opened 4 years ago

FedericoCeratto commented 4 years ago

Manually handling fingerprints is going to become less effective as the number and diversity of measurements increase.

FedericoCeratto commented 4 years ago

https://github.com/ooni/pipeline/pull/322

FedericoCeratto commented 4 years ago

Notes: https://docs.google.com/document/d/1Ll0zHq-sfy6ulWUVju54wNX2BNgF7OUBVCXrVah6Ev8/edit

FedericoCeratto commented 4 years ago

Status update: a prototype built on CatBoost fetches data from ClickHouse. It learns to predict the value of the "status" column from the columns: "report_id, input, probe_cc, probe_asn, test_name, platform, control_failure, is_ssl_expected, page_len, page_len_ratio, server_cc, server_asn, server_as_name"

It then runs predictions, sorts the output by certainty, and shows the cases where ML and the fastpath disagree or where ML is less certain. It seems to easily spot broken tests, bugs in the measurement scoring, and cases where the scoring is not smart enough.
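
For illustration, the review step described above could look roughly like the sketch below. This is not the prototype's code. It assumes the model returns a per-class probability for each measurement. The `Prediction` class, the `triage` function, the 0.8 threshold and the "ok" / "blocked" status values are hypothetical placeholders; only "report_id", "input" and "status" come from the column list above.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One measurement with its fastpath label and the model's output (hypothetical shape)."""
    report_id: str
    input: str
    fastpath_status: str  # value of the "status" column set by the fastpath
    class_probs: dict     # assumed: class name -> probability from the model

    @property
    def ml_status(self) -> str:
        # The class the model considers most likely.
        return max(self.class_probs, key=self.class_probs.get)

    @property
    def certainty(self) -> float:
        # Probability assigned to the most likely class.
        return max(self.class_probs.values())


def triage(predictions, certainty_threshold=0.8):
    """Return measurements worth a manual look, least certain first.

    A measurement is flagged when the model disagrees with the fastpath
    or when the model's certainty is below the (arbitrary) threshold.
    """
    flagged = [
        p for p in predictions
        if p.ml_status != p.fastpath_status or p.certainty < certainty_threshold
    ]
    return sorted(flagged, key=lambda p: p.certainty)


# Made-up rows; the status values are placeholders, not the real label set.
rows = [
    Prediction("r1", "https://example.com/", "ok", {"ok": 0.97, "blocked": 0.03}),
    Prediction("r2", "https://example.org/", "ok", {"ok": 0.10, "blocked": 0.90}),
    Prediction("r3", "https://example.net/", "blocked", {"ok": 0.45, "blocked": 0.55}),
]

flagged = triage(rows)
for p in flagged:
    print(p.report_id, p.input, p.fastpath_status, p.ml_status, round(p.certainty, 2))
```

Sorting least certain first puts the ambiguous measurements at the top of the review list; confident disagreements, which are more likely to point at scoring bugs, follow.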