Experiment blocking classification using machine learning

FedericoCeratto commented 4 years ago

Manually handling fingerprints is going to become less effective as the volume and diversity of measurements increase.

[x] Use fastpath to extract features from measurements into new columns (either in ClickHouse or Postgres)
[x] Experiment with ML classification to assess how useful the new features are for detecting blocking. Investigate improving detection using ML.

FedericoCeratto commented 4 years ago

https://github.com/ooni/pipeline/pull/322

FedericoCeratto commented 4 years ago

Notes: https://docs.google.com/document/d/1Ll0zHq-sfy6ulWUVju54wNX2BNgF7OUBVCXrVah6Ev8/edit

FedericoCeratto commented 4 years ago

Status update: a prototype built on CatBoost fetches data from ClickHouse. It learns to predict the value of the "status" column from these columns: "report_id, input, probe_cc, probe_asn, test_name, platform, control_failure, is_ssl_expected, page_len, page_len_ratio, server_cc, server_asn, server_as_name".

It then runs predictions, sorts the output by certainty, and shows the cases where ML and the fastpath disagree or where ML is less certain. It seems to easily spot broken tests, bugs in the measurement scoring, and cases where the scoring is not smart enough.
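
The review step described above could look roughly like the sketch below. This is not the prototype's code: it assumes the per-class probabilities have already been computed (for example with CatBoost's `predict_proba`), and the function name `rank_for_review`, the `max_certainty` threshold, and the status labels "ok" and "blocked" are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Review:
    report_id: str
    input: str
    fastpath_status: str  # value of the "status" column set by the fastpath
    ml_status: str        # most probable status according to the model
    ml_certainty: float   # probability the model assigns to ml_status
    disagrees: bool       # True when ML and the fastpath differ


def rank_for_review(
    rows: List[Dict[str, str]],
    probabilities: List[Dict[str, float]],
    max_certainty: float = 0.9,  # assumed threshold, not from the prototype
) -> List[Review]:
    """Return the measurements worth a manual look.

    A measurement is flagged when the model disagrees with the fastpath
    or when the model's certainty is below max_certainty.
    """
    flagged = []
    for row, probs in zip(rows, probabilities):
        ml_status = max(probs, key=probs.get)
        certainty = probs[ml_status]
        disagrees = ml_status != row["status"]
        if disagrees or certainty < max_certainty:
            flagged.append(
                Review(
                    report_id=row["report_id"],
                    input=row["input"],
                    fastpath_status=row["status"],
                    ml_status=ml_status,
                    ml_certainty=certainty,
                    disagrees=disagrees,
                )
            )
    # Disagreements come first, most certain at the top; uncertain
    # agreements follow, least certain first.
    flagged.sort(
        key=lambda r: (
            not r.disagrees,
            -r.ml_certainty if r.disagrees else r.ml_certainty,
        )
    )
    return flagged


if __name__ == "__main__":
    # Made-up rows and probabilities, only to show the call shape.
    rows = [
        {"report_id": "r1", "input": "https://a.example/", "status": "ok"},
        {"report_id": "r2", "input": "https://b.example/", "status": "ok"},
    ]
    probabilities = [
        {"ok": 0.98, "blocked": 0.02},
        {"ok": 0.10, "blocked": 0.90},
    ]
    for review in rank_for_review(rows, probabilities):
        print(review)
```

Putting confident disagreements at the top is one possible ordering; these are the cases most likely to point at a scoring bug or a broken test, while the low-certainty tail collects measurements the features do not explain well.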

ooni / backend

Experiment blocking classification using machine learning #435