ooni / backend

Everything related to OONI backend infrastructure: ooni/api, ooni/pipeline, ooni/sysadmin, collector, bouncers and test-helpers
BSD 3-Clause "New" or "Revised" License

Add support for ingesting Mlab NDT data from bigquery #140

Open hellais opened 5 years ago

hellais commented 5 years ago

Currently, the table that backs the performance-related API endpoints is populated by a one-off, sketchy script.

Ideally, this should be a daily batch operation in the pipeline that pulls the metrics from BigQuery using a query like:

#legacySQL
SELECT client_asn_number,
       client_asn_name,
       client_country_code,
       SUM(rtt_sum) / SUM(rtt_count) AS rtt_avg,
       AVG(packet_retransmit_rate) AS retransmit_avg,
       nth(51, quantiles(download_speed_mbps, 101)) AS download_speed_mbps_median,
       nth(51, quantiles(upload_speed_mbps, 101)) AS upload_speed_mbps_median,
       COUNT(*) AS count
FROM [mlab-oti.data_viz.all_ip_by_hour]
WHERE LENGTH(client_asn_name) > 0
  AND LENGTH(client_asn_number) > 0
  AND local_test_date >= '2018-03-01 00:00:00'
  AND local_test_date < '2018-04-01 00:00:00'
GROUP BY client_asn_number,
         client_asn_name,
         client_country_code
ORDER BY client_country_code;
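A minimal sketch of the "daily" part, assuming the pipeline job is written in Python: it reuses the query above unchanged except that the hard-coded March 2018 range becomes a half-open one-day window. Consecutive runs therefore neither overlap nor leave gaps, including across month boundaries. The names `NDT_QUERY_TEMPLATE`, `daily_window` and `build_daily_query` are hypothetical and not part of any existing OONI code.

```python
import datetime as dt
from typing import Tuple

# Hypothetical template: the legacy SQL query from this issue, with the
# hard-coded month replaced by placeholders for a one-day window.
NDT_QUERY_TEMPLATE = """#legacySQL
SELECT client_asn_number,
       client_asn_name,
       client_country_code,
       SUM(rtt_sum) / SUM(rtt_count) AS rtt_avg,
       AVG(packet_retransmit_rate) AS retransmit_avg,
       nth(51, quantiles(download_speed_mbps, 101)) AS download_speed_mbps_median,
       nth(51, quantiles(upload_speed_mbps, 101)) AS upload_speed_mbps_median,
       COUNT(*) AS count
FROM [mlab-oti.data_viz.all_ip_by_hour]
WHERE LENGTH(client_asn_name) > 0
  AND LENGTH(client_asn_number) > 0
  AND local_test_date >= '{start}'
  AND local_test_date < '{end}'
GROUP BY client_asn_number,
         client_asn_name,
         client_country_code
ORDER BY client_country_code;"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def daily_window(day: dt.date) -> Tuple[str, str]:
    """Return the half-open [start, end) timestamp strings covering `day`."""
    start = dt.datetime.combine(day, dt.time.min)
    end = start + dt.timedelta(days=1)  # timedelta handles month/year rollover
    return start.strftime(TS_FORMAT), end.strftime(TS_FORMAT)


def build_daily_query(day: dt.date) -> str:
    """Render the NDT aggregation query restricted to a single day."""
    start, end = daily_window(day)
    return NDT_QUERY_TEMPLATE.format(start=start, end=end)


if __name__ == "__main__":
    # A daily batch job would typically process "yesterday".
    yesterday = dt.date.today() - dt.timedelta(days=1)
    print(build_daily_query(yesterday))
```

For example, `build_daily_query(dt.date(2018, 3, 31))` yields a query whose window runs from `'2018-03-31 00:00:00'` up to, but excluding, `'2018-04-01 00:00:00'`. Submitting the rendered string, for instance through the `google-cloud-bigquery` client's `Client.query` with `QueryJobConfig(use_legacy_sql=True)`, and deciding where the results are written are left open here. Note that per-day medians and averages cannot simply be combined into monthly figures afterwards. If monthly values are still needed, the summed components (such as `SUM(rtt_sum)` and `SUM(rtt_count)`) would have to be stored as well.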

Other useful links:
https://github.com/m-lab/mlab-vis-pipeline/tree/master/dataflow/data/bigtable/queries
https://data-api-dot-mlab-sandbox.appspot.com/locations/euit/clients/AS30722/metrics?startdate=2016-03&enddate=2019-05&format=json&download=1&timebin=month
https://github.com/m-lab/mlab-vis-pipeline/blob/beam-api-upgrade/dataflow/data/bigquery/queries/base_uploads_ip_by_day.sql

hellais commented 4 years ago

This is related to: https://github.com/ooni/pipeline/issues/78

hellais commented 4 years ago

This might also be relevant to this epic: https://github.com/ooni/ooni.org/issues/594, since this would allow us to rebuild one of the tables that are currently in hkgmetadb and not part of any pipeline.