ooni / backend

Everything related to OONI backend infrastructure: ooni/api, ooni/pipeline, ooni/sysadmin, collector, bouncers and test-helpers
BSD 3-Clause "New" or "Revised" License

Add data export capabilities for website-related metrics #331

Open hellais opened 4 years ago

hellais commented 4 years ago

We will make it possible to integrate website-related metrics into other tooling through a machine-readable data export format.

This will allow organizations such as the BBC or DW to download a dump of aggregated OONI measurements pertaining to potential blocking of their web assets.

hellais commented 4 years ago

A good concrete use case for this could be an API endpoint that generates a CSV file compatible with my Tableau-based data analysis workflow.

Here is a sample notebook which does this: https://gist.github.com/hellais/24054799ab6dea3913855bce1118691a.

The key is the query, which extracts the blocking column from the http_verdict table and reshapes the result so that it can easily be plotted and displayed in Tableau by pivoting around the blocking column (i.e. each distinct value becomes its own column), like so:

# Keep only the columns needed for the export; copy so web_dfs itself is not modified.
pivot = web_dfs[['probe_cc', 'probe_asn', 'measurement_start_time', 'blocking', 'input', 'domain', 'report_id']].copy()
pivot['count'] = 1
pivot['explorer_url'] = pivot.apply(get_explorer_url, axis=1)
# Turn each distinct value of the blocking column into its own column.
pivot = pivot.pivot_table(
    index=['measurement_start_time', 'probe_asn', 'input', 'domain', 'explorer_url', 'probe_cc'],
    columns='blocking',
    values='count',
).fillna(0).reset_index()
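To make the reshaping concrete, here is a minimal self-contained sketch of the same pivot run on a few made-up rows. The sample data, the `to_tableau_frame` wrapper and the body of `get_explorer_url` are assumptions for illustration only (the real helper lives in the notebook linked above, and the URL layout used here is a guess).

```python
import pandas as pd


def get_explorer_url(row):
    # Hypothetical stand-in for the notebook's helper; the URL layout is an assumption.
    return f"https://explorer.ooni.org/measurement/{row['report_id']}?input={row['input']}"


# Made-up rows standing in for the result of the http_verdict query.
web_dfs = pd.DataFrame({
    'probe_cc': ['IT', 'IT', 'IT'],
    'probe_asn': [1234, 1234, 1234],
    'measurement_start_time': ['2020-01-01', '2020-01-01', '2020-01-02'],
    'blocking': ['dns', 'ok', 'ok'],  # illustrative values only
    'input': ['http://example.com/', 'http://example.org/', 'http://example.com/'],
    'domain': ['example.com', 'example.org', 'example.com'],
    'report_id': ['r1', 'r2', 'r3'],
})


def to_tableau_frame(df):
    """Pivot around the blocking column so each distinct value becomes a column."""
    pivot = df[['probe_cc', 'probe_asn', 'measurement_start_time',
                'blocking', 'input', 'domain', 'report_id']].copy()
    pivot['count'] = 1
    pivot['explorer_url'] = pivot.apply(get_explorer_url, axis=1)
    return pivot.pivot_table(
        index=['measurement_start_time', 'probe_asn', 'input',
               'domain', 'explorer_url', 'probe_cc'],
        columns='blocking',
        values='count',
    ).fillna(0).reset_index()


table = to_tableau_frame(web_dfs)
# Serialize without the positional index, ready to be loaded into Tableau.
csv_text = table.to_csv(index=False)
```

Each input row ends up as one output row with a separate column per blocking value (`dns` and `ok` in this toy data), and `csv_text` is the kind of payload the proposed endpoint could return.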

This would drastically simplify our current data analysis workflow, and it could potentially allow third parties to use it as well.

The load generated by these queries is very high, so we probably want to put this endpoint behind some sort of authentication. To begin with, a simple static API key (or a set of them?) should be adequate; once we have implemented https://github.com/ooni/explorer/issues/388 (and/or https://github.com/ooni/ooni.org/issues/434), we could switch to that system.
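As a rough sketch of the static API key idea, the export could be gated roughly as follows. Everything here is hypothetical: the `API_KEYS` set, the `is_authorized` and `export_csv` names and the status codes are assumptions rather than part of the existing OONI API, and a real endpoint would be wired into the API's web framework.

```python
import csv
import hmac
import io

# Hypothetical static keys; in practice these would come from configuration.
API_KEYS = {"example-key-1", "example-key-2"}


def is_authorized(presented_key):
    """Return True if presented_key matches one of the static keys."""
    if not presented_key:
        return False
    presented = presented_key.encode()
    # compare_digest compares in constant time to avoid leaking key contents.
    return any(hmac.compare_digest(presented, key.encode()) for key in API_KEYS)


def export_csv(presented_key, rows, fieldnames):
    """Return (status, body) for a hypothetical CSV export endpoint."""
    if not is_authorized(presented_key):
        return 401, "unauthorized\n"
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return 200, buf.getvalue()


status, body = export_csv(
    "example-key-1",
    [{"domain": "example.com", "dns": 1, "ok": 0}],
    ["domain", "dns", "ok"],
)
```

Keeping the key check in its own function would make it straightforward to replace with the account system from the linked issues later on.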