Open hellais opened 4 years ago
A good concrete use case for this could be an API endpoint that generates a CSV file compatible with my Tableau-based data analysis workflow.
Here is a sample notebook which does this: https://gist.github.com/hellais/24054799ab6dea3913855bce1118691a.
The key is the query which takes out the blocking column from the http_verdict table and reshapes the result in such a way that it can easily be plotted and displayed in Tableau by pivoting around the blocking column (i.e. each value becomes a column), like so:
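# Copy the selected columns so the assignments below do not write into a view of web_dfs.
pivot = web_dfs[['probe_cc', 'probe_asn', 'measurement_start_time', 'blocking', 'input', 'domain', 'report_id']].copy()
pivot['count'] = 1
# get_explorer_url is a helper that is not shown in this snippet.
pivot['explorer_url'] = pivot.apply(get_explorer_url, axis=1)
# One column per blocking value. pivot_table defaults to aggfunc='mean', so a cell is 1 where rows exist; pass aggfunc='sum' for real counts.
pivot = pivot.pivot_table(
    index=['measurement_start_time', 'probe_asn', 'input', 'domain', 'explorer_url', 'probe_cc'],
    columns='blocking',
    values='count'
).fillna(0).reset_index()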
An endpoint like this would drastically simplify our current data analysis workflow, and third parties could potentially use it as well.
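To make the reshaping easier to try out, here is a minimal self-contained sketch that runs the same pivot on a few made-up rows and writes the result as CSV. The sample values, the blocking labels ('dns' and 'ok') and the body of get_explorer_url are assumptions for illustration only; the real helper and data live in the notebook.

```python
import pandas as pd

# Toy rows standing in for web_dfs; every value here is made up.
web_dfs = pd.DataFrame({
    'probe_cc': ['IT', 'IT', 'IT'],
    'probe_asn': [30722, 30722, 30722],
    'measurement_start_time': ['2020-01-01', '2020-01-01', '2020-01-02'],
    'blocking': ['dns', 'ok', 'ok'],  # hypothetical labels
    'input': ['http://example.com/', 'http://example.org/', 'http://example.com/'],
    'domain': ['example.com', 'example.org', 'example.com'],
    'report_id': ['r1', 'r2', 'r3'],
})


def get_explorer_url(row):
    # Hypothetical stand-in for the notebook's helper.
    return f"https://explorer.ooni.org/measurement/{row['report_id']}?input={row['input']}"


cols = ['probe_cc', 'probe_asn', 'measurement_start_time',
        'blocking', 'input', 'domain', 'report_id']
pivot = web_dfs[cols].copy()
pivot['count'] = 1
pivot['explorer_url'] = pivot.apply(get_explorer_url, axis=1)

# Each distinct value of 'blocking' becomes its own column.
pivot = pivot.pivot_table(
    index=['measurement_start_time', 'probe_asn', 'input',
           'domain', 'explorer_url', 'probe_cc'],
    columns='blocking',
    values='count',
).fillna(0).reset_index()

# CSV text that a tool such as Tableau could load directly.
csv_text = pivot.to_csv(index=False)
print(csv_text)
```

Every measurement ends up as one row, with a 1 in the column of its blocking value and 0 in the others, which is the shape that is convenient to aggregate in Tableau.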
These queries put a very high load on the database, so we probably want to put this endpoint behind some sort of authentication. To begin with, I would say a simple static API key (or a set of them?) is adequate. Once we have implemented https://github.com/ooni/explorer/issues/388 (and/or https://github.com/ooni/ooni.org/issues/434), we could switch to that system.
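As a rough sketch of what the static API key idea could look like, the following standard-library example checks a key against a fixed set and only then produces the CSV body. The names VALID_API_KEYS, is_authorized and export_csv are hypothetical and not part of the existing API; a real deployment would load keys from configuration and plug the check into the web framework in use.

```python
import csv
import hmac
import io

# Hypothetical static keys; real keys would come from configuration.
VALID_API_KEYS = {"example-key-1", "example-key-2"}


def is_authorized(api_key):
    """Return True if api_key matches one of the static keys."""
    if not api_key:
        return False
    # compare_digest does a constant-time comparison; checking every key
    # avoids stopping early on the first match.
    matches = [
        hmac.compare_digest(api_key.encode(), known.encode())
        for known in VALID_API_KEYS
    ]
    return any(matches)


def export_csv(api_key, rows, fieldnames):
    """Return (status_code, body) for a hypothetical CSV export request."""
    if not is_authorized(api_key):
        return 401, "invalid or missing API key\n"
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return 200, buf.getvalue()


status, body = export_csv(
    "example-key-1",
    [{"domain": "example.com", "dns": 1}],
    ["domain", "dns"],
)
print(status)
print(body)
```

Keeping the key check in its own function means it could later be replaced by whatever account system comes out of the linked issues without touching the export code.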
We will make it possible to integrate website-related metrics into other tooling with some machine-to-machine data export format.
This will allow organizations like the BBC or DW to download a dump of aggregated OONI measurements pertaining to potential blocking of their web assets.