Context
Our mission is to make accessible the data we need to make progress on the world’s biggest problems. For most people, accessibility means working with data tools, but getting data from our website into those tools is currently quite annoying.
It’s currently hard to quickly reuse data from a chart in an analytics environment. This is because the current CSV download is created in the browser, making it impossible to offer a single-line code snippet for analytics environments.
Solution
We want to make it possible to download the data for a chart directly from a URL. Technically, this will re-use the infrastructure that we created for rendering dynamic thumbnails, this time to render a CSV.
Additionally, we want to offer two download options:
the full data download (the input table in grapher)
and the data as it is used for the current visualization (i.e. filtered to the visible date range, taking the current selection into account, maybe taking other query params into account as well)
These two options should be presented in the download tab as a toggle UI element. In the CF function that serves the CSV, this should be a query param; the default should probably be the full data.
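As a rough illustration of the two download options, the sketch below picks between the full table and the visible subset based on a query parameter. The parameter name `csvType`, its values, and the row and view shapes are assumptions for illustration; the text above only fixes that the choice is a query param that defaults to the full data.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical parameter name and values; the spec only says "a query param".
CSV_TYPE_PARAM = "csvType"


def wants_filtered(url):
    """Return True if the request asks for the visible subset only."""
    query = parse_qs(urlparse(url).query)
    # Default to the full data when the parameter is missing.
    return query.get(CSV_TYPE_PARAM, ["full"])[0] == "filtered"


def select_rows(rows, url, selected_entities, min_time, max_time):
    """Return all rows, or only those inside the current view."""
    if not wants_filtered(url):
        return list(rows)
    return [
        row
        for row in rows
        if row["entity"] in selected_entities
        and min_time <= row["time"] <= max_time
    ]


# Toy rows with made-up values, standing in for the grapher input table.
rows = [
    {"entity": "France", "time": 1990, "value": 1.0},
    {"entity": "France", "time": 2020, "value": 2.0},
    {"entity": "Kenya", "time": 2020, "value": 3.0},
]
```

In the real worker the filtering would presumably come from the chart's transformed table rather than a hand-written predicate; the sketch only shows where the query param plugs in.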
We also want to have two options for the column names:
the default should be to use the verbose long names of the indicators
with a flag, it should use the columnShortName where available. This was introduced with the ETL and is meant to be a column name that is easier to work with in code, if perhaps a bit harder for a human to read
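The fallback rule for column names could look roughly like this. The dictionary shape and the `use_short_names` flag are hypothetical; only `columnShortName` is taken from the description above.

```python
def header_for(column, use_short_names=False):
    """Pick the CSV header for one indicator column."""
    short = column.get("columnShortName")
    # Only use the short name when the flag is set AND a short name exists.
    if use_short_names and short:
        return short
    return column["name"]


def headers_for(columns, use_short_names=False):
    """Pick the CSV headers for all indicator columns."""
    return [header_for(column, use_short_names) for column in columns]


# Toy column definitions; the names are made up for illustration.
columns = [
    {"name": "Life expectancy at birth (long verbose name)",
     "columnShortName": "life_expectancy"},
    {"name": "Indicator without a short name"},
]
```

With the flag set, the first column would be exported as `life_expectancy`, while the second keeps its long name because no short name is available.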
In the download tab, in the data section, we want to show the “data source” line from the grapher screen so that people know who created the data. We also want to add something like “learn more about this data” that links to the sources tab, where people can see the citation info etc.
The download button should also download the data from the CSV URL instead of creating the file client side.
Under the download button we should add code snippets for how to get this data into the following tools:
Google Sheets/Excel using importdata()
Python pandas
R
Excel (xlsx file if we build it)
The URL to fetch the data in the code snippets should reflect the choice the user made in terms of full data download or only the data relevant for the current view.
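A minimal sketch of how the snippets could follow the user's choice: one function builds the URL and another fills it into per-tool templates. The query parameter names (`csvType`, `useColumnShortNames`) and the function names are assumptions; `IMPORTDATA`, `pd.read_csv` and `read.csv` are the standard calls in Google Sheets, pandas and R.

```python
from urllib.parse import urlencode

BASE_URL = "https://ourworldindata.org/grapher"


def csv_url(slug, filtered=False, short_names=False, view_params=None):
    """Build the CSV URL for a chart, reflecting the user's choices."""
    # The filtered download presumably needs the view's own query params
    # (time range, selection, ...), passed through here as an opaque dict.
    params = dict(view_params or {}) if filtered else {}
    if filtered:
        params["csvType"] = "filtered"  # hypothetical parameter name
    if short_names:
        params["useColumnShortNames"] = "true"  # hypothetical parameter name
    query = "?" + urlencode(params) if params else ""
    return f"{BASE_URL}/{slug}.csv{query}"


def snippets(url):
    """Return one copy-paste snippet per tool for the given CSV URL."""
    return {
        "Google Sheets": f'=IMPORTDATA("{url}")',
        "Python (pandas)": f'import pandas as pd\n\ndf = pd.read_csv("{url}")',
        "R": f'df <- read.csv("{url}")',
    }
```

Calling `snippets(csv_url("life-expectancy", filtered=True))` would then yield snippets that all point at the filtered download, so toggling the option in the UI only needs to rebuild the URL.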
Must have
[x] a cf worker that serves the csv data - in full (input table)
[x] a cf worker that serves the csv data - only the visible subset (transformedTable)
[x] a way to switch between long verbose column names and short names
[x] a cf worker that serves the metadata (a sanitized subset of the metadata, roughly what the sources panel shows)
[ ] a cf worker that serves a zip file of both the csv and the metadata and a readme. The readme should explain what is in the zip file and contain roughly the information of the sources tab
[ ] a new download tab section that surfaces the above options
[ ] code samples in the download tab to show how to access the data above in different languages
[ ] Only data that we are allowed to re-share may be accessible this way (i.e. obey the is_protected flag)
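For the zip item in the list above, the bundle could be assembled in memory along these lines. This is a sketch in Python rather than the worker's own code; the file names inside the archive and the function signature are assumptions.

```python
import io
import json
import zipfile


def build_zip(slug, csv_text, metadata, readme):
    """Bundle the CSV, the metadata and a readme into one zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical file names: the slug is reused for the data files.
        archive.writestr(f"{slug}.csv", csv_text)
        archive.writestr(f"{slug}.metadata.json", json.dumps(metadata, indent=2))
        archive.writestr("README.md", readme)
    return buffer.getvalue()


# Toy inputs; real content would come from the chart and its sources.
zip_bytes = build_zip(
    "life-expectancy",
    "Entity,Code,Year,value\n",
    {"chart": {"title": "Toy title"}},
    "# Toy readme\n\nExplains what is in this zip file.\n",
)
```

Building the archive in memory keeps the worker stateless; whether that is acceptable for large charts is a separate question this sketch does not answer.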
Can have
[ ] a cf worker that serves an xlsx file with three tabs (data, metadata, "readme")
[ ] store download counters per chart and file type (maybe in D1? Or can we add them to GA? Should we try to add per country?)
Checklist before publishing
[ ] Verify that csv download is rejected if nonRedistributable is set
[x] Check the filtered csv for all chart types and see if it makes sense
[ ] Check why some chart types like scatters output "time" multiple times, with every one of those columns named just "time"
[ ] Check that filtered csv with tolerance looks ok
[x] Check that filtered csv with day as year works ok (consider outputting days in ISO format)
[x] Make sure CORS is handled correctly
[ ] Ensure that csv based explorers don't claim to support the new download option
[ ] Figure out why data pages show the wrong url on staging servers that are deployed to CF
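The first checklist item (rejecting the CSV download when nonRedistributable is set) could be verified against a guard like the one sketched here. The config shape and the choice of a 403 status are assumptions; the checklist only says the download must be rejected.

```python
from http import HTTPStatus


def csv_download_status(chart_config):
    """Decide whether the CSV for a chart may be served."""
    # Assumption: the flag lives on the chart config and defaults to False.
    if chart_config.get("nonRedistributable", False):
        # Assumption: "rejected" is expressed as 403 Forbidden.
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK
```

The same guard would presumably need to cover the metadata, zip and xlsx endpoints as well, since they expose the same data.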
The URL scheme should be:
https://ourworldindata.org/grapher/life-expectancy.csv - for the data
https://ourworldindata.org/grapher/life-expectancy.metadata.json - for the metadata
https://ourworldindata.org/grapher/life-expectancy.zip - for a zip file of csv, metadata.json and a README.md
and maybe https://ourworldindata.org/grapher/life-expectancy.xlsx - for an Excel file with 3 sheets: one with description text, one with metadata and one with the data
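To make the URL scheme concrete, here is a small sketch of how a request path could be split into chart slug and requested format. The function name and the regular expression are illustrative assumptions, not the worker's actual routing code.

```python
import re

# "metadata.json" is listed first so it is tried before the single-part
# extensions; the slug itself may contain dots or hyphens.
DOWNLOAD_PATH = re.compile(
    r"/grapher/(?P<slug>[^/]+?)\.(?P<ext>metadata\.json|csv|zip|xlsx)"
)


def parse_download_path(path):
    """Return (slug, extension) for a download URL path, else None."""
    match = DOWNLOAD_PATH.fullmatch(path)
    if match is None:
        return None
    return match.group("slug"), match.group("ext")
```

For example, `/grapher/life-expectancy.metadata.json` maps to the slug `life-expectancy` with the extension `metadata.json`, while a plain `/grapher/life-expectancy` is not treated as a download.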