Closed larsyencken closed 4 months ago
This issue came up whilst trying to test a new chart-based API for data downloads, where we would like to throw a clean exception for the non-redistributable case.
@ikesau confirmed that Grapher respects it if the API indicates nonRedistributable = true, making this an API/ETL issue.
Turns out this was a false alarm due to a misunderstanding of the meaning of something being "private".
In the ETL, private means that the general public cannot access those files, except when they are published as indicators in the grapher:// step. At that stage, anything private should be marked as nonRedistributable in the metadata.
In Grapher, datasets marked as !isPrivate && !nonRedistributable are automatically re-published to GitHub. If something is !nonRedistributable, CSV download is available in Grapher.
This means !isPrivate should probably be renamed publishToGithub, and it should be false any time nonRedistributable is true.
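The relationship between the two flags can be sketched in a few lines of Python. This is only an illustration of the rules described above, not code from Grapher or the ETL: the DatasetFlags class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetFlags:
    """Hypothetical container for the two flags discussed in this issue."""

    is_private: bool
    non_redistributable: bool


def publish_to_github(flags: DatasetFlags) -> bool:
    # Re-published to GitHub only when the dataset is neither private
    # nor non-redistributable (!isPrivate && !nonRedistributable).
    return not flags.is_private and not flags.non_redistributable


def csv_download_available(flags: DatasetFlags) -> bool:
    # CSV download in Grapher depends only on nonRedistributable.
    return not flags.non_redistributable


if __name__ == "__main__":
    # Print the outcome for every combination of the two flags.
    for is_private in (False, True):
        for non_redistributable in (False, True):
            flags = DatasetFlags(is_private, non_redistributable)
            print(flags, publish_to_github(flags), csv_download_available(flags))
```

Under these rules, a non-redistributable dataset is never re-published to GitHub and never offers a CSV download, whatever its isPrivate value.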
UPDATE: this was a false alarm due to a misunderstanding of isPrivate on the Grapher datasets model.
Context
For some data sources, we are allowed to share charts of the data to the public, but not to allow the public to directly download the data from us. This is because the upstream data provider does not allow redistribution.
Whilst this might seem restrictive, the data provider is still doing a great public good in allowing us to visualise and write about the data for a broad audience, and we in turn are helping to bring their work to a broad audience.
Problem
It appears that data download restrictions are not honoured by the Grapher download overlay. For example:
After debugging, the issue appears to come from the ETL not generating the right metadata in our data API.
Expected behaviour
Instead of the CSV download option, we should have a note saying that the data provider disallows redistribution, or, better, a link back to the original data provider so that you can get the data from them.
Technical notes
On owid@automation-1, in the file data-api.db, the query select count(*) from metadata where json_extract(metadata, '$.nonRedistributable'); gives 307 non-redistributable variables.
On MySQL, the query select count(*) from variables v inner join datasets d on (v.datasetId = d.id) where d.isPrivate; gives 101k variables that are meant to be non-redistributable.
Note on priority
If confirmed, this is high priority to fix, since it's essential that we respect the work of data providers and the restrictions they place on it.
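For anyone wanting to reproduce the SQLite count from the technical notes without access to data-api.db, the sketch below runs the same json_extract filter against an in-memory database. The table layout is an assumption: only the metadata table and its metadata JSON column appear in the query above, so the variableId column and the sample rows are made up for illustration.

```python
import json
import sqlite3


def count_non_redistributable(conn: sqlite3.Connection) -> int:
    # Same filter as in the technical notes: rows whose JSON metadata
    # has a truthy nonRedistributable field.
    (count,) = conn.execute(
        "SELECT count(*) FROM metadata "
        "WHERE json_extract(metadata, '$.nonRedistributable')"
    ).fetchone()
    return count


def build_sample_db() -> sqlite3.Connection:
    # Assumed schema: one row per variable, metadata stored as JSON text.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata (variableId INTEGER PRIMARY KEY, metadata TEXT)"
    )
    rows = [
        (1, json.dumps({"nonRedistributable": True})),
        (2, json.dumps({"nonRedistributable": False})),
        (3, json.dumps({})),  # field missing entirely
    ]
    conn.executemany("INSERT INTO metadata VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(count_non_redistributable(build_sample_db()))
```

Rows where the field is false or missing are excluded, because json_extract returns 0 or NULL for them, so only variables explicitly marked nonRedistributable are counted.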