tecosaur / DataToolkit.jl

Reproducible, flexible, and convenient data management
https://tecosaur.github.io/DataToolkit.jl

Adding a dataset that can only be downloaded with credentials #45

Open Datseris opened 3 weeks ago

Datseris commented 3 weeks ago

I typically work with data that cannot be downloaded directly from a URL, and I cannot host it at a download URL myself because I don't have the rights to redistribute it. How do I add such a dataset to the DataToolkit.jl Data.toml file?

For example, I work with ERA5 data, which can be downloaded from a Julia script (via PythonCall) like so:

import PythonCall
cdsapi = PythonCall.pyimport("cdsapi")

c = cdsapi.Client()  # reads the API URL and key from ~/.cdsapirc
savepath = ...
config = Dict(
    "product_type" => producttype,
    "variable" => variables,
    "year" => year,
...
)
c.retrieve(data, config, savepath) # does the download; `data` is the CDS dataset name
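One way to keep the credentialed step isolated is to wrap it so it only runs when the target file is missing. The sketch below is a hypothetical helper (`fetch_if_missing` is not part of cdsapi or DataToolkit.jl); it assumes `retrieve` is any function taking `(name, request, target)` and writing the file, such as a closure around `c.retrieve` from the snippet above.

```julia
# Hypothetical helper: run the (credentialed) retrieval only if `target`
# does not exist yet, then check that a file was actually produced.
function fetch_if_missing(retrieve, name::AbstractString,
                          request::AbstractDict, target::AbstractString)
    isfile(target) && return target  # already downloaded; no request needed
    dir = dirname(target)
    isempty(dir) || mkpath(dir)      # create parent directories if needed
    retrieve(name, request, target)  # e.g. the cdsapi client's retrieve
    isfile(target) || error("retrieval did not produce $target")
    return target
end
```

With the client from above this would be called as `fetch_if_missing((n, r, t) -> c.retrieve(n, r, t), data, config, savepath)`. This only avoids repeated downloads on one machine; anyone else still needs their own CDS account and key.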

This also requires the file ~/.cdsapirc to be saved on my computer, with the contents:

url: https://cds.climate.copernicus.eu/api/v2
key: 64546:.... # my private key, accessed by going into my account online and copying it
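The ~/.cdsapirc file is a list of `field: value` lines. As a sketch with hypothetical helper names (not cdsapi's own parser), a Julia script could read it like this, so that the private key stays in the user's home directory rather than in any project file such as Data.toml. Only the first colon on each line separates field from value, since both the URL and the key contain colons.

```julia
# Hypothetical helper: parse "field: value" lines into a Dict, skipping
# blank lines and lines starting with '#'. Split on the FIRST ':' only.
function parse_cdsapirc(text::AbstractString)
    config = Dict{String,String}()
    for line in eachline(IOBuffer(text))
        s = strip(line)
        (isempty(s) || startswith(s, "#")) && continue
        occursin(':', s) || continue
        field, value = split(s, ':'; limit = 2)
        config[strip(field)] = strip(value)
    end
    return config
end

# Read the user's file if present; return an empty Dict otherwise.
function read_cdsapirc(path::AbstractString = joinpath(homedir(), ".cdsapirc"))
    isfile(path) ? parse_cdsapirc(read(path, String)) : Dict{String,String}()
end
```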

Is there a way to make this approach reproducible with DataToolkit.jl? I doubt it, though not through any fault of DataToolkit.jl: the problem is the access system for these data, which is terrible for reproducibility, and even worse systems are prevalent throughout climate science :(