I typically work with data that cannot be downloaded directly from a URL, and I cannot host them at a download URL myself because I don't have the rights to them. How do I add such a dataset to the DataToolkit.jl Data.toml file?
For example, I am downloading ERA5 data, which can be downloaded with a Julia script (via PythonCall) like so:
import PythonCall
cdsapi = PythonCall.pyimport("cdsapi")
c = cdsapi.Client()
savepath = ...
config = Dict(
"product_type" => producttype,
"variable" => variables,
"year" => year,
...
)
c.retrieve(data, config, savepath) # does the download
This also requires the file ~/.cdsapirc to be saved on my computer with the contents:
url: https://cds.climate.copernicus.eu/api/v2
key: 64546:.... # my private key, accessed by going into my account online and copying it
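To make that hidden dependency explicit, a script could check for the credentials file before attempting the download. This is only a minimal sketch, assuming the file holds plain `name: value` lines as shown above; `read_cdsapirc` is a hypothetical helper name of my own, not something provided by cdsapi, PythonCall, or DataToolkit.jl:

```julia
# Hypothetical helper: read a cdsapirc-style file made of `name: value` lines.
function read_cdsapirc(path::AbstractString = joinpath(homedir(), ".cdsapirc"))
    isfile(path) || error("CDS API credentials not found at $path")
    settings = Dict{String,String}()
    for line in eachline(path)
        # Split on the first ':' only, since the URL and the key contain colons too.
        parts = split(line, ':'; limit = 2)
        length(parts) == 2 || continue  # skip blank or malformed lines
        settings[strip(parts[1])] = strip(parts[2])
    end
    (haskey(settings, "url") && haskey(settings, "key")) ||
        error("$path must define both `url` and `key`")
    return settings
end
```

Calling `read_cdsapirc()` before `c.retrieve(...)` would at least fail early with a clear message on a machine where the file is missing, although it does nothing to make the key itself shareable.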
Would it be possible to make such an approach reproducible with DataToolkit.jl? I doubt it, through no fault of DataToolkit.jl, but mainly because of the absolutely terrible-for-reproducibility system through which these data are distributed. In fact, even worse systems are prevalent throughout climate science :(