oxinabox / DataDeps.jl

reproducible data setup for reproducible science

Use case: nested data deps? #160

Closed tpoisot closed 1 year ago

tpoisot commented 1 year ago

I'm trying to use DataDeps to come up with a way to download (when needed) a pretty large series of files. Ideally, I'd like to have them nested, i.e. MyProject/dataset1/data.csv, MyProject/dataset2/data.csv (the actual structure is more complex but this is the general idea).

When I use a datadep name containing slashes (i.e. one built with joinpath), the DataDep object is created, but calling register fails:

julia> register(dataset1_experiment1_rawfiles)
ERROR: MethodError: no method matching ArgumentError(::String, ::String)
Closest candidates are:
  ArgumentError(::AbstractString) at boot.jl:325
  ArgumentError(::Any) at boot.jl:325

Is there a way to handle nested folders for the data?

oxinabox commented 1 year ago

This is not supported. I can see the utility, but it's not worth the complexity imo, especially as this package entered long-term maintenance mode long ago. The concept that a DataDep name is a top-level directory is pretty baked in.

And we already use the syntax datadep"Name/path/to/file.txt" to access files within a top level datadep called Name (and to give error messages if files are not there).

If you want to register them separately so the user doesn't need to download them all, you can do that with totally separate registered datadeps, and you can register those with a for loop. See for example https://github.com/JuliaText/Embeddings.jl/blob/306c04bead62b32873dedbc2609c74c4ca34306b/src/glove.jl#L22
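A minimal sketch of the for-loop approach, assuming two datasets served from a made-up URL. `BASE_URL`, `DATASETS`, and the helper `datadep_name` are hypothetical names introduced for illustration; only `DataDep` and `register` come from DataDeps. Each datadep gets a flat name without slashes, and in real use you would also pass a checksum as the fourth argument.

```julia
using DataDeps

# Hypothetical base URL and dataset names, for illustration only.
const BASE_URL = "https://example.org/myproject"
const DATASETS = ["dataset1", "dataset2"]

# Build a flat datadep name (no slashes), since a datadep name is
# a single top-level directory.
datadep_name(dataset) = "MyProject_" * dataset

# Register one independent datadep per dataset. Registering does not
# download anything; each file is fetched only when its datadep is
# first resolved.
for dataset in DATASETS
    register(DataDep(
        datadep_name(dataset),
        "Data for $dataset of MyProject (hypothetical example).",
        "$BASE_URL/$dataset/data.csv",
    ))
end
```

With this layout a file would be accessed as datadep"MyProject_dataset1/data.csv", so the nesting lives in the name prefix rather than in the directory tree.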

If you don't mind them all being downloaded at once (the downloads run asynchronously, which is nice), you can put them in one datadep (and thus one top-level directory) and use (arbitrarily nested) vectors of URLs to download them. That doesn't put them into the nested structure, though; you would also need post-fetch actions to mv them (those too can be arbitrarily nested). I have done the mv thing a few times to match a structure; it is pretty annoying, but usable.
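A rough sketch of the single-datadep approach, assuming the remote files have distinct names so they do not clash when downloaded into the same directory. The URLs and the helper `move_into` are hypothetical; the vector of post-fetch functions is matched element by element to the vector of URLs.

```julia
using DataDeps

# Hypothetical base URL, for illustration only.
const BASE_URL = "https://example.org/myproject"

# Return a post-fetch function that moves the downloaded file into
# `subdir/data.csv`, next to where it was fetched.
function move_into(subdir)
    return function (fetched_path)
        dest_dir = joinpath(dirname(fetched_path), subdir)
        mkpath(dest_dir)
        mv(fetched_path, joinpath(dest_dir, "data.csv"))
    end
end

# One datadep, several URLs, one post-fetch action per URL.
register(DataDep(
    "MyProject",
    "All MyProject datasets (hypothetical example).",
    ["$BASE_URL/dataset1.csv", "$BASE_URL/dataset2.csv"],
    post_fetch_method = [move_into("dataset1"), move_into("dataset2")],
))
```

After the first resolve, the files would then be reachable as datadep"MyProject/dataset1/data.csv" and datadep"MyProject/dataset2/data.csv".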

tpoisot commented 1 year ago

Understood, thanks for the detailed rationale.