oxinabox / DataDepsGenerators.jl

Utility for developers to help define DataDeps registration blocks, for reusing existing Data with DataDeps.jl
Other
18 stars 6 forks source link

Implement DataDryadAPI #19

Closed SebastinSanty closed 6 years ago

SebastinSanty commented 6 years ago

I have a few points to make:

codecov-io commented 6 years ago

Codecov Report

Merging #19 into master will increase coverage by 0.4%. The diff coverage is 97.56%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master     #19     +/-   ##
========================================
+ Coverage   95.79%   96.2%   +0.4%     
========================================
  Files           6       7      +1     
  Lines         119     158     +39     
========================================
+ Hits          114     152     +38     
- Misses          5       6      +1
Impacted Files Coverage Δ
src/DataDryadWeb.jl 100% <100%> (ø)
src/DataDryadAPI.jl 100% <100%> (ø)
src/DataDepsGenerators.jl 96.29% <83.33%> (-3.71%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 883457f...babd234. Read the comment docs.

oxinabox commented 6 years ago

While I was coding up, it was becoming increasing difficult to know what is to be kept in the purview of DataOne and what is to be kept in DataDryad. My thoughts on this were to go ahead with DataDryadAPI <: DataRepo, and the when adding support for the Arctic Data Center repos (in a new PR), implement the common DataOne base, such that it will become: DataDryad <: DataOne <: DataRepo, ArcticDataCenter<: DataOne <: DataRepo

I agree this is an excellent strategy, avoiding premature (and potentially incorrect) abstraction.

Regarding citation format, I was not able to find one on the API. So I manually constructed one (which is obviously not the right way to do as stuff like journal name is missing)

It is good enough for now. Once we move on doing more stuff with DOIs directly, e.g. relating to DataCite, we can look at some of the solutions in that space that automatically generate formatted citations.

License is the same for all DataDryad repos, so I kept it as a constant. Anyways, I was not able to retrieve that from the API.

Reasonable

Bitstream urls on DataDryad which is used for downloading datasets and which we had an issue with earlier, is still not working. Hence, I am not putting the bitstream urls but instead parsing the same old way.

I will chase this up with their support

Should I be providing checksums for each download? I didn’t implement it in the DataDryadWeb, but it is now there in this PR.

I'm not sure I understand the question, AFAICT you did provide checksums for each download in DataDryadWeb https://github.com/oxinabox/DataDepsGenerators.jl/blob/883457f7944f4aa4febb8361561332c0da211a4b/test/references/DataDryad%20Drought.txt#L19-L20 In general there isn't a consistently defined notion for a checksum for multiple files. (DataDeps.jl implements one, but it is only valid for checksums it has generated in the first place)

I have a strong urge to start segregating the source/tests/references in folders for better understandability.

Sounds reasonable. Don't go too overboard

oxinabox commented 6 years ago

That is looking pretty good to me. If your done, I'll merge after appveyor completes