oxinabox / DataDepsGenerators.jl

Utility for developers to help define DataDeps registration blocks, for reusing existing Data with DataDeps.jl

Make DataOne+DataDryad bitstreams work #11

Open oxinabox opened 6 years ago

oxinabox commented 6 years ago

@oxinabox [1:40 PM] So the task here, really, is to make:

generate(DataDryad(), "https://datadryad.org/resource/doi:10.5061/dryad.74699")

return something like the string:

@oxinabox [1:47 PM]

    "doi:10.5061 dryad.74699",
    """
    Dataset:  Data from: Ecology and genomics of an important crop wild relative as a prelude to agricultural innovation
    Author: von Wettberg et. al.
    License: http://creativecommons.org/publicdomain/zero/1.0/
    Date: 2018-02-27T21:46:39Z
    Website: https://doi.org/10.5061/dryad.74699

    Domesticated species are impacted in unintended ways during domestication and breeding. Changes in the nature and intensity of selection impart genetic drift, reduce diversity, and increase the frequency of deleterious alleles. Such outcomes constrain our ability to expand the cultivation of crops into environments that differ from those under which domestication occurred. We address this need in chickpea, an important pulse legume, by harnessing the diversity of wild crop relatives. We document an extreme domestication-related genetic bottleneck and decipher the genetic history of wild populations. We provide evidence of ancestral adaptations for seed coat color crypsis, estimate the impact of environment on genetic structure and trait values, and demonstrate variation between wild and cultivated accessions for agronomic properties. A resource of genotyped, association mapping progeny functionally links the wild and cultivated gene pools and is an essential resource chickpea for improvement, while our methods inform collection of other wild crop progenitor species.

    Please cite the paper:
    https://doi.org/10.1038/s41467-018-02867-z
    as well as this dataset:
    https://doi.org/10.5061/dryad.74699
    if you use this in your research.
    """,
    ["https://datadryad.org/mn/object/http://dx.doi.org/10.5061/dryad.8790/1/bitstream"],
    [(md5, "bc96dba38d8659f42527cc22ff7f6e3b")]

(escaping required)
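
One possible reading of the escaping note, as a minimal Julia sketch: if the description is written out between triple quotes in generated code, then backslashes, `$` (string interpolation) and double quotes in the abstract each need a leading backslash. `escape_description` is a hypothetical helper for illustration, not a function from DataDepsGenerators.jl.

```julia
# Hypothetical helper (not part of DataDepsGenerators.jl): make free text safe
# to paste between the triple quotes of a generated Julia string literal.
function escape_description(text::AbstractString)
    out = replace(text, "\\" => "\\\\")  # backslashes first, so later escapes are not doubled
    out = replace(out, "\$" => "\\\$")   # `$` would otherwise start string interpolation
    out = replace(out, "\"" => "\\\"")   # a run of quotes could end the literal early
    return out
end

println(escape_description("Costs \$5, see \"Table 1\""))
```

Newlines are deliberately left alone so the description stays readable inside the multi-line literal.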

@SebastinSanty [1:50 PM] Great. I’ll try to produce such a result. I assume the description is to be parsed from the abstract given in: https://datadryad.org/resource/doi:10.5061/dryad.74699

@oxinabox [1:51 PM] No, it is to be parsed from: https://datadryad.org/mn/object/doi:10.5061/dryad.74699 which should be much easier

@SebastinSanty [1:51 PM] Yes, my bad

@oxinabox [1:52 PM] As an aside, it's not well documented, but the easiest place to find the checksum is https://datadryad.org/mn/checksum/doi:10.5061/dryad.74699/1 which is a bit more parsable than https://datadryad.org/mn/object/doi:10.5061/dryad.74699/1 or, alternatively, https://datadryad.org/mn/meta/doi:10.5061/dryad.74699/1
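
A rough Julia sketch of pulling the algorithm and hash out of a checksum response body. It assumes the endpoint answers with a single XML `checksum` element carrying an `algorithm` attribute (the DataONE `Checksum` type); that shape should be checked against a live response. `parse_checksum` is a hypothetical name, and the sample below is hand-written, reusing the hash quoted earlier in this thread.

```julia
# Assumed response shape; verify against a live response before relying on it:
#   <d1:checksum xmlns:d1="..." algorithm="MD5">bc96...</d1:checksum>
function parse_checksum(xml::AbstractString)
    pattern = r"<(?:\w+:)?checksum\b[^>]*\balgorithm=\"([^\"]+)\"[^>]*>\s*([0-9A-Fa-f]+)\s*<"
    m = match(pattern, xml)
    m === nothing && error("no <checksum> element with an algorithm attribute found")
    # Lowercase both parts so "MD5" lines up with the `md5` used in the registration block.
    return (algorithm = lowercase(m.captures[1]), hash = lowercase(m.captures[2]))
end

# Hand-written sample, not a captured response.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<d1:checksum xmlns:d1="http://ns.dataone.org/service/types/v1" algorithm="MD5">bc96dba38d8659f42527cc22ff7f6e3b</d1:checksum>"""

result = parse_checksum(sample)
println(result.algorithm, " ", result.hash)
```

A regex keeps the sketch dependency-free; a real XML parser would be the sturdier choice if the response turns out to be more varied.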

@SebastinSanty [1:54 PM] Got it

@oxinabox [1:56 PM] I'm going to copy-paste this discussion into an issue on GitHub, because Slack discussions vanish every 2-4 days.

@oxinabox [2:04 PM] For now we'll ignore the fact that DataDryad's bitstreams are broken; I have opened a ticket with them: https://nescent.manuscript.com/default.asp?36981_9ea8ha7te98ohqda

I think, from discussion with @amoeba (which I may have misinterpreted) in the DataOne Slack, we can better leverage the common API by using a coordinating node like: https://cn.dataone.org/cn/v2/resolve/{PID} but for now let's not worry about that. Get some code out; other things can happen after.
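
For later reference, a small Julia sketch of filling in the `{PID}` slot of the resolve URL above. It assumes the identifier has to be percent-encoded as a single path segment, so that the `:` and `/` inside a DOI are not read as URL structure; `percent_encode` and `resolve_url` are hypothetical helpers, written with the standard library only.

```julia
# Percent-encode every byte except the RFC 3986 unreserved characters.
function percent_encode(s::AbstractString)
    io = IOBuffer()
    for byte in codeunits(String(s))
        c = Char(byte)
        if byte < 0x80 && (isletter(c) || isdigit(c) || c in ('-', '_', '.', '~'))
            write(io, byte)
        else
            print(io, '%', uppercase(string(byte, base = 16, pad = 2)))
        end
    end
    return String(take!(io))
end

# Hypothetical helper: substitute an encoded PID into the resolve URL quoted above.
resolve_url(pid::AbstractString) = "https://cn.dataone.org/cn/v2/resolve/" * percent_encode(pid)

println(resolve_url("doi:10.5061/dryad.74699/1"))
```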

oxinabox commented 6 years ago

Is this still broken?

SebastinSanty commented 6 years ago

Yes. We are parsing the webpage for the links.

oxinabox commented 6 years ago

I just checked and it is still broken indeed.