Open oxinabox opened 6 years ago
Maybe implement some sort of recursive download for GitHub-based repos (or any folder-based format, for that matter) in DataDeps.jl?
Maybe yes. Something like an (opt-in?) post-processing step that tries to generate metadata for the URLs that are being downloaded, and then takes the URLs from that, or something along those lines.
The reason is that the GitHub generator gets the files right but has inferior metadata on creator etc.
julia> generate(GitHub(), "https://github.com/choldgraf/blog-documentation_questionnaire") |> println
register(DataDep(
"blog-documentation_questionnaire",
"""
Dataset: blog-documentation_questionnaire
Website: https://github.com/choldgraf/blog-documentation_questionnaire
License: Unknown
# blog-documentation_questionnaire
A public repository for data + analyses for a blog post on documentation
""",
Any[Any["https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/data/contribs.csv", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/data/credit_enjoyment.csv"], Any["https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/figures/plot_contrib_type_bar.png", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/figures/plot_credit_enjoyment.png", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/figures/plot_diff_hist.png", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/figures/plot_docs_diff_compare.png", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/figures/plot_docs_usual_should.png"], "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/.gitignore", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/README.md", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/analysis.py", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/plot_figs.py"],
))
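In the output above, folders show up as nested arrays of URLs. As a minimal sketch of how a post-processing step could walk that structure to get every URL it would need to inspect, the helper below flattens it depth-first. The name `flatten_urls` and the `example.org` URLs are hypothetical and are not part of DataDeps.jl or DataDepsGenerators.jl.

```julia
# Hypothetical helper (not part of DataDeps.jl): collect every URL from a
# nested remote-path structure, visiting folders depth-first.
flatten_urls(path::AbstractString) = [String(path)]
flatten_urls(paths::AbstractVector) =
    mapreduce(flatten_urls, vcat, paths; init=String[])

# Made-up structure mirroring the shape of the generated output:
# two "folders" followed by a top-level file.
nested = Any[
    Any["https://example.org/repo/data/a.csv", "https://example.org/repo/data/b.csv"],
    Any["https://example.org/repo/figures/plot.png"],
    "https://example.org/repo/README.md",
]

urls = flatten_urls(nested)
foreach(println, urls)
```

Each flattened URL could then be handed to whichever generator is used to look up richer metadata.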
While I remember: FigShare is actually breaking the spec, as per https://schema.org/DataDownload . The contentUrl property is only for linking to the "Actual bytes of the media object". They should be using the url or mainEntityOfPage property when linking to external sites like that.
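To make the distinction concrete, here is a small sketch of how a consumer might read a spec-compliant record. The record is a made-up plain `Dict`, and the accessor names `direct_download` and `landing_page` are hypothetical; only the property names come from schema.org.

```julia
# Made-up record shaped like a schema.org DataDownload.
# The property names follow schema.org; the values are illustrative only.
compliant = Dict(
    "@type" => "DataDownload",
    "contentUrl" => "https://example.org/files/data.csv",  # the actual bytes
    "url" => "https://example.org/articles/123",           # a page about the item
)

# Hypothetical accessors: contentUrl is treated as the direct download,
# while url (falling back to mainEntityOfPage) is treated as a page to visit.
direct_download(record) = get(record, "contentUrl", nothing)
landing_page(record) = get(record, "url", get(record, "mainEntityOfPage", nothing))

println(direct_download(compliant))
println(landing_page(compliant))
```

A generator written against this reading hashes whatever `contentUrl` points at, which is why a record that puts an external page there produces a broken hash.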
This is a pathological case: http://doi.org/10.6084/m9.figshare.5557801.v1 (a Document on Figshare with an external file).
I do not think this is worth fixing any time soon: it is a fairly rare corner case, and fiddly to fix.
I am just noting it down for record keeping.
Wrong outputs:
Figshare generator: Broken hash, and the URL does not point to a downloadable file.
JSONLD_Web generator: The URL is still wrong.
(Normal) incomplete outputs:
DataCite generator: This is actually as good as DataCite ever is.
JSON_DOI: This is fine; just like DataCite, it is missing URLs as usual.