oxinabox / DataDepsGenerators.jl

Utility for developers to help define DataDeps registration blocks, for reusing existing Data with DataDeps.jl

Add DataCite API #28

Closed SebastinSanty closed 6 years ago

SebastinSanty commented 6 years ago

Integration tests will be added after your first review. Also, I am not able to get the download URLs. Do you have any idea how to get them? There are some hints regarding resource-type etc.

oxinabox commented 6 years ago

I don't think it is actually possible to get a download URL out of DataCite. I kinda knew that going in. This of course also means integration tests are not possible, since resolving the URL correctly is most of what we are testing with those.

Take a look at https://github.com/datacite/freya/issues/2, where @mfenner is talking about providing it via content negotiation for "application/zip", but it is not done yet. (I believe DataCite allows content negotiation via URL as well as via header, which is nice.)

Right now I think our best option is to provide 95% of the registration block, i.e. everything apart from the URL and checksum, then let the user go to the website (the DOI's landing page), find the link manually, and edit the generated code.

Editing the generated code is already part of our normal usage anyway, as users likely want to change the datadep name and probably edit the message.

oxinabox commented 6 years ago

Hmm, what is actually going on with Figshare? It looks like they do use DataCite-generated DOIs (see https://stats.datacite.org/?fq=allocator_facet%3A%22FIGSHARE+-+figshare%22&#tab-datacentres).

And the following works: https://api.datacite.org/works/10.6084/m9.figshare.5350216.v1

What does not work is:

https://figshare.com/articles/_Comparison_of_SEHC_Trauma_Activation_Patients_and_SEHC_Trauma_Nonactivation_Patients_minimum_alcohol_and_illicit_drug_rates_/225779

That is associated with the DOI 10.1371/journal.pone.0047999.t004, which resolves to http://journals.plos.org/plosone/article/figure?id=10.1371/journal.pone.0047999.t004, which is the same table but on a different site.

So I am guessing Figshare rehosted that existing data with its existing DOI. It was therefore never issued a DataCite DOI, which means it does not work with their API.

CrossRef issued that DOI. Their API is not so great for this: https://api.crossref.org/v1/works/10.1371/journal.pone.0047999.t004

I don't think we can content negotiate anything better. See https://citation.crosscite.org/docs.html; I tried a few.

It might be nice to support DOIs in general via the content-negotiation method. But the things you can get out of any of the providers except DataCite seem less useful. (Surprising really, since we're only getting basic metadata. So maybe it is just this one entry (10.1371/journal.pone.0047999.t004) that has poor metadata.)
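As a rough illustration of the two lookup routes discussed in this comment, here is a minimal Python sketch (not the package's Julia code). It only builds the requests and does not send them: one against the DataCite API endpoint quoted above, and one content-negotiation request against the DOI resolver. The helper names are hypothetical, and the `application/vnd.citationstyles.csl+json` media type is an assumption based on the crosscite content-negotiation docs linked above.

```python
from urllib.parse import quote
from urllib.request import Request

# Endpoint as quoted in this thread; the resolver base is the usual doi.org one.
DATACITE_WORKS = "https://api.datacite.org/works/"
DOI_RESOLVER = "https://doi.org/"


def datacite_api_request(doi: str) -> Request:
    """Hypothetical helper: metadata lookup through the DataCite API."""
    return Request(DATACITE_WORKS + quote(doi, safe="/"))


def content_negotiation_request(
    doi: str,
    media_type: str = "application/vnd.citationstyles.csl+json",  # assumed media type
) -> Request:
    """Hypothetical helper: ask the DOI resolver for metadata via the Accept header."""
    return Request(DOI_RESOLVER + quote(doi, safe="/"), headers={"Accept": media_type})


# The two DOIs mentioned in this thread.
datacite_req = datacite_api_request("10.6084/m9.figshare.5350216.v1")
crossref_req = content_negotiation_request("10.1371/journal.pone.0047999.t004")

# Passing either object to urllib.request.urlopen would actually send it.
print(datacite_req.full_url)
print(crossref_req.full_url, crossref_req.get_header("Accept"))
```

The appeal of the second route is that the same request shape works whichever agency issued the DOI; what differs is how much metadata comes back.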

SebastinSanty commented 6 years ago

So writing down whatever I understood and plan to implement based on your points. Please correct me if I am wrong:

I faced an issue though: I tried doing content negotiation as described above, but unfortunately the content-negotiation results for DataCite didn't contain the source attribute. For CrossRef it is working properly.

oxinabox commented 6 years ago

> So writing down whatever I understood and plan to implement based on your points. Please correct me if I am wrong:

Good idea checking. I seem to have misled you.

#29 is a separate issue. It would be to create a different generator, call it DOI <: DataRepo, separate from what you've made here, DataCite <: DataRepo. Just as we have many ways to generate for DataDryad (DataCite, DataDryad, DataDryadWeb), a DOI generator would be an alternative. If it is a good one (which I think it can be), it could mean that we delete the current DataCite generator just to save on maintenance.

The goal of this PR, #28, is to add DataCite support, and it has done that successfully (well, without the URL; I suspected that wasn't going to be possible). Once you fix up the few small things discussed in the review, this should be good to merge.

#29 may or may not be the best next issue to pursue after this one.

I'd like to see full support for Figshare and DataVerse. OAI-PMH is one path that might do it (though I suspect it also won't let us actually get download URLs).

#30 will do Figshare (and others) fully, but not DataVerse.

BTW: cross-negotiate isn't a term that I am familiar with. I think you mean content negotiate.

oxinabox commented 6 years ago

For this PR, something I think I missed in the code review before:

It should display some kind of info("DataCite based generation can only generate partial registration blocks, as DataCite metadata does not (currently) include the URL to the resource. You will have to edit in the URL after generation.") And it should probably put something like "PUT DOWNLOAD URL HERE" in place of the URL.

Looks like the test failures are something to do with the GitHub generator breaking.
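To make the suggestion concrete, here is a small Python sketch of the idea (an illustration only, not the package's Julia implementation). It prints the notice and emits a registration block as text, with the placeholder where the URL would go and no checksum. The function name and the example dataset name and message are hypothetical. The emitted text follows the `register(DataDep(name, message, url))` shape that DataDeps.jl registration blocks use.

```python
import sys

URL_PLACEHOLDER = "PUT DOWNLOAD URL HERE"

NOTICE = (
    "DataCite based generation can only generate partial registration blocks, "
    "as DataCite metadata does not (currently) include the URL to the resource. "
    "You will have to edit in the URL after generation."
)


def partial_registration_block(name: str, message: str) -> str:
    """Hypothetical helper: render a registration block without URL or checksum.

    No escaping of quotes inside `name` or `message` is attempted in this sketch.
    """
    # Tell the user up front that the output needs manual editing.
    print(NOTICE, file=sys.stderr)
    return (
        "register(DataDep(\n"
        f'    "{name}",\n'
        f'    """\n    {message}\n    """,\n'
        f'    "{URL_PLACEHOLDER}",\n'
        "))\n"
    )


# Illustrative values only; real ones would come from the DataCite metadata.
block = partial_registration_block("Example Dataset", "Title and citation text go here.")
print(block)
```

Using a string that is obviously not a URL as the placeholder means a forgotten edit fails loudly at download time rather than silently fetching the wrong thing.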

SebastinSanty commented 6 years ago

It's good that I asked before implementing it in the PR; it saved me the work of removing it.

SebastinSanty commented 6 years ago

Need to merge #31 before this.

SebastinSanty commented 6 years ago

@oxinabox Ready to be merged if you don't have any further review comments.

codecov-io commented 6 years ago

Codecov Report

Merging #28 into master will increase coverage by 0.15%. The diff coverage is 95.65%.

@@            Coverage Diff             @@
##           master      #28      +/-   ##
==========================================
+ Coverage   93.93%   94.09%   +0.15%     
==========================================
  Files          13       14       +1     
  Lines         231      254      +23     
==========================================
+ Hits          217      239      +22     
- Misses         14       15       +1

Impacted Files              Coverage Δ
src/DataDepsGenerators.jl   94.28% <100%> (+0.53%)
src/DataCite.jl             95% <95%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update a004321...b6ca9d8.