Should we include data from journal supplementary files?

ropensci-archive / doidata

:no_entry: ARCHIVED :no_entry:

MIT License


Should we include data from journal supplementary files? #2

Open noamross opened 6 years ago

noamross commented 6 years ago

@sckott mentioned this in https://github.com/datacite/freya/issues/2

Pros:

A lot of data is stored this way. Including it would greatly expand the range of data available to users and the degree to which the package could improve data linkage.

Cons:

We may want to discourage storing data in journal supplementary files. OTOH we really don't have much influence through this tool.
It would be a lot harder to do this client-side than it would be for data repositories. Repositories are limited in number, so client-side mapping of DOI to resource would only require so much custom coding. For most of them we can identify the repository, and thus the mapping, from the DOI. There are many more journals, and journals themselves aren't the relevant unit: we need to understand how DOI --> file URL maps for each publisher's platform.
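To make the "identify the repository from the DOI" idea concrete, here is a minimal sketch in Python (stdlib only). The function names and the tiny prefix table are illustrative assumptions, not anything doidata implements; a real client would need a longer, maintained table.

```python
from typing import Optional

# Illustrative table only: a few well-known repository DOI prefixes.
# Assumption: a real client would maintain a much longer list.
REPOSITORY_BY_PREFIX = {
    "10.6084": "figshare",
    "10.5061": "dryad",
    "10.5281": "zenodo",
}


def doi_prefix(doi: str) -> str:
    """Return the registrant prefix, i.e. the part before the first slash."""
    doi = doi.strip()
    # Accept bare DOIs as well as doi.org URLs and "doi:" forms.
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix


def repository_for(doi: str) -> Optional[str]:
    """Look up the hosting repository by prefix; None if the prefix is unknown."""
    return REPOSITORY_BY_PREFIX.get(doi_prefix(doi))
```

A publisher prefix such as 10.1371 falls through to `None` here, which is exactly the hard case: each publisher platform would need its own DOI --> file URL rule.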

sckott commented 6 years ago

(p.s. fulltext has https://github.com/ropensci/fulltext/#supplementary-materials via Will Pearse - but an argument can be made to pull that functionality out of the pkg into another [here or elsewhere])

sckott commented 6 years ago

> journals themselves aren't the relevant unit

I'd think DOI prefix owners (often == publisher) are the relevant units

mfenner commented 6 years ago

Figshare is hosting many (> 100k) supplementary files for publishers, so there are a lot of DataCite DOIs and metadata available for them. To take one example from today: https://doi.org/10.6084/m9.figshare.5752965.v1 is the DOI for a supplementary file to https://doi.org/10.1159/000485227 (a Karger prefix, DOI not live yet).
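A rough sketch of how a client might use that metadata, assuming the record is fetched from the DataCite REST API (`https://api.datacite.org/dois/{doi}`) and that the link to the article is recorded as a `relatedIdentifiers` entry with `relationType` `IsSupplementTo`. The helper names are hypothetical, and the sample record is hand-written to show the shape only; it is not a real API response for this DOI.

```python
from urllib.parse import quote

DATACITE_API = "https://api.datacite.org/dois/"


def datacite_url(doi: str) -> str:
    """Build the DataCite REST API URL for one DOI (slashes kept as-is)."""
    return DATACITE_API + quote(doi, safe="/")


def supplemented_dois(record: dict) -> list:
    """Return the DOIs that this record declares itself a supplement to."""
    attrs = record.get("data", {}).get("attributes", {})
    related = attrs.get("relatedIdentifiers") or []
    return [
        rel["relatedIdentifier"]
        for rel in related
        if rel.get("relationType") == "IsSupplementTo"
        and rel.get("relatedIdentifierType") == "DOI"
    ]


# Hand-written sample in the general shape of a DataCite record.
# Assumption: NOT fetched from the API; the real record may differ.
SAMPLE_RECORD = {
    "data": {
        "id": "10.6084/m9.figshare.5752965.v1",
        "attributes": {
            "relatedIdentifiers": [
                {
                    "relatedIdentifier": "10.1159/000485227",
                    "relatedIdentifierType": "DOI",
                    "relationType": "IsSupplementTo",
                }
            ]
        },
    }
}

article_dois = supplemented_dois(SAMPLE_RECORD)
```

Whether publishers' Figshare deposits consistently carry this relation would need checking against live records.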

charliejhadley commented 6 years ago

Hello folks!

I've got a comment about the weirdness of publishers who use "Figshare for publishers" like PLOS ONE.

Take this article for instance: https://doi.org/10.1371/journal.pone.0198684

Query the Figshare API for the collection ID (4126502):

https://api.figshare.com/v2/collections?doi=10.1371%2Fjournal.pone.0198684

Return all assets from the collection:

https://api.figshare.com/v2/collections/4126502

These assets include the actual paper itself, and all figures and tables included in the paper. This is tremendously useful!
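The two requests above can be sketched in Python (stdlib only). The function names are made up for illustration, and the assumption that the search endpoint returns a JSON list of objects with an `id` field comes from my reading of the API, so treat it as unverified.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FIGSHARE_API = "https://api.figshare.com/v2"


def collection_search_url(article_doi: str) -> str:
    """Step 1: URL that searches collections by the article DOI."""
    return f"{FIGSHARE_API}/collections?{urlencode({'doi': article_doi})}"


def collection_url(collection_id: int) -> str:
    """Step 2: URL for the collection itself, as used in the comment above."""
    return f"{FIGSHARE_API}/collections/{int(collection_id)}"


def first_collection_id(search_results: list):
    """Pick the id of the first match; None when nothing was found.

    Assumption: the search response is a JSON list of objects with an "id".
    """
    return search_results[0]["id"] if search_results else None


def fetch_json(url: str):
    """Network helper (needs internet access; not called at import time)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

Usage would be along the lines of `fetch_json(collection_url(first_collection_id(fetch_json(collection_search_url(doi)))))`.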

BUT

This does not return the "supporting information" file https://doi.org/10.1371/journal.pone.0198684.s001
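Judging only from this one example, the supporting-information DOI looks like the article DOI plus a `.sNNN` suffix. A small sketch under that assumption (the function name and the three-digit zero padding are guesses from the single DOI above, not a documented PLOS rule):

```python
def plos_supplement_doi(article_doi: str, index: int) -> str:
    """Guess the DOI of the index-th supporting-information file.

    Assumption: suffix is ".s" plus a three-digit, 1-based counter,
    inferred from 10.1371/journal.pone.0198684.s001 alone.
    """
    if index < 1:
        raise ValueError("index is 1-based")
    return f"{article_doi}.s{index:03d}"


first_supplement = plos_supplement_doi("10.1371/journal.pone.0198684", 1)
```

This says nothing about how many supporting files an article has; a client would still have to probe or read that from the article metadata.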

Summary

As a user of the doidata package, I would appreciate a method for accessing ALL of these assets from a paper when the publisher uses Figshare behind the scenes.

nuest commented 5 years ago

@martinjhnhadley Do you know the package suppdata?

The suggestion by @sckott (https://github.com/ropenscilabs/doidata/issues/2#issuecomment-355088644) is realised in that package, i.e. the DOI-based download from the fulltext package is its own package now: https://github.com/ropensci/suppdata

We're planning to have a hackathon as part of the Mozilla Global Sprint (https://github.com/ropensci/suppdata/issues/35) around the suppdata package. Maybe that is a good occasion to revive doidata ?

charliejhadley commented 5 years ago

Thanks @nuest! I wasn't aware of the suppdata package; it looks like it will definitely solve some of my requirements. I'm not sure if my skills are developed enough to usefully participate in the devel of doidata, but I'll register and give it a go.