Open alisonboyer opened 7 years ago
I remember this being a topic of discussion on Slack recently (where to publish data). It involved @noamross and @mbjones.
@alisonboyer good idea!
There's an approach to this that exists now, though for data associated with papers: the data are deposited in places like DataONE, KNB, Dryad, etc. Those venues have RESTful interfaces, and at least some have R packages.
Or do you like an integrated approach where a journal hosts the data as well?
I'd really love to see a peer-reviewed venue for publishing data
I guess there are some (Scientific Data), but none that I know of are built on modern technology with web services, etc.
The way I see it, there is a big gap in the existing solutions. Scientific Data does not publish, store, or archive the data. When you publish a data descriptor paper there, you have to place your data in a separate repository (I work at one of them). On the other hand, DataONE, KNB, Dryad, etc. are not peer reviewed.
Great suggestion, I'd also love to hear more discussion on this topic.
Re Nature's Scientific Data: Of course most journals that publish 'software papers' don't publish/store/archive the software either; and I think we agree it wouldn't make much sense for them to do so. Like Scott says, the best data repositories already have rich metadata models, RESTful interfaces, and often R package access.
On the other hand, I think we've hit on a pretty interesting model with, say, the rOpenSci onboarding process that might have some relevance here. IMHO, archiving data properly shares much of the complexity of process, technical jargon, and siloed community norms that publishing an R package has. I think our onboarding process has been successful in helping people navigate both the technical and the cultural norms, best practices, and expectations. I suspect this not only makes it easier for an author to get past hurdles such as R CMD check and CRAN review, but also usually makes the package better in ways automated checks cannot.
By analogy, data submission can involve a similarly complex web of technical and cultural hurdles, e.g. see the process outlined by:
Now of course data publishing need not be this complex (just as getting software out on GitHub or whatnot is simpler than CRAN checks & onboarding), but I for one would argue that the extra process adds value to the product and, like making an R package, can be made simpler through a combination of more intuitive tooling, shared norms, and community. Perhaps some peer-review or onboarding-like process could help?
Of course this is very different from the Scientific Data model, just as a traditional software paper differs from onboarding: in the former cases there's usually a very different incentive at play, one that perhaps has far more to do with the value of things like citations & impact factors than with the value of the peer review itself. (Many of the review requirements at Scientific Data are actually things that are accomplished by the automatic checks and validation built into data archives like KNB.)
Is anyone interested in discussing how R can interface with existing data journals, or starting a data journal? I'd really love to see a peer-reviewed venue for publishing data, a RESTful interface to the data, and of course an R package to access the data & metadata.