sparcopen / doathon

Our discussion forum (see "issues") for the OpenCon Do-A-Thon, a day of trying, making, testing and doing to advance Open Research & Education. See our full website, with more information (including Github Help, and how to get involved).
https://doathon.opencon2018.org/

Peer Review of Open Research Data #68

InquisitiveVi opened this issue 6 years ago

InquisitiveVi commented 6 years ago

Description

While the FAIR principles guide the sharing of research data, are there generic attributes that assist in judging the quality of shared research data? Should research data be peer reviewed after sharing, or before collection? These were the questions we discussed during our unconference session. We also wanted to explore existing best practices and novel solutions for the peer review of data. This challenge is to document further ideas from OpenCon alumni and community members.




This post is part of the OpenCon 2017 Do-A-Thon. Not sure what's going on? Head here.

Daniel-Mietchen commented 6 years ago

For cases where the data is to be reviewed in the context of a manuscript, there are some guidelines here.

Daniel-Mietchen commented 6 years ago

For cases where the dataset is dynamically changing, extra care is needed. A good example of what can be done to facilitate such reviews is here. This basically takes a SPARQL query

SELECT DISTINCT ?q WHERE {
  ?p wdt:P50 ?q ;             # ?p has author ?q
     wdt:P31 wd:Q13442814 .   # ?p is an instance of "scholarly article"
  ?q wdt:P21 wd:Q6581072 .    # ?q has sex or gender "female"
}

and some timestamps and provides a list of changes that have been made to Wikidata items about female authors of scientific articles.
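For anyone who wants to try the query themselves, here is a rough sketch (my own, not part of the linked example) of sending it to the public Wikidata Query Service using only the Python standard library. The endpoint URL is the standard one; the `User-Agent` string and the added `LIMIT` are assumptions for demo purposes.

```python
import json
import urllib.parse
import urllib.request

# The SPARQL query from the comment above, with a LIMIT so the demo stays small.
QUERY = """
SELECT DISTINCT ?q WHERE {
  ?p wdt:P50 ?q ;
     wdt:P31 wd:Q13442814 .
  ?q wdt:P21 wd:Q6581072 .
}
LIMIT 10
"""

ENDPOINT = "https://query.wikidata.org/sparql"

def build_url(query: str) -> str:
    """Encode a SPARQL query as a GET URL requesting JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )

def run_query(query: str) -> list:
    """Fetch result bindings from the Wikidata Query Service (needs network)."""
    req = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "data-review-demo/0.1 (do-a-thon example)"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]

# Usage (requires network access):
#   for row in run_query(QUERY):
#       print(row["q"]["value"])  # item URI of each matching author
```

Re-running the same query at two timestamps and diffing the result sets is one way to approximate the "list of changes" idea for a dynamically changing dataset.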

InquisitiveVi commented 6 years ago

Thank you @Daniel-Mietchen! I am linking the notes from our unconference session and tagging @chartgerink for feedback. https://docs.google.com/document/d/1DlTOMafXdt2Hgu5A2PiIOdGX4QJ7bRNvyzRZMQSDYVE/edit#heading=h.k44ivrk1hjtt

chartgerink commented 6 years ago

Thanks @InquisitiveVi for the tag!

My main question here is: at what stage?

  1. Data Management Plans (DMPs) are a kind of data review in their own right, pertaining to structure, handling, and storage.
  2. Another form of review could be whether the resulting data are what was said would be collected (verification of the structure).
  3. Another could be whether the results are reproducible from the provided data.
  4. Another would be whether the data presented in a manuscript are internally consistent (e.g., with statistical checking tools such as statcheck).
  5. What would also be possible is open data review, to indicate FAIRness (as indicated in the document), especially whether people can reuse the data without making leaps. I've tried to reuse several open data sets that lacked documentation, even though the authors had received an "Open Data Badge" (I guess it was clear enough to the authors).

Sorry if that's incoherent; I'm just dumping some initial thoughts. I think data review is worthwhile, just like code review is valuable. There are many stages at which it could occur, though, and the question is where it would be actionable at this moment. I think the focus should be on 5 right now, but in an ideal setting it would be all of these plus more 🔥
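To make the internal-consistency idea concrete: a statcheck-style check (statcheck itself is an R package; this is my own minimal sketch, not its actual API) recomputes a p-value from the reported test statistic and flags mismatches. For a standard-normal (z) statistic this needs only the Python standard library:

```python
import math

def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def consistent(z: float, reported_p: float, decimals: int = 2) -> bool:
    """True if the reported p matches the recomputed p after rounding."""
    return round(two_tailed_p_from_z(z), decimals) == round(reported_p, decimals)

# "z = 2.20, p = .03" -> recomputed p is about 0.0278, rounds to .03: consistent
print(consistent(2.20, 0.03))  # True
# "z = 2.20, p = .01" -> inconsistent with the reported statistic
print(consistent(2.20, 0.01))  # False
```

The real statcheck also handles t, F, and chi-square statistics and parses them out of manuscript text; the sketch above only shows the core recompute-and-compare step.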

InquisitiveVi commented 6 years ago

Thank you @chartgerink for your thoughts on this. Verification of the data structure after initial collection would be very useful, but we also need to think about what will encourage reviewers to get involved, and how the data-collecting researcher(s) or their team(s) will be protected against the common fear of being scooped. Reuse as a criterion, backed by clear documentation, could be a great incentive. The Renga platform from the Swiss Data Science Centre could be one way to enable reuse of data and workflows: https://datascience.ch/renga-platform/