ufal / clarin-dspace

clarin-dspace digital repository based on DSpace and LINDAT/CLARIN DSpace
http://lindat.cz
BSD 3-Clause "New" or "Revised" License

Create a workflow for anonymous review of a dataset #1020

Open kosarko opened 2 years ago

kosarko commented 2 years ago

Use case: You are submitting an article describing a dataset (that you want to host in the repository). The article goes through a review process. The dataset should be available to reviewers. The authors and most other metadata should remain hidden. The reviewers might have comments (in the review platform, not in the repository) about the dataset. The author changes the dataset following the review.

Requirements:

Suggestions: Maybe fiddling with the private items feature and moving the preliminary dataset into a hidden bundle would do the trick.

stranak commented 1 year ago

A real-life example: colleagues are right now submitting a paper to the Glossa journal. From the guidelines:

Data Availability/ Supplementary Files (if applicable)

The journal requires authors to make all data associated with their submission openly available, according to the FAIR principles (Findable, Accessible, Interoperable, Reusable). More information can be found on the Journal Policies page.

If data/supplementary files are to be associated with the submission, one of the below options should be followed:

1) upload the files to your chosen open repository and make note of the DOI that they will provide (most suitable for datasets or information that act as foundations to the research being published. This option makes the files more findable and more citable).

We recommend an open repository such as osf.io, which allows you to create a "project" under which you can upload relevant files (datasets, analysis scripts, experimental materials, etc.). The project will be associated with a unique DOI. You can then include in your manuscript a citation of the OSF entry and/or a link to the project page on OSF, to direct interested readers to the supplementary materials.

During review, please be sure that the link to the repository is anonymized to maintain a fully double masked review process. Instructions for doing this on the OSF may be found here. If you'd like to learn more about best practices for ensuring reproducibility, see Laurinavichyute and Vasishth (2021). Please contact us if you would like more information or advice about hosting your data on an open repository.

The above text also contains a very relevant link describing how the OSF lets you create "view-only links"; the dialogue for creating one asks whether to anonymize the view.
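To make the "view-only link with optional anonymization" idea concrete, here is a minimal sketch. It is not the OSF implementation and not existing clarin-dspace code; `Item`, `create_view_only_link`, `view_via_link` and the field names in `IDENTIFYING_FIELDS` are all hypothetical and only illustrate the behaviour we might want.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical set of identifying fields; the real list would have to be agreed on.
IDENTIFYING_FIELDS = {"dc.contributor.author", "dc.publisher", "local.contact.person"}


@dataclass
class Item:
    metadata: Dict[str, str]
    # Maps a secret token to the "anonymize this view?" choice made at creation time.
    review_links: Dict[str, bool] = field(default_factory=dict)


def create_view_only_link(item: Item, anonymize: bool = True) -> str:
    """Create an unguessable token that grants read-only access to the item."""
    token = secrets.token_urlsafe(16)
    item.review_links[token] = anonymize
    return token


def view_via_link(item: Item, token: str) -> Dict[str, str]:
    """Return the metadata a reviewer sees when following the link."""
    if token not in item.review_links:
        raise PermissionError("unknown review link")
    if not item.review_links[token]:
        return dict(item.metadata)
    # Anonymized view: identifying fields stay stored on the item, but are not shown.
    return {k: v for k, v in item.metadata.items() if k not in IDENTIFYING_FIELDS}
```

The point of the design is that the stored metadata is never modified; only the view handed to the reviewer is filtered.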

stranak commented 1 year ago

@vidiecan @kosarko Do you guys think it makes sense to implement it now, or shall we handle one or two records of this type manually for now (private record, manual anonymisation, some form of view permission, e.g. "reviewer/review") and postpone this for the new version with the new UI?

vidiecan commented 1 year ago

Latest use case requirements:

And once the item is to be published, we should show the hidden metadata, make it public, and update the provenance.
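The publish step could be sketched roughly as follows. This is only an illustration under assumptions: `ReviewItem`, its fields and `publish` are hypothetical names, not DSpace classes, and the provenance wording is made up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ReviewItem:
    public_metadata: Dict[str, str]   # what reviewers can see
    hidden_metadata: Dict[str, str]   # filled in at submission, but not displayed
    private: bool = True
    provenance: List[str] = field(default_factory=list)


def publish(item: ReviewItem, when: Optional[datetime] = None) -> ReviewItem:
    """Reveal the hidden metadata, make the item public, and record provenance."""
    if not item.private:
        raise ValueError("item is already public")
    when = when or datetime.now(timezone.utc)
    item.public_metadata.update(item.hidden_metadata)  # show the hidden metadata
    item.hidden_metadata = {}
    item.private = False                               # make it public
    item.provenance.append(                            # update provenance
        f"Made public after anonymous review on {when.isoformat()}"
    )
    return item
```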

kosarko commented 1 year ago

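There are some drawbacks to the current (manual) approach, where we remove the metadata (and later add them back), namely in curation and exports. Maybe we should use the word "anonymized" (or similar) to indicate that the value is known but redacted. The workflow should take into account especially:

- refbox (suggested citation)
- oai-pmh exports (e.g. ELG expects provider/publisher)
- curation tasks checking metadata completeness etc.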

stranak commented 1 year ago

> There are some drawbacks to the current (manual) approach, where we remove the metadata (and later add them back). Namely curation and exports.

Completely agree that it must be "there" (filled-in already), but hidden.

> Maybe we should use the word anonymized (or similar) to indicate the value is known but redacted. The workflow should take into account especially:
>
> - refbox (suggested citation)
> - oai-pmh exports (e.g. ELG expects provider/publisher)
> - curation tasks checking metadata completeness etc.

OR ... all these features should simply be disabled. E.g. there is no good reason to cite a dataset that is under review, i.e. unpublished. On the contrary, we should discourage it. The same goes for any harvesting, imho.
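The two options (redact with a marker vs. disable the feature entirely) can be compared in a small sketch. Everything here is hypothetical: `export_record`, the `ANONYMIZED` placeholder and the field names are for illustration only and do not correspond to the actual export or curation code.

```python
from typing import Dict, Optional, Set

# Placeholder meaning "the value is known but redacted" (wording is an assumption).
ANONYMIZED = "(anonymized)"


def export_record(
    metadata: Dict[str, str],
    under_review: bool,
    redact: Set[str],
    disable_when_under_review: bool = False,
) -> Optional[Dict[str, str]]:
    """Return the record as an export/refbox/curation task would see it.

    Returns None when the feature is disabled for items under review.
    """
    if not under_review:
        return dict(metadata)
    if disable_when_under_review:
        return None  # option 2: no citation box, no harvesting
    # Option 1: keep every field present so completeness checks pass,
    # but replace identifying values with the marker.
    return {k: (ANONYMIZED if k in redact else v) for k, v in metadata.items()}
```

With redaction, a completeness check still finds the author and publisher fields filled in; with disabling, the consumer has to cope with getting no record at all.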

stranak commented 4 months ago

I have added High Priority, because we have seen this use case several times lately and there is also a request from CU for it.