Dataverse integration - Githubissues

pdurbin commented 4 years ago

Hi! I'm here because of https://twitter.com/GenevievMichaud/status/1186933912255291392 and DMap looks interesting! Let's talk about the possibility of integration with Dataverse, possibly through an external tool! Please see http://guides.dataverse.org/en/4.17/api/external-tools.html

Also, I'd be remiss if I didn't point out that it should be "Dataverse" instead of "DataVerse" like in the screenshot below:

Thanks!

oblassers commented 4 years ago

Dear Philip,

thank you for your interest in DMap and my apologies for the late response! DMap is a research prototype and demo tool developed to explore use cases of machine-actionable DMPs (maDMPs). Repository integration is an obvious use case for them. To allow information exchange between DMPs and RDM systems, a common data model for DMPs is being developed by the RDA DMP Common Standards WG (see https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard).

At my university, prototypes were developed in this direction. I would like to point you to https://hido1994.github.io/madmp/ which shows an integration with Dataverse. Information specified in a DMP (e.g. license, dataset name, etc.) can be used to populate metadata fields in the repository system. In the other direction, a PID assigned to a dataset by the repository could flow back into the DMP. Another use case for repository integration could be to inform repositories about planned deposits (type, amount, ...) so they can advise researchers on curating the data and select suitable metadata standards, which makes the transition to the repository at the end of a project smoother. Use cases of maDMPs, collected by the community, are described here: https://doi.org/10.3897/rio.3.e13086, and principles to realise the maDMP vision can be found here: https://doi.org/10.1371/journal.pcbi.1006750

DMap currently focuses on providing support to researchers by automating the workflow of creating a DMP and exports a DMP in the RDA DMP Common Standard format (JSON). Any RDM service supporting the common data model could help in making systems better integrated.
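To make the exchange described above a bit more concrete, here is a minimal Python sketch. It assumes an maDMP whose field names (`dataset`, `dataset_id`, `distribution`, `data_access`, `license`, `license_ref`) follow the RDA DMP Common Standard as I read it; please check them against the published schema. The deposit record is a flat, hypothetical structure and not Dataverse's real metadata format, and the DOI is a made-up placeholder.

```python
import json

# Minimal maDMP document. Field names follow my reading of the RDA DMP
# Common Standard; all values are invented for illustration.
MADMP_JSON = """
{
  "dmp": {
    "title": "Example DMP",
    "dataset": [
      {
        "title": "Survey images",
        "dataset_id": {"identifier": "local-0001", "type": "other"},
        "distribution": [
          {
            "data_access": "open",
            "license": [
              {
                "license_ref": "https://creativecommons.org/publicdomain/zero/1.0/",
                "start_date": "2020-11-01"
              }
            ]
          }
        ]
      }
    ]
  }
}
"""


def deposit_metadata(dataset):
    """Build a flat, hypothetical deposit record from one maDMP dataset."""
    distribution = dataset["distribution"][0]
    license_info = distribution["license"][0]
    return {
        "title": dataset["title"],
        "access": distribution["data_access"],
        "license": license_info["license_ref"],
    }


def record_pid(dataset, pid):
    """Write a repository-assigned PID back into the maDMP dataset entry."""
    dataset["dataset_id"] = {"identifier": pid, "type": "doi"}


madmp = json.loads(MADMP_JSON)
first_dataset = madmp["dmp"]["dataset"][0]

# Direction 1: DMP -> repository metadata.
metadata = deposit_metadata(first_dataset)

# Direction 2: repository PID -> DMP (placeholder DOI, not a real record).
record_pid(first_dataset, "https://doi.org/10.5072/FK2/EXAMPLE")

print(metadata)
print(first_dataset["dataset_id"])
```

A real integration would map these values onto the repository's own metadata fields and call its API; the sketch only shows the two directions of information flow.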

I'm happy to further discuss Dataverse integration.

Best wishes, Simon

PS. Also, thanks for pointing out the misspelling of Dataverse. Unfortunately I cannot change it because it is controlled vocabulary specified in the re3data schema (http://doi.org/10.2312/re3.007).

pdurbin commented 4 years ago

@oblassers your thoughtful reply was well worth the tiny wait! Thank you.

I first heard about RDA-DMP-Common-Standard from @TomMiksa at https://github.com/IQSS/dataverse/issues/5859#issuecomment-493978167 and passed that information along at https://groups.google.com/d/msg/dataverse-community/JJB33tqykrI/LJUvgnevAgAJ which in my mind is the main thread on the Dataverse mailing list where people are talking about DMP.

From a quick look at https://hido1994.github.io/madmp/ I have a few observations:

It's cool that you're using Dataverse workflows and I bet @qqmyers @pameyer and others will find this interesting.
@poikilotherm will be all over "It also would be possible to run dataverse containerized in docker." You might want to take a look at https://github.com/IQSS/dataverse-kubernetes/pull/119
It's incredible that you've already created a Dataverse external tool! Please open an issue at https://github.com/IQSS/dataverse/issues so we can add "maDMP Export" to a future version of http://guides.dataverse.org/en/4.17/admin/external-tools.html#inventory-of-external-tools
I LOVE all the diagrams at https://hido1994.github.io/madmp/ including the one below!

[image: workflow]

mercecrosas commented 4 years ago

The integration with DMap can be very useful for Dataverse users who need a DMP. Would this mean that Dataverse would have a metadata block to support the DMP machine-actionable metadata?

TomMiksa commented 4 years ago

Hi all,

I am really happy to see that you're interested in our work.

A few clarifications from my side to make sure we're on the same page:

https://hido1994.github.io/madmp/ - @Hido1994 considered two scenarios here: (1) The DMP was created first, e.g. with a tool like DMAP, and the information it contains was used to automate the upload of data into Dataverse; (2) The data is already in Dataverse and a researcher needs to update his/her DMP. He/she can export the relevant information from Dataverse into the maDMP. Thus, we wanted to show how we can assist researchers at different stages of the research data lifecycle: from the planning/proposal phase to the reporting/end-of-project phase.
https://github.com/oblassers/dmap - DMAP is a tool developed by @oblassers based on interactive mock-ups (https://oblassers.github.io/dmap-mockups/), which in turn are the result of a consultation within the RDA DMP Common Standards WG and interviews with researchers at TU Wien. The primary goal of this tool, as explained by Simon, is to reduce the number of questions asked of researchers and to maximise the reuse of information from existing systems, e.g. databases with information on projects, publications, employees, etc.

Thus, we have two tools doing different things, but each of them is an important part of the RDM ecosystem around Dataverse.

In my opinion, the exchange of information between repositories and maDMPs is one of the key use cases in which we can automate a lot, and thus bring a lot of benefits to both researchers and repository managers. I am happy to jointly explore further ideas and integrations!

Cheers, Tomasz

shlake commented 4 years ago

@pdurbin I've sent my re3Data peeps an email about correcting "Dataverse" in the "softwareNames" vocab

mercecrosas commented 4 years ago

Thanks for the clarification, it's helpful.

For the DMAP tool, would it be useful to be able to deposit and archive the JSON DMP to Dataverse with the dataset?

Mercè Crosas, Ph.D. University Research Data Officer, HUIT | Chief Data Science and Technology Officer, IQSS Harvard University mcrosas@g.harvard.edu | @mercecrosas https://twitter.com/mercecrosas | scholar.harvard.edu/mercecrosas

oblassers commented 4 years ago

@mercecrosas good question! DMap is designed to store DMPs in its own database. If you want to make a DMP citable it could be useful to archive it in a repository and get a PID assigned to it. I think a DMP would not necessarily have to be archived together with the datasets since the DMP contains dataset entities which could point to any location.

As @TomMiksa pointed out and @Hido1994's application shows, it could be useful to have maDMP-friendly metadata in the repository, along with programmatic access to it that could be used to update a DMP.

TomMiksa commented 4 years ago

The DMP contains information on a dataset such as license, access mode (open/closed/shared), embargo period, etc. This information helps in ingesting the data into Dataverse and constitutes the dataset's metadata. For example, a DMP states: "a collection of JPEG images will be shared under a CC0 license after a 1-year embargo at the Harvard Dataverse repository". When the files are uploaded to Dataverse, specific metadata fields are set in Dataverse and are presented on the landing page of the dataset: license, embargo, etc. For this reason, I believe there is no need to publish the JSON file together with the dataset.
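As a rough illustration of the embargo example, the sketch below derives an embargo end date from a DMP distribution entry. It assumes, based on my reading of the RDA DMP Common Standard, that a license `start_date` set in the future signals an embargo; the helper names `embargo_end` and `is_under_embargo` are hypothetical, and the sample dates are invented.

```python
from datetime import date


def embargo_end(distribution):
    """Return the earliest license start date, or None if none is given."""
    dates = [
        date.fromisoformat(lic["start_date"])
        for lic in distribution.get("license", [])
        if "start_date" in lic
    ]
    return min(dates) if dates else None


def is_under_embargo(distribution, today):
    """True while `today` is before the earliest license start date."""
    end = embargo_end(distribution)
    return end is not None and today < end


# Invented example: CC0 images that become available on 2021-01-01.
distribution = {
    "data_access": "open",
    "license": [
        {
            "license_ref": "https://creativecommons.org/publicdomain/zero/1.0/",
            "start_date": "2021-01-01",
        }
    ],
}

print(embargo_end(distribution))
print(is_under_embargo(distribution, date(2020, 6, 1)))
```

A repository could run a check like this at ingest time to decide which embargo and license fields to set on the dataset.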

Another use case to consider is publishing DMPs on their own, for example in a repository of DMPs that point to datasets. Imagine a situation in which a researcher would like to find out which projects used a specific dataset within the last 2 years. I know that CDL and DataCite are investigating the idea of assigning DOIs to DMPs.
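The lookup described here could be sketched as follows. This is only an illustration under assumptions: the DMPs are plain dictionaries shaped like the RDA DMP Common Standard (`dmp.title`, `dmp.modified`, `dataset[].dataset_id.identifier`), the `modified` timestamp is used as a stand-in for "when the project used the dataset", and the function name `dmps_using_dataset` and all identifiers are made up.

```python
from datetime import date


def dmps_using_dataset(dmps, dataset_identifier, since):
    """Return titles of DMPs that reference a dataset and changed since `since`."""
    matches = []
    for doc in dmps:
        dmp = doc["dmp"]
        # Keep only the date part of the ISO timestamp.
        modified = date.fromisoformat(dmp["modified"][:10])
        uses_dataset = any(
            ds.get("dataset_id", {}).get("identifier") == dataset_identifier
            for ds in dmp.get("dataset", [])
        )
        if uses_dataset and modified >= since:
            matches.append(dmp["title"])
    return matches


# Invented sample collection of published DMPs.
TARGET = "doi:10.5072/FK2/EXAMPLE"
dmps = [
    {"dmp": {"title": "Project A", "modified": "2019-10-30T10:24:00",
             "dataset": [{"dataset_id": {"identifier": TARGET, "type": "doi"}}]}},
    {"dmp": {"title": "Project B", "modified": "2016-05-01T09:00:00",
             "dataset": [{"dataset_id": {"identifier": TARGET, "type": "doi"}}]}},
    {"dmp": {"title": "Project C", "modified": "2019-12-01T12:00:00",
             "dataset": [{"dataset_id": {"identifier": "doi:10.5072/FK2/OTHER",
                                         "type": "doi"}}]}},
]

recent = dmps_using_dataset(dmps, TARGET, date(2018, 1, 1))
print(recent)
```

A DMP repository would more likely expose this as a search over indexed identifiers, but the filtering logic would be similar.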

oblassers commented 4 years ago

> Another use case to consider is publishing DMPs on their own. For example, a repository of DMPs that point to datasets. Imagine a situation in which a researcher would like to find out which projects used specific dataset within the last 2 years. I know that CDL and DataCite are investigating the idea of assigning DOIs to DMPs.

I know that @kjgarza is working on this.

pdurbin commented 4 years ago

@TomMiksa I see your paper was mentioned in this blog post from last week: https://researchdataq.org/editorials/the-boilerplate-problem-in-data-management-plans/

TomMiksa commented 4 years ago

Hi everyone, I'm writing to let you know that we're organizing a hackathon in which we're trying out different integrations using maDMPs. Would you be interested in joining as a team? It would be cool to have a team that would work on connecting maDMPs with Dataverse. We already have teams working on similar topics.

Here you can find details on the event, including teams that signed up so far: https://github.com/RDA-DMP-Common/hackathon-2020

Cheers, Tomasz

pdurbin commented 4 years ago

@TomMiksa you just reminded me to reach out to you to see if you (or others here) would like to participate in the session about external tools ( https://projects.iq.harvard.edu/dcm2020/breakout-sessions ) for the upcoming Dataverse conference (June 17-19). I sent you an email with more details. Thanks!

oblassers / dmap

Dataverse integration #1