sul-dlss / happy-heron

Self-Deposit for the Stanford Digital Repository (SDR): H2 is a Rails web application enabling users to deposit scholarly content into SDR
Apache License 2.0

Map H2 works to ORCID works for sending deposit metadata to ORCID #3103

Closed peetucket closed 1 year ago

peetucket commented 1 year ago

For background, see #3160

ORCID requirements: These indicate we should be able to send any H2 work (even one without a DOI), because the PURL can act as an identifier. "Works added by trusted organizations using the ORCID API must include an identifier. Works can have multiple identifiers, and an identifier must be included for duplicate versions of the work to group on the ORCID record. If you start entering an identifier by choosing the identifier type, then the identifier value and relationship are required."

ORCID Data model information:

H2 cocina generators: To see how we get from the database to cocina

H2 database schema: To see what we store (in particular, focus on the "works", "work_versions", and "abstract_contributors" tables)

ORCID API information:

amyehodge commented 1 year ago

@peetucket

Questions:

  • what happens if the H2 work changes? Do we push again?

Ideally, but some users modify the works in quick succession, so see answer to question below.

  • when do we push? as a background job after deposit?

To avoid having jobs collide, maybe we should push on a periodic schedule -- even weekly could be often enough? Or I would think nightly would do. I think trying to push with deposit every time might cause problems and I don't see this as really urgent.
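One way to read the scheduling suggestion above: a periodic job collects every work modified since the previous run and pushes each one once, so several quick edits collapse into a single update. A minimal Ruby sketch of that selection step, assuming a hypothetical `WorkChange` record with `work_id` and `updated_at` fields (not H2's actual schema):

```ruby
# Hypothetical record of a change to a work; not H2's actual schema.
WorkChange = Struct.new(:work_id, :updated_at)

# Returns the ids of works changed after the last push, one entry per work,
# no matter how many times that work was edited in the interval.
def work_ids_to_push(changes, last_push_at)
  changes
    .select { |change| change.updated_at > last_push_at }
    .map(&:work_id)
    .uniq
end

last_push = Time.utc(2023, 6, 1)
changes = [
  WorkChange.new(1, Time.utc(2023, 6, 2, 9, 0)),
  WorkChange.new(1, Time.utc(2023, 6, 2, 9, 5)), # quick follow-up edit
  WorkChange.new(2, Time.utc(2023, 5, 30)),      # changed before the last push
  WorkChange.new(3, Time.utc(2023, 6, 3))
]

p work_ids_to_push(changes, last_push) # prints [1, 3]
```

Whether the job runs nightly or weekly only changes how `last_push_at` advances; the selection logic stays the same.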

justinlittman commented 1 year ago
amyehodge commented 1 year ago

@justinlittman Pushing with every deposit is permissible if that's what you decide. For me, weekly would be often enough, but sooner works too.

amyehodge commented 1 year ago

@justinlittman Regarding request/approval, we can only push to ORCID records for individuals who have already given us approval to do so. ORCID users can choose whether to make any individual work publicly visible or not on their ORCID record, and can set up notifications so that they get an email from ORCID every time something is added to their record.

[Screenshot 2023-06-22 at 11:01:34 AM]

Given that the point of the trusted user authorization is so that organizations like Stanford can push things to their record, we consider doing so to be a bonus for users. And if they are concerned, they can set up the necessary notifications to intercept and hide additions as soon as they are made.

justinlittman commented 1 year ago

Have we asked any users whether this is a desired feature, and whether they would expect, or be surprised by, H2 items being added to their ORCID record? (I understand that they have consented to it; did they understand what they consented to?)

peetucket commented 1 year ago

  • Can the mapping from H2 work metadata to ORCID work metadata be specified? What should be used as the required identifiers?

Can I recommend that the developer who picks up this ticket take a first pass through the metadata available in H2 and use the ORCID data model to map the obvious fields? For H2 fields with no obvious counterpart in ORCID, we can then ask for assistance from ORCID and/or our metadata department. I think this was the approach we took in sul-pub.

  • Are we really sure that users want H2 content added to their ORCID profiles without request/approval? (For example, the H2 item may be files supporting a different publication, or just another copy of a different publication. Concrete example: if I deposited a copy of my recent C4L Journal article in the Staff collection, I wouldn't want it added to my ORCID profile, since the original publication is already on my ORCID profile.)

For the duplicate question, I believe ORCID allows us to only push works with a specific list of known identifiers (e.g. DOIs) so that they can be de-duped on the ORCID side (as is done with sul-pub).

amyehodge commented 1 year ago

@justinlittman I think there is still a plan to do some user interviews. We can look at adding a question to this effect to those interviews. Though I think for this question it would be best to find some H2 users who have already authorized Stanford to push to ORCID.

How hard is it to get the intersection of SUNet IDs that 1) exist in the H2 database as any kind of user and 2) have authorized Stanford to write to ORCID? These are the people we want to ask. Others won't know what we're talking about.
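The intersection asked about above is a plain set intersection once both lists are available. A minimal sketch, assuming both inputs are simple arrays of SUNet ID strings (where each list would actually come from is not specified here, and the case normalization is an assumption):

```ruby
require 'set'

# Both lists are hypothetical inputs, e.g. one exported from the H2 users table
# and one from whatever system records ORCID write authorizations.
def users_to_survey(h2_sunet_ids, orcid_authorized_sunet_ids)
  # Downcasing is an assumption, in case the two sources differ in letter case.
  authorized = orcid_authorized_sunet_ids.map(&:downcase).to_set
  h2_sunet_ids.map(&:downcase).uniq.select { |id| authorized.include?(id) }
end

h2_users = %w[asmith bjones CLEE]
authorized = %w[clee dkim asmith]
p users_to_survey(h2_users, authorized) # prints ["asmith", "clee"]
```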

peetucket commented 1 year ago

> @justinlittman I think there is still a plan to do some user interviews. We can look at adding a question to this effect to those interviews. Though I think for this question it would be best to find some H2 users who have already authorized Stanford to push to ORCID.
>
> How hard is it to get the intersection of SUNet IDs that 1) exist in the H2 database as any kind of user and 2) have authorized Stanford to write to ORCID? These are the people we want to ask. Others won't know what we're talking about.

We should be able to get that list easily; let me ticket it. Honestly, part of the challenge here is that I suspect not many researchers are asking for this feature or even care much about it. But if and when it exists, it will provide additional value by further promoting their research, assuming ORCID continues to take off and stay relevant, as we all hope. To me it feels a bit like the apocryphal Henry Ford "quote": if you had asked people what they wanted, they would have said a faster horse. In some sense we are building a speculative car here and hoping we are correct.

peetucket commented 1 year ago

See #3117 for results of my analysis of ORCIDs currently in H2.

amyehodge commented 1 year ago

For the record, the results of the mini-survey of users on pushing works to ORCID are at https://docs.google.com/spreadsheets/d/1csEtMDALUTW86avBisbQln-ypQeP12zi2vst4vVTW1Q/edit. Respondents are overwhelmingly in favor of this, and they don't want to be asked or notified.

ndushay commented 1 year ago

do we need Arcadia to sign off on this mapping?

peetucket commented 1 year ago

> do we need Arcadia to sign off on this mapping?

Yes, for sure. We are awaiting her assistance for this (indicated with the "metadata" tab label).

arcadiafalcone commented 1 year ago

H2 to ORCID metadata mapping - vocabulary mappings still in progress

<?xml version="1.0" encoding="UTF-8"?>
<work:work xmlns:common="http://www.orcid.org/ns/common" xmlns:work="http://www.orcid.org/ns/work"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.orcid.org/ns/work /work-3.0.xsd ">
    <work:title>
        <common:title>H2 title</common:title>
    </work:title>
    <work:short-description>H2 abstract</work:short-description>
    <work:citation>
        <work:citation-type>formatted-unspecified</work:citation-type>
        <work:citation-value>H2 preferred citation</work:citation-value>
    </work:citation>
    <work:type>See H2 work type mapping</work:type>
    <!-- use deposit date if publication date not present -->
    <common:publication-date>
        <common:year>H2 publication (or deposit) year</common:year>
        <common:month>H2 publication (or deposit) month</common:month>
        <common:day>H2 publication (or deposit) day</common:day>
    </common:publication-date>
    <common:external-ids>
        <common:external-id>
            <common:external-id-type>doi</common:external-id-type>
            <common:external-id-value>DOI value</common:external-id-value>
            <common:external-id-url>DOI URL</common:external-id-url>
            <common:external-id-relationship>self</common:external-id-relationship>
        </common:external-id>
        <common:external-id>
            <common:external-id-type>uri</common:external-id-type>
            <common:external-id-value>Stanford purl value</common:external-id-value>
            <common:external-id-url>Stanford purl URL</common:external-id-url>
            <common:external-id-relationship>self</common:external-id-relationship>
        </common:external-id>
    </common:external-ids>
    <common:url>Stanford purl</common:url>
    <work:contributors>
        <work:contributor>
            <common:contributor-orcid>
                <common:uri>ORCID or ROR URL</common:uri>
                <common:path>ORCID or ROR identifier</common:path>
                <common:host>orcid.org or ror.org</common:host>
            </common:contributor-orcid>
            <work:credit-name>H2 first name + space + H2 last name</work:credit-name>
            <work:contributor-attributes>
                <work:contributor-role>See H2 role mapping</work:contributor-role>
            </work:contributor-attributes>
        </work:contributor>
    </work:contributors>
    <common:language-code>en</common:language-code>
    <common:country>US</common:country>
</work:work>
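
The sample above could be produced from an H2 work along these lines. This is a rough Ruby sketch, not the H2 implementation: the `work` hash and its keys are hypothetical, the PURL is a made-up placeholder, the PURL is used for both the identifier value and its URL as an assumption, and month and day are zero-padded on the assumption that ORCID expects two digits. Work type and contributors are left out because they depend on separate vocabulary mappings.

```ruby
require 'date'

# Use the publication date when present, otherwise fall back to the deposit date.
def orcid_publication_date(work)
  date = work[:publication_date] || work[:deposit_date]
  { year: date.year.to_s, month: format('%02d', date.month), day: format('%02d', date.day) }
end

# The DOI is optional; the PURL is always sent so the work has an identifier.
def orcid_external_ids(work)
  ids = []
  if work[:doi]
    ids << { type: 'doi', value: work[:doi], url: "https://doi.org/#{work[:doi]}", relationship: 'self' }
  end
  ids << { type: 'uri', value: work[:purl], url: work[:purl], relationship: 'self' }
  ids
end

# Builds a plain hash mirroring the elements of the XML sample.
def orcid_work(work)
  {
    title: work[:title],
    short_description: work[:abstract],
    citation: { type: 'formatted-unspecified', value: work[:citation] },
    publication_date: orcid_publication_date(work),
    external_ids: orcid_external_ids(work),
    url: work[:purl],
    language_code: 'en',
    country: 'US'
  }
end

# Hypothetical work with no publication date and no DOI.
work = {
  title: 'Example dataset',
  abstract: 'An example abstract.',
  citation: 'Doe, J. (2023). Example dataset.',
  publication_date: nil,
  deposit_date: Date.new(2023, 6, 22),
  doi: nil,
  purl: 'https://purl.stanford.edu/bc123df4567' # placeholder, not a real PURL
}

result = orcid_work(work)
puts result[:publication_date].values.join('-')  # prints 2023-06-22
p result[:external_ids].map { |id| id[:type] }   # prints ["uri"]
```

Serializing the hash to the `work:` and `common:` elements is a separate step and is not shown.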

arcadiafalcone commented 1 year ago

For H2 work type and role mappings, see https://docs.google.com/spreadsheets/d/1zrRCFgwWkPW0Vg9mIsUe-rMZkiCvudB2t3VLw0dn-rY/edit?usp=sharing.

For <work:credit-name>, include only names in the H2 cited author section. Omit names not in the citation.

For <work:contributor-role>, enter a value only if there is a mapping, otherwise omit this stanza.
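
The two contributor rules above might look like this in Ruby. The `ROLE_MAP` entries are illustrative placeholders only (the real mapping is in the linked spreadsheet), and the contributor hashes with their `:cited` flag are hypothetical, not H2's actual schema.

```ruby
# Illustrative placeholder only; the real H2-to-ORCID role mapping lives in the
# linked spreadsheet and is not reproduced here.
ROLE_MAP = { 'Author' => 'author', 'Editor' => 'editor' }.freeze

# `contributors` is an array of hypothetical hashes with :first_name,
# :last_name, :role and :cited keys.
def orcid_contributors(contributors, role_map = ROLE_MAP)
  # Only names from the cited author section are sent.
  contributors.select { |c| c[:cited] }.map do |c|
    entry = { credit_name: "#{c[:first_name]} #{c[:last_name]}" }
    role = role_map[c[:role]]
    entry[:role] = role if role # omit the role entirely when there is no mapping
    entry
  end
end

people = [
  { first_name: 'Jane', last_name: 'Doe', role: 'Author', cited: true },
  { first_name: 'Sam', last_name: 'Lee', role: 'Sponsor', cited: true },
  { first_name: 'Ana', last_name: 'Ruiz', role: 'Editor', cited: false }
]

orcid_contributors(people).each do |c|
  puts "#{c[:credit_name]}: #{c.fetch(:role, '(no role)')}"
end
# prints:
# Jane Doe: author
# Sam Lee: (no role)
```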

arcadiafalcone commented 1 year ago

The following values are literals, i.e. not mapped based on H2 content.

<work:citation-type>formatted-unspecified</work:citation-type>

<common:external-id-type>doi</common:external-id-type>

<common:external-id-type>uri</common:external-id-type>

<common:external-id-relationship>self</common:external-id-relationship>

<common:language-code>en</common:language-code>

<common:country>US</common:country>