icatproject / icat.server

The ICAT server offering both SOAP and "RESTlike" interfaces to a metadata catalog.

For discussion - add identifier scheme for pids #299

Open agbeltran opened 1 year ago

agbeltran commented 1 year ago

We now have pid fields for the relevant entities. These pids may come from different schemes: for affiliations, for example, we may have ROR or ISNI identifiers. If facilities rely on more than one scheme, it would be useful to include a `pid_scheme` field recording which scheme is being used.

RKrahl commented 1 year ago

I don't believe we need to have such a pid_scheme attribute for this. A proper use of the respective pid attribute is sufficient to disambiguate.

At HZB, I use the convention of always prepending a scheme prefix, separated by a colon, to the pid or doi value. As an example, the ICAT content for one of our data publications looks like this (output trimmed for brevity):

>>> query = Query(client, "DataPublication", conditions={"pid": "= 'DOI:10.5442/ND000006'"}, includes=["fundingReferences.funding", "relatedItems", "users.affiliations"])
>>> client.assertedSearch(query)[0]
(dataPublication){
   # …
   description = "…"
   fundingReferences[] = 
      (dataPublicationFunding){
         # …
         funding = 
            (fundingReference){
               # …
               awardNumber = "ExNet-0042-Phase-2-3"
               funderIdentifier = "Crossref Funder ID:10.13039/501100001656"
               funderName = "Helmholtz Association"
            }
      },
      (dataPublicationFunding){
         # …
         funding = 
            (fundingReference){
               # …
               awardNumber = ":unas"
               funderName = "Helmholtz Einstein International Berlin Research School in Data Science (HEIBRiDS)"
            }
      },
      (dataPublicationFunding){
         # …
         funding = 
            (fundingReference){
               # …
               awardNumber = "0324247"
               funderIdentifier = "Crossref Funder ID:10.13039/501100006360"
               funderName = "Federal Ministry for Economic Affairs and Energy"
            }
      },
   pid = "DOI:10.5442/ND000006"
   publicationDate = 2021-06-28 00:00:00+02:00
   relatedItems[] = 
      (relatedItem){
         # …
         fullReference = "Cariou, Romain et al. III–V-on-silicon solar cells reaching 33% photoconversion efficiency in two-terminal configuration. Nat Energy 3, 326–333 (2018). https://doi.org/10.1038/s41560-018-0125-0"
         identifier = "DOI:10.1038/s41560-018-0125-0"
         relatedItemType = "JournalArticle"
         relationType = "Cites"
         title = "III–V-on-silicon solar cells reaching 33% photoconversion efficiency in two-terminal configuration"
      },
      (relatedItem){
         # …
         fullReference = "Bläsi, Benedikt et al. Photonic structures for III-V//Si multijunction solar cells with efficiency >33%. Proc. SPIE 10688, Photonics for Solar Energy Systems VII, 1068803 (2018). https://doi.org/10.1117/12.2307831"
         identifier = "DOI:10.1117/12.2307831"
         relatedItemType = "JournalArticle"
         relationType = "Cites"
         title = "Photonic structures for III-V//Si multijunction solar cells with efficiency >33%"
      },
      (relatedItem){
         # …
         fullReference = "Tillmann, Peter et al (2021): Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells. Optics Express. https://doi.org/10.1364/OE.426761"
         identifier = "DOI:10.1364/OE.426761"
         relatedItemType = "JournalArticle"
         relationType = "IsSupplementTo"
         title = "Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells"
      },
      (relatedItem){
         # …
         fullReference = "Tillmann, Peter et al (2021): Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells. Zenodo. https://doi.org/10.5281/zenodo.5013230"
         identifier = "DOI:10.5281/zenodo.5013230"
         relatedItemType = "Software"
         relationType = "IsReferencedBy"
         title = "Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells"
      },
   subject = "multi-junction solar cell; optical simulations; finite element method; light trapping; light management; nanotextures; metal grating"
   title = "Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells"
   users[] = 
      # …
       (dataPublicationUser){
         # …
         affiliations[] = 
            (affiliation){
               # …
               fullReference = "JCMwave GmbH, Bolivarallee 22, 14050 Berlin"
               name = "01: JCMwave"
            },
            (affiliation){
               # …
               fullReference = "Computational Nano Optics, Zuse Institute Berlin, Takustraße 7, 14195 Berlin"
               name = "02: ZIB"
               pid = "ROR:02eva5865"
            },
         contributorType = "Creator"
         familyName = "Hammerschmidt"
         fullName = "Hammerschmidt, Martin"
         givenName = "Martin"
         orderKey = "004"
      },
      (dataPublicationUser){
         # …
         affiliations[] = 
            (affiliation){
               # …
               fullReference = "Optics for Solar Energy, Helmholtz-Zentrum Berlin für Materialien und Energie, Albert-Einstein-Straße 16, 12489 Berlin"
               name = "01: HZB"
               pid = "ROR:02aj13c28"
            },
            (affiliation){
               # …
               fullReference = "Computational Nano Optics, Zuse Institute Berlin, Takustraße 7, 14195 Berlin"
               name = "02: ZIB"
               pid = "ROR:02eva5865"
            },
         contributorType = "Creator"
         familyName = "Tillmann"
         fullName = "Tillmann, Peter"
         givenName = "Peter"
         orderKey = "001"
      },
      (dataPublicationUser){
         # …
         affiliations[] = 
            (affiliation){
               # …
               fullReference = "Fraunhofer Institute for Solar Energy Systems ISE, Heidenhofstr. 2, 79110 Freiburg, Germany"
               name = "01: Fraunhofer ISE"
               pid = "ROR:02kfzvh91"
            },
         contributorType = "Creator"
         familyName = "Bläsi"
         fullName = "Bläsi, Benedikt"
         givenName = "Benedikt"
         orderKey = "002"
      },
      (dataPublicationUser){
         # …
         affiliations[] = 
            (affiliation){
               # …
               fullReference = "JCMwave GmbH, Bolivarallee 22, 14050 Berlin"
               name = "01: JCMwave"
            },
            (affiliation){
               # …
               fullReference = "Computational Nano Optics, Zuse Institute Berlin, Takustraße 7, 14195 Berlin"
               name = "02: ZIB"
               pid = "ROR:02eva5865"
            },
         contributorType = "Creator"
         familyName = "Burger"
         fullName = "Burger, Sven"
         givenName = "Sven"
         orderKey = "003"
      },
 }

As you can see, I have multiple types of PIDs in the data: DOIs, Crossref Funder IDs, and RORs in this case. Note that the Crossref Funder IDs are actually DOIs, but they are still handled separately.
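To illustrate the convention on its own, here is a minimal sketch that sorts a few of the identifier values from the output above by their scheme prefix. The function name `split_scheme` is hypothetical and not part of ICAT or python-icat. It only assumes that the scheme name itself contains no colon.

```python
from collections import defaultdict


def split_scheme(value):
    """Split 'SCHEME:id' at the first colon (hypothetical helper)."""
    scheme, sep, ident = value.partition(":")
    if not sep:
        raise ValueError("%s: missing scheme prefix" % value)
    return scheme, ident


# Identifier values copied from the record shown above.
values = [
    "DOI:10.5442/ND000006",
    "Crossref Funder ID:10.13039/501100001656",
    "ROR:02eva5865",
    "DOI:10.1038/s41560-018-0125-0",
]

# Group the bare identifiers by the scheme named in their prefix.
by_scheme = defaultdict(list)
for v in values:
    scheme, ident = split_scheme(v)
    by_scheme[scheme].append(ident)

for scheme, idents in by_scheme.items():
    print(scheme, idents)
```

The Crossref Funder ID ends up in its own group even though its value is syntactically a DOI, because the prefix, not the shape of the identifier, decides the scheme.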

The script that generates the landing pages has a helper class to deal with that:

class PID:
    """Generalization of a persistent identifier.
    """

    SchemeURI = {
        "DOI": "https://doi.org/",
        "arXiv": "https://arxiv.org/abs/",
        "ORCID": "https://orcid.org/",
        "ROR": "https://ror.org/",
        "Crossref Funder ID": "https://doi.org/",
        "PaNET": "http://purl.org/pan-science/PaNET/",
        "URL": "",
    }

    def __init__(self, identifier, scheme=None):
        # Unless the scheme is overridden, this code assumes the
        # identifier to be scheme and id separated by a colon and that
        # the scheme part does not contain a colon.
        if scheme:
            self._type, self._id = scheme, identifier
        else:
            self._type, self._id = identifier.split(':', maxsplit=1)
        if self._type not in self.SchemeURI:
            raise ValueError("%s: unknown identifier type" % identifier)

    @property
    def identifierType(self):
        return self._type

    @property
    def identifier(self):
        return self._id

    @property
    def schemeURI(self):
        return self.SchemeURI[self._type] or None

    @property
    def url(self):
        return self.SchemeURI[self._type] + self._id

This helper handles all of these types and cases properly:

>>> p = PID("Crossref Funder ID:10.13039/501100001656")
>>> p.identifierType
'Crossref Funder ID'
>>> p.identifier
'10.13039/501100001656'
>>> p.schemeURI
'https://doi.org/'
>>> p.url
'https://doi.org/10.13039/501100001656'
>>> p = PID("DOI:10.5442/ND000006")
>>> p.identifierType
'DOI'
>>> p.identifier
'10.5442/ND000006'
>>> p.schemeURI
'https://doi.org/'
>>> p.url
'https://doi.org/10.5442/ND000006'
>>> p = PID("ROR:02eva5865")
>>> p.identifierType
'ROR'
>>> p.identifier
'02eva5865'
>>> p.schemeURI
'https://ror.org/'
>>> p.url
'https://ror.org/02eva5865'
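The less common cases can be sketched in the same way: an explicit `scheme` argument, a `URL` value that itself contains colons, and a prefix that is missing from `SchemeURI`. The class below is a condensed copy of the helper so that the snippet runs on its own. The `example.org` address and the ISNI value are placeholders, not real data.

```python
class PID:
    """Condensed copy of the helper above, so this sketch is self-contained."""

    SchemeURI = {
        "DOI": "https://doi.org/",
        "ROR": "https://ror.org/",
        "URL": "",
    }

    def __init__(self, identifier, scheme=None):
        if scheme:
            self._type, self._id = scheme, identifier
        else:
            self._type, self._id = identifier.split(':', maxsplit=1)
        if self._type not in self.SchemeURI:
            raise ValueError("%s: unknown identifier type" % identifier)

    @property
    def schemeURI(self):
        return self.SchemeURI[self._type] or None

    @property
    def url(self):
        return self.SchemeURI[self._type] + self._id


# Explicit scheme: the identifier is taken verbatim, no prefix expected.
p = PID("10.5442/ND000006", scheme="DOI")
print(p.url)

# URL scheme: only the first colon separates the prefix from the value.
u = PID("URL:https://example.org/dataset")  # placeholder address
print(u.schemeURI, u.url)

# A prefix that is not listed in SchemeURI is rejected.
try:
    PID("ISNI:0000000000000000")  # dummy value, ISNI is not in the table
except ValueError as e:
    print("rejected:", e)
```

Supporting a further scheme such as ISNI would, under this design, only need one more entry in the `SchemeURI` table.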

For example, the snippet that adds relatedIdentifiers to the DataCite XML used for the landing pages looks like this:

if self.relatedItems:
    relatedIds = etree.SubElement(datacite, "relatedIdentifiers")
    for r in self.relatedItems:
        pid = PID(r['identifier'])
        rId = etree.SubElement(relatedIds, "relatedIdentifier")
        rId.set("relatedIdentifierType", pid.identifierType)
        rId.set("relationType", r['relationType'])
        rId.text = pid.identifier

It works the same for any PID type.
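A self-contained variant of that snippet might look as follows. It is a sketch under a few assumptions: it uses the standard library `xml.etree.ElementTree` in place of whatever `etree` the real script imports, it leaves out the DataCite namespace on the root element, and the function name `add_related_identifiers` is hypothetical. The two DOI entries are taken from the record above. The URL entry is a made-up placeholder included only to show a second identifier type.

```python
import xml.etree.ElementTree as etree  # stdlib stand-in for the script's etree

related_items = [
    {"identifier": "DOI:10.1364/OE.426761", "relationType": "IsSupplementTo"},
    {"identifier": "DOI:10.5281/zenodo.5013230", "relationType": "IsReferencedBy"},
    # Placeholder entry, not part of the real record.
    {"identifier": "URL:https://example.org/code", "relationType": "References"},
]


def add_related_identifiers(parent, items):
    """Append a relatedIdentifiers element built from prefixed identifiers."""
    related_ids = etree.SubElement(parent, "relatedIdentifiers")
    for r in items:
        # Same rule as the PID helper: split at the first colon only.
        scheme, ident = r["identifier"].split(":", maxsplit=1)
        r_id = etree.SubElement(related_ids, "relatedIdentifier")
        r_id.set("relatedIdentifierType", scheme)
        r_id.set("relationType", r["relationType"])
        r_id.text = ident
    return related_ids


datacite = etree.Element("resource")  # namespace omitted for brevity
add_related_identifiers(datacite, related_items)
print(etree.tostring(datacite, encoding="unicode"))
```

The loop body never branches on the scheme, which is the point of keeping the prefix inside the identifier value.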