Open agbeltran opened 2 years ago
I don't believe we need such a `pid_scheme` attribute for this. Proper use of the respective `pid` attribute is sufficient to disambiguate.

At HZB, I use the convention of always adding a scheme prefix, separated by a colon, to the `pid` or `doi` value. To give an example, the ICAT content for one of our data publications looks like this (output trimmed for brevity):
```
>>> query = Query(client, "DataPublication", conditions={"pid": "= 'DOI:10.5442/ND000006'"}, includes=["fundingReferences.funding", "relatedItems", "users.affiliations"])
>>> client.assertedSearch(query)[0]
(dataPublication){
   # …
   description = "…"
   fundingReferences[] =
      (dataPublicationFunding){
         # …
         funding =
            (fundingReference){
               # …
               awardNumber = "ExNet-0042-Phase-2-3"
               funderIdentifier = "Crossref Funder ID:10.13039/501100001656"
               funderName = "Helmholtz Association"
            }
      },
      (dataPublicationFunding){
         # …
         funding =
            (fundingReference){
               # …
               awardNumber = ":unas"
               funderName = "Helmholtz Einstein International Berlin Research School in Data Science (HEIBRiDS)"
            }
      },
      (dataPublicationFunding){
         # …
         funding =
            (fundingReference){
               # …
               awardNumber = "0324247"
               funderIdentifier = "Crossref Funder ID:10.13039/501100006360"
               funderName = "Federal Ministry for Economic Affairs and Energy"
            }
      },
   pid = "DOI:10.5442/ND000006"
   publicationDate = 2021-06-28 00:00:00+02:00
   relatedItems[] =
      (relatedItem){
         # …
         fullReference = "Cariou, Romain et al. III–V-on-silicon solar cells reaching 33% photoconversion efficiency in two-terminal configuration. Nat Energy 3, 326–333 (2018). https://doi.org/10.1038/s41560-018-0125-0"
         identifier = "DOI:10.1038/s41560-018-0125-0"
         relatedItemType = "JournalArticle"
         relationType = "Cites"
         title = "III–V-on-silicon solar cells reaching 33% photoconversion efficiency in two-terminal configuration"
      },
      (relatedItem){
         # …
         fullReference = "Bläsi, Benedikt et al. Photonic structures for III-V//Si multijunction solar cells with efficiency >33%. Proc. SPIE 10688, Photonics for Solar Energy Systems VII, 1068803 (2018). https://doi.org/10.1117/12.2307831"
         identifier = "DOI:10.1117/12.2307831"
         relatedItemType = "JournalArticle"
         relationType = "Cites"
         title = "Photonic structures for III-V//Si multijunction solar cells with efficiency >33%"
      },
      (relatedItem){
         # …
         fullReference = "Tillmann, Peter et al (2021): Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells. Optics Express. https://doi.org/10.1364/OE.426761"
         identifier = "DOI:10.1364/OE.426761"
         relatedItemType = "JournalArticle"
         relationType = "IsSupplementTo"
         title = "Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells"
      },
      (relatedItem){
         # …
         fullReference = "Tillmann, Peter et al (2021): Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells. Zenodo. https://doi.org/10.5281/zenodo.5013230"
         identifier = "DOI:10.5281/zenodo.5013230"
         relatedItemType = "Software"
         relationType = "IsReferencedBy"
         title = "Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells"
      },
   subject = "multi-junction solar cell; optical simulations; finite element method; light trapping; light management; nanotextures; metal grating"
   title = "Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells"
   users[] =
      # …
      (dataPublicationUser){
         # …
         affiliations[] =
            (affiliation){
               # …
               fullReference = "JCMwave GmbH, Bolivarallee 22, 14050 Berlin"
               name = "01: JCMwave"
            },
            (affiliation){
               # …
               fullReference = "Computational Nano Optics, Zuse Institute Berlin, Takustraße 7, 14195 Berlin"
               name = "02: ZIB"
               pid = "ROR:02eva5865"
            },
         contributorType = "Creator"
         familyName = "Hammerschmidt"
         fullName = "Hammerschmidt, Martin"
         givenName = "Martin"
         orderKey = "004"
      },
      (dataPublicationUser){
         # …
         affiliations[] =
            (affiliation){
               # …
               fullReference = "Optics for Solar Energy, Helmholtz-Zentrum Berlin für Materialien und Energie, Albert-Einstein-Straße 16, 12489 Berlin"
               name = "01: HZB"
               pid = "ROR:02aj13c28"
            },
            (affiliation){
               # …
               fullReference = "Computational Nano Optics, Zuse Institute Berlin, Takustraße 7, 14195 Berlin"
               name = "02: ZIB"
               pid = "ROR:02eva5865"
            },
         contributorType = "Creator"
         familyName = "Tillmann"
         fullName = "Tillmann, Peter"
         givenName = "Peter"
         orderKey = "001"
      },
      (dataPublicationUser){
         # …
         affiliations[] =
            (affiliation){
               # …
               fullReference = "Fraunhofer Institute for Solar Energy Systems ISE, Heidenhofstr. 2, 79110 Freiburg, Germany"
               name = "01: Fraunhofer ISE"
               pid = "ROR:02kfzvh91"
            },
         contributorType = "Creator"
         familyName = "Bläsi"
         fullName = "Bläsi, Benedikt"
         givenName = "Benedikt"
         orderKey = "002"
      },
      (dataPublicationUser){
         # …
         affiliations[] =
            (affiliation){
               # …
               fullReference = "JCMwave GmbH, Bolivarallee 22, 14050 Berlin"
               name = "01: JCMwave"
            },
            (affiliation){
               # …
               fullReference = "Computational Nano Optics, Zuse Institute Berlin, Takustraße 7, 14195 Berlin"
               name = "02: ZIB"
               pid = "ROR:02eva5865"
            },
         contributorType = "Creator"
         familyName = "Burger"
         fullName = "Burger, Sven"
         givenName = "Sven"
         orderKey = "003"
      },
}
```
As you can see, I have multiple different types of PIDs in the data: DOIs, Crossref Funder IDs, and RORs in this case. Note that the Crossref Funder IDs are actually DOIs, but are still handled separately.
The script that generates the landing pages has a helper class to deal with that:
```python
class PID:
    """Generalization of a persistent identifier.
    """

    SchemeURI = {
        "DOI": "https://doi.org/",
        "arXiv": "https://arxiv.org/abs/",
        "ORCID": "https://orcid.org/",
        "ROR": "https://ror.org/",
        "Crossref Funder ID": "https://doi.org/",
        "PaNET": "http://purl.org/pan-science/PaNET/",
        "URL": "",
    }

    def __init__(self, identifier, scheme=None):
        # Unless the scheme is overridden, this code assumes the
        # identifier to be scheme and id separated by a colon and that
        # the scheme part does not contain a colon.
        if scheme:
            self._type, self._id = scheme, identifier
        else:
            self._type, self._id = identifier.split(':', maxsplit=1)
        if self._type not in self.SchemeURI:
            raise ValueError("%s: unknown identifier type" % identifier)

    @property
    def identifierType(self):
        return self._type

    @property
    def identifier(self):
        return self._id

    @property
    def schemeURI(self):
        return self.SchemeURI[self._type] or None

    @property
    def url(self):
        return self.SchemeURI[self._type] + self._id
```
This helper is able to deal properly with all different types and cases:
```python
>>> p = PID("Crossref Funder ID:10.13039/501100001656")
>>> p.identifierType
'Crossref Funder ID'
>>> p.identifier
'10.13039/501100001656'
>>> p.schemeURI
'https://doi.org/'
>>> p.url
'https://doi.org/10.13039/501100001656'
>>> p = PID("DOI:10.5442/ND000006")
>>> p.identifierType
'DOI'
>>> p.identifier
'10.5442/ND000006'
>>> p.schemeURI
'https://doi.org/'
>>> p.url
'https://doi.org/10.5442/ND000006'
>>> p = PID("ROR:02eva5865")
>>> p.identifierType
'ROR'
>>> p.identifier
'02eva5865'
>>> p.schemeURI
'https://ror.org/'
>>> p.url
'https://ror.org/02eva5865'
```
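
One case not shown above is a value without any scheme prefix. With the class as written, such a value would fail while unpacking the result of `split`, so the error message would not mention the identifier. A minimal sketch of a stricter split follows, assuming a hypothetical standalone function `split_pid` and a reduced scheme set `KNOWN_SCHEMES`; neither is part of the landing page script:

```python
# Hypothetical helper for illustration only; not part of the original script.
# KNOWN_SCHEMES is a reduced stand-in for the keys of PID.SchemeURI.
KNOWN_SCHEMES = {"DOI", "ROR", "ORCID", "Crossref Funder ID"}


def split_pid(value):
    """Split 'SCHEME:id' at the first colon and validate the scheme."""
    # partition() never raises; sep is empty when there is no colon.
    scheme, sep, ident = value.partition(":")
    if not sep or not ident:
        raise ValueError("%s: missing scheme prefix" % value)
    if scheme not in KNOWN_SCHEMES:
        raise ValueError("%s: unknown identifier type" % value)
    return scheme, ident


print(split_pid("DOI:10.5442/ND000006"))
try:
    split_pid("02eva5865")
except ValueError as err:
    print(err)
```

Whether unprefixed values should be rejected or mapped to a default scheme is a policy choice; the sketch only rejects them.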
E.g. the snippet for adding `relatedIdentifiers` to the DataCite XML used for the landing pages looks like:
```python
if self.relatedItems:
    relatedIds = etree.SubElement(datacite, "relatedIdentifiers")
    for r in self.relatedItems:
        pid = PID(r['identifier'])
        rId = etree.SubElement(relatedIds, "relatedIdentifier")
        rId.set("relatedIdentifierType", pid.identifierType)
        rId.set("relationType", r['relationType'])
        rId.text = pid.identifier
```
It works the same for any PID type.
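
For anyone who wants to try this without the rest of the script, here is a self-contained sketch of the same idea. It assumes the standard library's `xml.etree.ElementTree` in place of whatever `etree` refers to in the script, splits the prefix inline instead of using the `PID` class, and leaves out XML namespaces. The function name `add_related_identifiers` and the bare `resource` root element are made up for illustration; the sample records are taken from the output above.

```python
import xml.etree.ElementTree as ET

# Sample records shaped like the relatedItems shown in the ICAT output above.
related_items = [
    {"identifier": "DOI:10.1364/OE.426761", "relationType": "IsSupplementTo"},
    {"identifier": "DOI:10.5281/zenodo.5013230", "relationType": "IsReferencedBy"},
]


def add_related_identifiers(parent, items):
    """Append a relatedIdentifiers element built from prefixed identifiers."""
    related_ids = ET.SubElement(parent, "relatedIdentifiers")
    for item in items:
        # Split "SCHEME:id" at the first colon only; the id may contain colons.
        scheme, ident = item["identifier"].split(":", 1)
        r_id = ET.SubElement(related_ids, "relatedIdentifier")
        r_id.set("relatedIdentifierType", scheme)
        r_id.set("relationType", item["relationType"])
        r_id.text = ident
    return related_ids


# Illustrative root element; a real DataCite document also needs namespaces.
resource = ET.Element("resource")
add_related_identifiers(resource, related_items)
print(ET.tostring(resource, encoding="unicode"))
```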
We now have `pid` fields for the relevant entities. These `pid`s may be from different schemes; for example, for affiliations we may have ROR or ISNI identifiers. If facilities rely on more than one scheme, it would be useful to include a field for the `pid_scheme` being used.
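
To make the two options concrete, here is a minimal sketch of how a colon-prefixed `pid` value and a separate `pid_scheme` field could be converted into each other. The function names and the dictionary layout are hypothetical, and the ISNI value uses placeholder digits rather than a real identifier:

```python
def to_scheme_and_value(pid):
    """Split a prefixed value into a hypothetical pid_scheme / pid pair."""
    # Only the first colon separates the scheme from the identifier.
    scheme, _, value = pid.partition(":")
    return {"pid_scheme": scheme, "pid": value}


def to_prefixed(record):
    """Rebuild the single prefixed value from the two-field form."""
    return "%s:%s" % (record["pid_scheme"], record["pid"])


affiliation_pids = [
    "ROR:02eva5865",              # taken from the ICAT output above
    "ISNI:0000000000000000",      # placeholder digits, not a real ISNI
]
for pid in affiliation_pids:
    record = to_scheme_and_value(pid)
    # The round trip loses nothing, so both forms carry the same information.
    assert to_prefixed(record) == pid
    print(record)
```

Since the conversion works in both directions without loss, the choice between the two forms mainly affects how the values are queried and validated, not what can be expressed.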