Closed RKrahl closed 1 year ago
I notice that the CI is failing. I didn't have time to investigate that so could you take a look.
The `RelatedItem` in DataCite 4.4 is an extended and generalized version of `RelatedIdentifier`. It is extended in that it has additional subproperties, such as `relatedItemType`, `Creator`, `Title`, `PublicationYear`, … The purpose of these properties is mostly to allow generating a citation line referencing the related resource. They could in principle be fetched from the metadata of the related identifier, at least if that identifier is a DataCite DOI. It is generalized in the sense that the `relatedItemIdentifier` is optional, so it is also possible to reference items that do not have an identifier.
To give a concrete example: if a dataset is published with a DataCite DOI and the data has been collected from an instrument, that instrument could be referenced in the DataCite metadata using `RelatedIdentifier` as:

    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsCompiledBy">10.5442/NI000001</relatedIdentifier>

or using `RelatedItem` as:

    <relatedItem relatedItemType="Other" relationType="IsCompiledBy">
      <relatedItemIdentifier relatedItemIdentifierType="DOI">10.5442/NI000001</relatedItemIdentifier>
      <titles>
        <title>E2 - Flat-Cone Diffractometer</title>
      </titles>
    </relatedItem>

or using both redundantly. The `RelatedItem` provides additional information: the type of the related resource[^1] and the name of the instrument, included as `title`. The same information could be fetched from the referenced instrument DOI metadata. Note that, as one can see in this example, `RelatedItem` has a subproperty `relatedItemIdentifier`, so it's not only for the case of a related resource having no identifier.
Adding the additional information directly in the DataCite metadata may be needed, for instance if the metadata should be harvested by B2FIND. B2FIND needs to map the incoming metadata onto the EUDAT Core Metadata Schema. EUDAT Core has a text property for `Instrument`, so it would be possible to map the `title` from the `RelatedItem` onto that.[^2] But (as I learned last week) the mapper from B2FIND does not support resolving external identifiers and fetching metadata from the related DOIs. So if the instrument would only be linked using `RelatedIdentifier`, it would not be possible to have the `Instrument` property set in B2FIND.
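To illustrate why the inline `title` matters for a harvester that cannot resolve DOIs, here is a minimal Python sketch of a mapping rule of the kind described in footnote 2. It is not the B2FIND mapper: the function name `instrument_names` and the rule itself are assumptions for illustration, and the fragment omits the DataCite XML namespace that a real record would carry.

```python
import xml.etree.ElementTree as ET

# Fragment modelled on the relatedItem example above. A real DataCite
# record would declare the DataCite kernel namespace; it is left out
# here to keep the sketch short.
DATACITE_FRAGMENT = """
<resource>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsCompiledBy">10.5442/NI000001</relatedIdentifier>
  </relatedIdentifiers>
  <relatedItems>
    <relatedItem relatedItemType="Other" relationType="IsCompiledBy">
      <relatedItemIdentifier relatedItemIdentifierType="DOI">10.5442/NI000001</relatedItemIdentifier>
      <titles>
        <title>E2 - Flat-Cone Diffractometer</title>
      </titles>
    </relatedItem>
  </relatedItems>
</resource>
"""


def instrument_names(datacite_xml):
    """Hypothetical rule: take the title of every related item that is
    linked with relationType IsCompiledBy as an instrument name."""
    root = ET.fromstring(datacite_xml)
    names = []
    for item in root.iter("relatedItem"):
        if item.get("relationType") != "IsCompiledBy":
            continue
        title = item.findtext("titles/title")
        if title:
            names.append(title.strip())
    return names


print(instrument_names(DATACITE_FRAGMENT))
```

A record that only carries the `relatedIdentifier` element yields an empty list with this rule, because the instrument name is simply not present in the harvested metadata.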
In order to be able to use `RelatedItem`, we would need at least `relatedItemType` and `title`, as proposed in this PR, because these subproperties are mandatory.
[^1]: That type is `Other` here, because there is no `Instrument` in the controlled list of terms for resource type in the current DataCite version. But hopefully it will be added in the future.
[^2]: The mapper does support individual per-repository configuration. So it would be possible to add a rule: if a dataset relates a resource with `IsCompiledBy`, then that resource should be taken as an instrument.
… continued:
> So a RelatedItem does not have an identifier, but we have `identifier` as a mandatory field in RelatedIdentifier. Also, in RelatedItem `title` is a mandatory field but in RelatedIdentifier title is optional.
As explained above, in DataCite, `RelatedItem` may have an identifier, but as opposed to `RelatedIdentifier`, it is not mandatory. And I hope I made clear that there are good reasons to use `RelatedItem` also for resources that have an identifier.
I'm not so sure if there is any use case for linking to resources not having an identifier in the context of a data publication from a PaN facility. So I don't believe the fact that `identifier` is mandatory in `RelatedIdentifier` in the ICAT schema would be an issue in practice.
> Which makes me feel that we really should have an additional entity for RelatedItem with the mandatory and optional fields set correctly rather than trying to use RelatedIdentifier for both purposes.
I believe having separate classes in the ICAT schema for both cases would make things overly complicated. In practice, one might want to add both properties for the same related resource in the DataCite metadata: `RelatedItem`, because it provides the additional subproperties needed for instance for B2FIND, and `RelatedIdentifier` for backward compatibility, because `RelatedItem` is relatively new and might not be understood by all consumers of the metadata.
In the end, it will be a site-specific script that generates the DataCite metadata out of ICAT, either the script that generates the data publication landing pages or the XSLT file that generates the metadata in `icat.oaipmh`. The schema as proposed in this PR allows for all options one might want to implement. It is for instance relatively easy to code into the XSLT something like:
- `RelatedIdentifier` entry out of `identifier` and `relationType`,
- `RelatedItem` entry if `relatedItemType` and `title` are not empty.

Or, if you want to avoid the redundancy, you might put into your code:

- `RelatedIdentifier` entry out of `identifier` and `relationType` if `relatedItemType` or `title` is empty,
- `RelatedItem` entry if `relatedItemType` and `title` are not empty.

A site that doesn't care about `RelatedItem` may just ignore `relatedItemType` and `title`.
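The two variants above can be sketched as follows. This is a Python illustration rather than the XSLT used by `icat.oaipmh`; the function `related_elements`, its `avoid_redundancy` flag, and the hard-coded identifier type `DOI` are assumptions made for the example and are not part of the proposed schema.

```python
import xml.etree.ElementTree as ET


def related_elements(identifier, relationType, relatedItemType=None,
                     title=None, avoid_redundancy=False):
    """Return the DataCite elements to emit for one related resource.

    With avoid_redundancy=False, a relatedIdentifier is always emitted
    and a relatedItem is added when relatedItemType and title are set.
    With avoid_redundancy=True, the relatedIdentifier is only emitted
    when relatedItemType or title is empty.
    """
    has_item_info = bool(relatedItemType) and bool(title)
    elements = []
    if not (avoid_redundancy and has_item_info):
        rel_id = ET.Element("relatedIdentifier", {
            # Assumption: every identifier is a DOI.
            "relatedIdentifierType": "DOI",
            "relationType": relationType,
        })
        rel_id.text = identifier
        elements.append(rel_id)
    if has_item_info:
        item = ET.Element("relatedItem", {
            "relatedItemType": relatedItemType,
            "relationType": relationType,
        })
        item_id = ET.SubElement(item, "relatedItemIdentifier",
                                {"relatedItemIdentifierType": "DOI"})
        item_id.text = identifier
        titles = ET.SubElement(item, "titles")
        ET.SubElement(titles, "title").text = title
        elements.append(item)
    return elements


# Emit both elements redundantly for the instrument example.
for elem in related_elements("10.5442/NI000001", "IsCompiledBy",
                             "Other", "E2 - Flat-Cone Diffractometer"):
    print(ET.tostring(elem, encoding="unicode"))
```

A site that ignores `relatedItemType` and `title` corresponds to calling the function with only the first two arguments, which always yields a single `relatedIdentifier` element.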
> I notice that the CI is failing. I didn't have time to investigate that so could you take a look.
The CI always fails for anything I submit. I guess that is a permission issue. I also don't get to see any diagnostic messages, so I can't tell what is going wrong.
As discussed with @kevinphippsstfc today, we decided to rename the entity class `RelatedIdentifier` to `RelatedItem` with this PR in order to make it clearer what this is supposed to be. For the same reason, I expanded some of the comment strings to provide additional hints about what should be put into the attributes.
So the new class now looks like:
RelatedItem
A reference to an external resource or item that is related to a data publication, such as a scientific article that is based on the data or the instrument that has been used to collect the data
Uniqueness constraint: `publication`, `identifier`
Relationships:
Card | Class | Field |
---|---|---|
1,1 | DataPublication | publication |
Other fields:
Field | Type | Description |
---|---|---|
identifier | String [255] NOT NULL | The identifier of the related resource |
relationType | String [255] NOT NULL | Description of the relationship with the related resource, see DataCite property relationType for suggested values |
fullReference | String [1023] | The full reference for the related resource as it should be displayed on the landing page |
relatedItemType | String [255] | The type of the related resource, see DataCite property resourceTypeGeneral for suggested values |
title | String [255] | Title or name of the related resource |
Obviously, the corresponding one-to-many relation in `DataPublication` has also been renamed accordingly.
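As a rough illustration of the constraints in the field table, the renamed class could be modelled like this in Python. This is only a sketch and not ICAT code: the integer `publication` attribute stands in for the many-to-one relation to `DataPublication`, and the helper `violates_uniqueness` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional

# Maximum string lengths taken from the field table above.
MAX_LENGTH = {
    "identifier": 255,
    "relationType": 255,
    "fullReference": 1023,
    "relatedItemType": 255,
    "title": 255,
}
REQUIRED = ("identifier", "relationType")  # the NOT NULL fields


@dataclass
class RelatedItem:
    # Attribute names mirror the ICAT field names, hence camelCase.
    publication: int  # assumption: id of the owning DataPublication
    identifier: str
    relationType: str
    fullReference: Optional[str] = None
    relatedItemType: Optional[str] = None
    title: Optional[str] = None

    def __post_init__(self):
        for name in REQUIRED:
            if getattr(self, name) is None:
                raise ValueError(f"{name} must not be null")
        for name, limit in MAX_LENGTH.items():
            value = getattr(self, name)
            if value is not None and len(value) > limit:
                raise ValueError(f"{name} exceeds {limit} characters")


def violates_uniqueness(items):
    """True if two items share the same (publication, identifier) pair."""
    keys = [(item.publication, item.identifier) for item in items]
    return len(keys) != len(set(keys))


instrument = RelatedItem(
    publication=1,
    identifier="10.5442/NI000001",
    relationType="IsCompiledBy",
    relatedItemType="Other",
    title="E2 - Flat-Cone Diffractometer",
)
print(instrument)
```

Since `relatedItemType` and `title` default to `None`, a site that only needs the former `RelatedIdentifier` behaviour can construct the object from `publication`, `identifier` and `relationType` alone.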
Add some more optional attributes to data publication related classes.
In detail:
- `DataPublicationUser`: Field
- `RelatedIdentifier`: Field

Close #295 and close #296.