Closed by kpsherva 1 year ago.
Issue: https://github.com/zenodo/zenodo-rdm/issues/179
In the linked issue, every record that belongs to LORY communities (~10k records) was analysed. The following fields are used but are not migrated in https://github.com/zenodo/zenodo-rdm/issues/102:
journal
meeting
imprint
notes
thesis
provisional_communities: I am not sure whether this one is needed as a custom field or not.

Note that some fields are actually objects, e.g. journal has nested fields like journal.title, journal.pages, etc.
Source: DataCite Metadata Schema v4.4
Legend
M - mandatory
R - recommended
O - optional
List
# Mandatory fields
1 Identifier (with mandatory type sub-property) M
2 Creator (with optional name identifier and affiliation sub-properties) M
3 Title (with optional type sub-properties) M
4 Publisher M
5 PublicationYear M
10 ResourceType (with mandatory general type description sub-property) M
# Recommended + Optional fields
6 Subject (with scheme sub-property) R
7 Contributor (with type, name identifier, and affiliation sub-properties) R
8 Date (with type sub-property) R
9 Language O
11 AlternateIdentifier (with type sub-property) O
12 RelatedIdentifier (with type and relation type sub-properties) R
13 Size O
14 Format O
15 Version O
16 Rights O
17 Description (with type sub-property) R
18 GeoLocation (with point, box and polygon sub-properties) R
19 FundingReference (with name, identifier, and award related sub-properties) O
20 RelatedItem (with identifier, creator, title, publication year, volume, issue, number, page, publisher, edition, and contributor sub-properties) O
Fields missing from Invenio's metadata (but see the EDIT below):
AlternateIdentifier (optional)
GeoLocation (recommended)
RelatedItem (optional)

DataCite field | Invenio field | Comments |
---|---|---|
Identifier | pids.doi.identifier | |
Identifier.IdentifierType | pids.doi.provider | |
Creator | metadata.creators | |
Creator.creatorName | metadata.creators.person_or_org.name | |
Title | metadata.title | |
Publisher | metadata.publisher | |
PublicationYear | metadata.publication_date | DataCite asks only for the year |
ResourceType | metadata.resource_type | |
ResourceType.resourceTypeGeneral | metadata.resource_type.id | DataCite asks for "the general type of a resource" from a controlled list of values, e.g. "Audio" |
See an example of a record in the DataCite API.
We are not missing any of DataCite's mandatory fields. Some fields (e.g. PublicationYear) differ in definition from what we have.
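The mandatory-field claim can be checked mechanically against a record by reusing the mapping table above. A minimal sketch, assuming a record is a nested dict shaped like the Invenio paths in the table; the helper names and the example record are hypothetical, not part of InvenioRDM:

```python
from typing import Any, List, Optional

# DataCite mandatory property -> Invenio field path, taken from the mapping table above.
MANDATORY_MAPPING = {
    "Identifier": "pids.doi.identifier",
    "Creator": "metadata.creators",
    "Title": "metadata.title",
    "Publisher": "metadata.publisher",
    "PublicationYear": "metadata.publication_date",
    "ResourceType": "metadata.resource_type",
}


def resolve(record: dict, path: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def missing_mandatory(record: dict) -> List[str]:
    """Return the DataCite mandatory properties with no value (empty values count as missing)."""
    return [prop for prop, path in MANDATORY_MAPPING.items() if not resolve(record, path)]


# Hypothetical minimal record with no publisher set.
record = {
    "pids": {"doi": {"identifier": "10.1234/example", "provider": "datacite"}},
    "metadata": {
        "creators": [{"person_or_org": {"name": "Doe, Jane"}}],
        "title": "An example record",
        "publication_date": "2021-03-15",
        "resource_type": {"id": "dataset"},
    },
}
print(missing_mandatory(record))  # ['Publisher']
```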
EDIT: thanks @tmorrell for pointing out that AlternateIdentifier and GeoLocation are already in Invenio's metadata.
A couple of comments on the DataCite analysis:
In the serializer we transform the publication date field to just the year, so from the DataCite perspective that field is fully compliant.
AlternateIdentifier is present https://inveniordm.docs.cern.ch/reference/metadata/#alternate-identifiers-0-n
GeoLocation is present in the metadata https://inveniordm.docs.cern.ch/reference/metadata/#locations-0-n but not in the deposit form. Box and Polygon will hopefully be available in the DataCite serialized output soon https://github.com/inveniosoftware/invenio-rdm-records/pull/1144
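The year reduction mentioned in the first comment can be illustrated as follows. This is a hypothetical helper, not the actual invenio-rdm-records serializer code; it assumes publication_date begins with a 4-digit year ("2021", "2021-03", "2021-03-15") and, for an interval such as "2020-01/2021-06", it assumes the start year is the one to keep:

```python
def publication_year(publication_date: str) -> str:
    """Reduce an EDTF-style date or interval to the 4-digit year DataCite expects.

    Assumption: for an interval ("start/end") the start year is used.
    """
    start = publication_date.split("/")[0]
    year = start[:4]
    if len(year) != 4 or not year.isdigit():
        raise ValueError(f"unsupported publication date: {publication_date!r}")
    return year


for value in ["2021", "2021-03", "2021-03-15", "2020-01/2021-06"]:
    print(value, "->", publication_year(value))
```

Raising on anything that does not start with a 4-digit year keeps the sketch strict; a real serializer might prefer to skip the property instead.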
The following fields are used for software-specific records.
Source: https://codemeta.github.io/terms/
Field | Type | Description | Source |
---|---|---|---|
codeRepository | URL | Link to the repository where the un-compiled, human readable code and related code is located (SVN, GitHub, CodePlex, institutional GitLab instance, etc.). | schema.org |
programmingLanguage | ComputerLanguage or Text | The computer programming language. | schema.org |
runtimePlatform | Text | Runtime platform or script interpreter dependencies (Example - Java v1, Python2.3, .Net Framework 3.0). Supersedes runtime. | schema.org |
operatingSystem | Text | Operating systems supported (Windows 7, OSX 10.6, Android 1.6). | schema.org |
developmentStatus | Text | Description of development status, e.g. Active, inactive, suspended. See repostatus.org | codemeta |
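If these codemeta terms became custom fields, each would need validation matching the types in the table. A minimal, framework-independent sketch, assuming the values arrive as a flat dict keyed by the codemeta names; the function name is hypothetical, it accepts programmingLanguage only as plain text (not a ComputerLanguage object), and DEVELOPMENT_STATUSES holds just the examples from the table, not the full repostatus.org list:

```python
from urllib.parse import urlparse

# Text-typed codemeta terms from the table above.
TEXT_FIELDS = ("programmingLanguage", "runtimePlatform", "operatingSystem")
# Example subset taken from the table; repostatus.org defines the full list.
DEVELOPMENT_STATUSES = {"active", "inactive", "suspended"}


def validate_software_metadata(data: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    repo = data.get("codeRepository")
    if repo is not None:
        parsed = urlparse(repo)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"codeRepository is not an http(s) URL: {repo!r}")
    for field in TEXT_FIELDS:
        value = data.get(field)
        if value is not None and not (isinstance(value, str) and value.strip()):
            errors.append(f"{field} must be non-empty text")
    status = data.get("developmentStatus")
    if status is not None and (
        not isinstance(status, str) or status.lower() not in DEVELOPMENT_STATUSES
    ):
        errors.append(f"unknown developmentStatus: {status!r}")
    return errors


print(validate_software_metadata({
    "codeRepository": "https://github.com/zenodo/zenodo-rdm",
    "programmingLanguage": "Python",
    "developmentStatus": "Active",
}))  # []
print(validate_software_metadata({"codeRepository": "not a url", "developmentStatus": "dormant"}))
```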
To be discussed: these fields are related to software resources. Should they be implemented as custom fields?
A note on the DataCite RelatedItem field and journal fields (e.g. journal).
After speaking with @slint, we realised that RelatedItem can be used as an extension of RelatedIdentifier. DataCite recommends using it when the related item does not have an identifier, BUT it can be used even when the item does have one.
Therefore, fields such as journal might not need custom fields. Instead, we could extend the RelatedIdentifier field and later serialize it to RelatedItem for DataCite. To be further discussed and analysed.
Analysis
Implementation tasks