hubzero / hubzero-cms

Platform for Scientific Collaboration
https://hubzero.org
GNU General Public License v2.0

[PURR][#2513]Changes to setting of DOI metadata elements #1697

Closed kuang5 closed 4 months ago

kuang5 commented 7 months ago

Support ticket: https://purr.purdue.edu/support/ticket/2513

Summary of Issue: The values of a few metadata elements in the DataCite DOI metadata set are not set properly. We also want to include citations in the DOI metadata set.

Summary of Fix/Changes:

  1. Change the attribute value of "type" from "Valid" to "Available" for the date element that maps to the "published_up" column in table #__publication_versions.
  2. Remove the attributes 'xml:lang="en-US" schemeURI="https://spdx.org/licenses/" rightsIdentifierScheme="SPDX" rightsIdentifier="CC0 1.0"' from the rights element.
  3. Fix the relatedIdentifier element with relationType "IsNewVersionOf", which was set to the wrong DOI value; the correct value is the DOI of the previous version of the dataset. Add a relatedIdentifier element with relationType "IsPreviousVersionOf", whose value is the DOI of the next version of the dataset.
  4. Set citations as relatedIdentifier elements.

Summary of Testing:

  1. Create, submit, and approve three file publications whose versions are in ascending order (version 1, version 2, and version 3).
  2. Log in to doi.datacite.org and check the metadata set for the three dataset DOIs.
     (1) All three DOI metadata sets should include the date element and the rights element with the correct attributes and values, as specified in Summary of Fix/Changes.
     (2) The DOI metadata sets of the second and third version publications should have a relatedIdentifier with relationType "IsNewVersionOf", whose value is the DOI of the previous version publication.
  3. Open the first and second version publications separately in the publication component of the admin interface, click the Save button at the top right, then go back and check the metadata sets on DataCite. Both should now include a relatedIdentifier with relationType "IsPreviousVersionOf", whose value is the DOI of the next version publication.
  4. Citation test. Follow the steps below to test that a citation is included in the DOI metadata record on DataCite.
     (1) Log in to the admin interface and click the menu components->citation to open the citation component admin interface.
     (2) Click the “+” icon at the upper right to open the citation editing form and enter the citation information in each field: choose the Type, set the Title/Chapter, and set at least one of the DOI or another identifier URL for the citation. To test the different identifiers that can be associated with a citation, try each option below, one per test:
         a. Set the DOI of the citation in the DOI field.
         b. Leave the DOI field empty, and set the URL field to "https://purl.utwente.nl/essays/96565".
         c. Leave the DOI field empty, and set the URL field to "https://scholarworks.montana.edu/xmlui/handle/1/9567".
         d. Leave the DOI field empty, set the URL field to "https://n2t.net/ark:/12148/btv1b8449691v/f29", and set the E-print field to "ark/1237772".
         e. Leave the DOI field empty, set the URL field to "https://arxiv.org/abs/1210.5802", and set the E-print field to "1210.5802v2".
         f. Leave the DOI field empty, set the URL field to "http://urn.fi/URN:NBN:fi:hulib-202112144262", and set the E-print field to "URN:NBN:fi:hulib-202112144262".
         g. Leave the DOI field empty, and set the URL field to "https://url.utwente.nl/essays/96565".
     (3) In the “CITATION FOR” section, set the type to “Publication”, enter the dataset’s publication version ID, and choose the context option.
     (4) Click the “Save and Close” button at the upper right. The citation is pushed to the DOI’s metadata record on DataCite when the message "citation successfully saved" appears.
     (5) Log in to doi.datacite.org, open the corresponding DOI metadata record, and check whether the expected relatedIdentifier element is set with the proper relatedIdentifierType.

Hotfix needed? No.
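
For readers following the thread, the version-linking rule in item 3 of the fix list can be sketched roughly as below. This is a Python illustration only, not the PHP code in the pull request. The function names `version_relations` and `build_fragment`, the example DOIs, the date, and the rights text are all hypothetical. The attribute names `dateType`, `relatedIdentifierType`, and `relationType` are taken from the DataCite schema rather than from the description above, which refers to the date attribute as "type".

```python
import xml.etree.ElementTree as ET


def version_relations(dois, index):
    """Return (relationType, DOI) pairs linking version `index` to its neighbors.

    `dois` lists the DOIs of one dataset, ordered from oldest to newest version.
    """
    relations = []
    if index > 0:
        # A later version points back at the version it replaces.
        relations.append(("IsNewVersionOf", dois[index - 1]))
    if index < len(dois) - 1:
        # An earlier version points forward at its successor.
        relations.append(("IsPreviousVersionOf", dois[index + 1]))
    return relations


def build_fragment(dois, index, published_up, rights_text):
    """Build the date, rights, and relatedIdentifier elements for one version."""
    resource = ET.Element("resource")
    dates = ET.SubElement(resource, "dates")
    date = ET.SubElement(dates, "date", {"dateType": "Available"})
    date.text = published_up
    rights_list = ET.SubElement(resource, "rightsList")
    rights = ET.SubElement(rights_list, "rights")  # deliberately no attributes
    rights.text = rights_text
    related = ET.SubElement(resource, "relatedIdentifiers")
    for relation_type, doi in version_relations(dois, index):
        attrs = {"relatedIdentifierType": "DOI", "relationType": relation_type}
        ET.SubElement(related, "relatedIdentifier", attrs).text = doi
    return resource


# Hypothetical DOIs for versions 1, 2, and 3 of one dataset.
dois = ["10.1234/example.v1", "10.1234/example.v2", "10.1234/example.v3"]
fragment = build_fragment(dois, 1, "2023-01-15", "CC0 1.0")
print(ET.tostring(fragment, encoding="unicode"))
```

In this sketch, version 1 can only receive its "IsPreviousVersionOf" link once version 2 exists, which is consistent with the testing step that re-saves the earlier versions before checking DataCite again.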
dbenham commented 6 months ago

@kuang5 Code looks good, but we wanted to review the changes in a bit more detail. I'll get in touch and set this meeting up.

monaw commented 6 months ago

@kuang5 - couple of questions:

  1. which version of the DataCite schema is this code for?
  2. is it possible to document the crosswalk you've implemented between Hubzero's metadata fields and the DataCite fields you are setting? also include the DataCite fields that you are hard-coding?
kuang5 commented 6 months ago
  1. The changes are for DataCite schema version 4.4.
  2. I don't know which Hubzero metadata fields are set or where they are used, other than in the publication component. The change in this pull request doesn't affect the Hubzero metadata fields that are set and used. The attribute values and element values of the DataCite fields in this pull request are not hard-coded; they are actual information that either comes from the PURR database or that we collect from scholarly publications that cite the PURR dataset.
monaw commented 6 months ago

thanks @kuang5 !

  1. it would be useful to add the DataCite schema version to the code & doc somewhere. as you probably know, they keep updating it and 4.5 came out.
  2. thanks for confirming there are no hard-coded values being set. a crosswalk for the fields being mapped from the publication component/database/dataset to DataCite would be helpful for documentation purposes. similar to https://codemeta.github.io/crosswalk/datacite/

we would like to improve documentation for Hubzero code and appreciate your help toward this goal (:
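
One possible shape for such a crosswalk, kept next to the code, is a small data table. The sketch below is hypothetical: it records only the mappings stated in the pull request description, the keys `datacite_field`, `attributes`, and `source` are invented for illustration, and `dateType` is the DataCite attribute name assumed for what the description calls "type".

```python
# Each entry pairs a DataCite field with the source of its value,
# as stated in the pull request description.
CROSSWALK = [
    {
        "datacite_field": "date",
        "attributes": {"dateType": "Available"},
        "source": "#__publication_versions.published_up",
    },
    {
        "datacite_field": "relatedIdentifier",
        "attributes": {"relationType": "IsNewVersionOf"},
        "source": "DOI of the previous version of the dataset",
    },
    {
        "datacite_field": "relatedIdentifier",
        "attributes": {"relationType": "IsPreviousVersionOf"},
        "source": "DOI of the next version of the dataset",
    },
]


def to_markdown(rows):
    """Render the crosswalk as a Markdown table for documentation."""
    lines = ["| DataCite field | Attributes | Source |", "| --- | --- | --- |"]
    for row in rows:
        attrs = ", ".join(f'{k}="{v}"' for k, v in row["attributes"].items())
        lines.append(f'| {row["datacite_field"]} | {attrs} | {row["source"]} |')
    return "\n".join(lines)


table = to_markdown(CROSSWALK)
print(table)
```

Keeping the table as data means the same entries could feed both the documentation and a consistency check, though whether that fits the Hubzero code base is an open question.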

kuang5 commented 5 months ago

@monaw

  1. To reflect the DataCite schema version, I have changed "https://schema.datacite.org/meta/kernel-4/metadata.xsd" in the xsi:schemaLocation of the XML header to "https://schema.datacite.org/meta/kernel-4.5/metadata.xsd".
  2. I attached an Excel spreadsheet that lists the metadata fields and the corresponding sources of their values in the database. These are the metadata fields involved in the code changes of this pull request: DataCite_metadata_field_and_value.xlsx
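
As a rough illustration of the schemaLocation change in item 1, the following Python sketch builds a root element that pins the 4.5 schema. It is not the code from the pull request. The namespace URI `http://datacite.org/schema/kernel-4` is the DataCite kernel-4 namespace and is an assumption here, since the thread only quotes the .xsd URL; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

DATACITE_NS = "http://datacite.org/schema/kernel-4"  # assumed namespace URI
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_URL = "https://schema.datacite.org/meta/kernel-4.5/metadata.xsd"

# Serialize DataCite elements without a prefix and keep the usual xsi prefix.
ET.register_namespace("", DATACITE_NS)
ET.register_namespace("xsi", XSI_NS)

root = ET.Element(f"{{{DATACITE_NS}}}resource")
# xsi:schemaLocation holds the namespace URI and the schema URL, space-separated.
root.set(f"{{{XSI_NS}}}schemaLocation", f"{DATACITE_NS} {SCHEMA_URL}")

header = ET.tostring(root, encoding="unicode")
print(header)
```

Holding the schema URL in one constant would make the next schema update a one-line change.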
monaw commented 4 months ago

Hi @kuang5, thanks for the spreadsheet! I would like to confirm whether strings like "jos_publication_versions::published_up" in the value field, "jos_citation_association::type" for relationType, and "jos_citation_types::type_desc" for resourceTypeGeneral are used literally, or whether they are replaced with the actual values from variables. From the documentation it seems that they are literal strings. If that's the case, then I think it is incorrect according to the DataCite 4.5 Schema PDF:

  1. some relationType values (eg relationType="jos_citation_association::type") are not in DataCite Schema's list of controlled values (see pages 35-37 of the above DataCite Schema document)
  2. also some resourceTypeGeneral values (eg resourceTypeGeneral="jos_citation_types::type_desc") are not in the list of controlled values (see pages 70 and 71)
  3. lastly, this means that relatedIdentifier values (eg jos_citations::doi) are also not valid (ie it should be the actual DOI value, not the HZ table::field)

I'm not sure how DataCite will handle values not in their controlled list. Did you get any error during your testing?
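
The distinction in question, a literal "table::column" string versus the value it refers to, can be illustrated with a short Python sketch. Everything here is hypothetical: the helper names, the sample row, and the assumption that the referenced column holds a DataCite relationType value. `RELATION_TYPES_SUBSET` is only a partial, illustrative subset of the controlled list; the schema document has the full list.

```python
# Partial, illustrative subset of DataCite relationType values;
# the schema document has the full controlled list.
RELATION_TYPES_SUBSET = {
    "IsCitedBy", "Cites", "IsSupplementTo", "IsSupplementedBy",
    "IsReferencedBy", "References", "IsNewVersionOf", "IsPreviousVersionOf",
}


def resolve(spec, row):
    """Replace a "table::column" reference with the value from `row`.

    Anything that is not a "table::column" reference is returned unchanged.
    """
    if "::" in spec:
        return row[spec]
    return spec


def is_known_relation_type(value):
    """Check a value against the illustrative subset above."""
    return value in RELATION_TYPES_SUBSET


# Hypothetical database row keyed by the spreadsheet's column references.
row = {
    "jos_citation_association::type": "IsCitedBy",
    "jos_citations::doi": "10.1234/example.citing",
}

literal = "jos_citation_association::type"
print(is_known_relation_type(literal))                # False
print(is_known_relation_type(resolve(literal, row)))  # True
```

If the spreadsheet entries are column references that the code resolves before building the XML, the second case applies; if they were sent as-is, the first case would.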