pulibrary / pdc_describe

Description application for Research Data content

fixes to CKavity migration spec #592

Closed Twade968 closed 1 year ago

Twade968 commented 1 year ago

This is the original record that we are migrating: https://dataspace.princeton.edu/handle/88435/dsp015999n626m?mode=full

Acceptance Criteria

Notes from Matt

There are several issues here. Let me preface everything by saying we (PRDS) did not curate this one, and I'm not sure which submission form was used for it.

I'm looking at the full item record to understand it better: https://dataspace.princeton.edu/handle/88435/dsp015999n626m?mode=full

The dc.contributor.author field was not used. Instead, there are three dc.creator fields in the original record that should be translated to the DataCite Creator field. One of the names listed as a creator was repeated as a contributor, and we do not want to replicate that error. In addition, we typically do not input funding agencies as contributors, so the Creator field you have for the National Science Foundation should be omitted.
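
A minimal sketch of the creator rule described above, assuming the DSpace record is available as a dict of field name to list of values. The helper name `datacite_creators` and the contributor values in the sample are hypothetical; only the three creator names come from this record.

```python
def datacite_creators(dspace_record):
    """Build DataCite creators from dc.creator values only.

    dc.contributor values are deliberately ignored, so a creator repeated
    as a contributor or a funding agency never becomes a Creator.
    """
    creators = []
    seen = set()
    for name in dspace_record.get("dc.creator", []):
        name = name.strip()
        if name in seen:
            continue  # skip exact duplicates
        seen.add(name)
        # Names are assumed to be in "Family, Given" form.
        family, _, given = name.partition(",")
        creators.append({
            "creatorName": name,
            "givenName": given.strip(),
            "familyName": family.strip(),
        })
    return creators


# Illustrative input: the contributor values are placeholders.
record = {
    "dc.creator": ["Leach, Robert", "Hecht, Michael", "Karas, Christina"],
    "dc.contributor": ["Example, Person", "National Science Foundation"],
}
creators = datacite_creators(record)
```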

The original record has a title as well as an alternative title, and the title field already includes a subtitle, so I do not think that the two title fields should be merged in DataCite. Instead, the alternative title from the original record can be translated as a second title with the type "AlternativeTitle" (see https://support.datacite.org/docs/datacite-metadata-schema-v44-mandatory-properties#3-title).
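
The title handling could look roughly like this. It is a sketch, assuming the alternative title is stored under the standard DSpace field `dc.title.alternative`; the helper name `datacite_titles` is hypothetical.

```python
def datacite_titles(dspace_record):
    """Keep the main title as-is and add alternative titles separately."""
    titles = [{"title": t} for t in dspace_record.get("dc.title", [])]
    titles += [
        {"title": t, "titleType": "AlternativeTitle"}
        for t in dspace_record.get("dc.title.alternative", [])
    ]
    return titles


record = {
    "dc.title": ["CKavity Library: Next-Generation Sequencing"],
    "dc.title.alternative": [
        "A library of novel genes with combinatorially diverse cavities, "
        "built on a stably folded structural template"
    ],
}
titles = datacite_titles(record)
```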

For some reason, this record does not have an issue date (which is generally required). However, I can see that the dc.date.accessioned is in 2019, so the DataCite record should have the Publication Year field set to 2019 instead of 2020.
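
One possible way to express the fallback, assuming dates are ISO-style strings that start with a four-digit year. The helper name `publication_year` and the sample timestamp are hypothetical; only the year 2019 comes from the record.

```python
def publication_year(dspace_record):
    """Prefer the issue date; fall back to the accession date."""
    for field in ("dc.date.issued", "dc.date.accessioned"):
        values = dspace_record.get(field, [])
        if values:
            return int(values[0][:4])
    return None  # no usable date; needs manual review


# Placeholder timestamp: this record has no dc.date.issued.
record = {"dc.date.accessioned": ["2019-01-01T00:00:00Z"]}
year = publication_year(record)
```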

While the original record does have the publisher set to "Princeton University Lewis-Sigler Institute", our current practice is to correct such entries to "Princeton University".
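
As a sketch, the publisher correction might be a simple prefix rule. Treating every value that begins with "Princeton University" as one of "such entries" is an assumption, and `normalize_publisher` is a hypothetical helper name.

```python
def normalize_publisher(publisher):
    """Collapse department-level publishers to the institution name."""
    publisher = publisher.strip()
    # Assumption: any value starting with the institution name is a unit of it.
    if publisher.startswith("Princeton University"):
        return "Princeton University"
    return publisher


publisher = normalize_publisher("Princeton University Lewis-Sigler Institute")
```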

The original record has two description fields filled: abstract and table of contents. My understanding is that for the purposes of migration, we are copying over abstracts as Description Type "Other" and omitting other description fields for now.
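
A sketch of that description rule, assuming the two fields are the standard DSpace `dc.description.abstract` and `dc.description.tableofcontents`. The helper name `datacite_descriptions` is hypothetical and the sample text is a shortened placeholder.

```python
def datacite_descriptions(dspace_record):
    """Copy abstracts as descriptionType "Other"; omit other descriptions."""
    return [
        {"description": text, "descriptionType": "Other"}
        for text in dspace_record.get("dc.description.abstract", [])
    ]


record = {
    "dc.description.abstract": ["Protein sequence space is vast (shortened)"],
    "dc.description.tableofcontents": ["(table of contents placeholder)"],
}
descriptions = datacite_descriptions(record)
```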

carolyncole commented 1 year ago

refs #409

Twade968 commented 1 year ago

<?xml version="1.0"?>
<resource xsi:schemaLocation='http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='http://datacite.org/schema/kernel-4'>
  <identifier identifierType='DOI'>10.34770/gg40-tc15</identifier>
  <creators>
    <creator>
      <creatorName>Leach, Robert</creatorName>
      <givenName>Robert</givenName>
      <familyName>Leach</familyName>
    </creator>
    <creator>
      <creatorName>Hecht, Michael</creatorName>
      <givenName>Michael</givenName>
      <familyName>Hecht</familyName>
    </creator>
    <creator>
      <creatorName>Karas, Christina</creatorName>
      <givenName>Christina</givenName>
      <familyName>Karas</familyName>
    </creator>
  </creators>
  <titles>
    <title>CKavity Library: Next-Generation Sequencing</title>
    <title titleType='AlternativeTitle'>
      A library of novel genes with combinatorially diverse cavities, built on a
      stably folded structural template
    </title>
  </titles>
  <publisher>Princeton University</publisher>
  <resourceType resourceTypeGeneral='Dataset'/>
  <publicationYear>2019</publicationYear>
  <relatedIdentifiers>
    <relatedIdentifier relationType='IsIdenticalTo' relatedIdentifierType='ARK'>ark:/88435/dsp015999n626m</relatedIdentifier>
  </relatedIdentifiers>
  <version>1</version>
  <rightsList>
    <rights rightsURI='https://creativecommons.org/licenses/by/4.0/' rightsIdentifier='CC BY'>Creative Commons Attribution 4.0 International</rights>
  </rightsList>
  <descriptions>
    <description descriptionType='Other'>
      Protein sequence space is vast; nature uses only an infinitesimal fraction
      of possible sequences to sustain life. Are there solutions to biological
      problems other than those provided by nature? Can we create artificial
      proteins that sustain life? To investigate this question, the Hecht lab
      has created combinatorial collections, or libraries, of novel sequences
      with no homology to those found in living organisms. These libraries were
      subjected to screens and selections, leading to the identification of
      sequences with roles in catalysis, modulating gene regulation, and metal
      homeostasis. However, the resulting functional proteins formed dynamic
      rather than well-ordered structures. This impeded structural
      characterization and made it difficult to ascertain a mechanism of action.
      To address this, Christina Karas&apos;s thesis work focuses on developing
      a new model of libraries based on the de novo protein S-824, a four-helix
      bundle with a very stable three-dimensional structure. The first part of
      this research focused on mutagenesis of S-824 and characterization of the
      resulting proteins, revealing that this scaffold tolerates amino acid
      substitutions, including buried polar residues and the removal of
      hydrophobic side chains to create a putative cavity. Distinct from
      previous libraries, Karas targeted variability to a specific region of the
      protein, seeking to create a cavity and potential active site. The second
      part of this work details the design and creation of a library encoding
      1.7 x 10^6 unique proteins, assembled from degenerate oligonucleotides.
      The third and fourth parts of this work cover the screening effort for a
      range of activities, both in vitro and in vivo. I found that this
      collection binds heme readily, leading to abundant peroxidase activity.
      Hits for lipase and phosphatase activity were also detected. This work
      details the development of a new strategy for creating de novo sequences
      geared toward function rather than structure.
    </description>
  </descriptions>
</resource>

Twade968 commented 1 year ago

@matthewjchandler Would you please review the attached DataCite record and let me know whether it looks OK and whether I have addressed all of your concerns? I see that there is some weirdness around how apostrophes are being recorded. I already have a ticket to address that here: https://app.zenhub.com/workspaces/rdss-workcycles-61a4f1a12a399b001730f65a/issues/pulibrary/pdc_describe/601

matthewjchandler commented 1 year ago

This record does not have a DOI (that PRDS has minted), and the DOI given in the XML above is for a different record (http://arks.princeton.edu/ark:/88435/dsp01rj4307478). Everything else looks as I would expect. Thanks @Twade968 !

Twade968 commented 1 year ago

We need a design decision about how we will handle migrating works that do not have a DOI.