sul-dlss / happy-heron

Self-Deposit for the Stanford Digital Repository (SDR): H2 is a Rails web application enabling users to deposit scholarly content into SDR
Apache License 2.0

druid:fy040rv4004 missing metadata needed for DOI update #3375

Closed hannahfrost closed 1 year ago

hannahfrost commented 1 year ago

The author name (Rosen, Gil) was inadvertently deleted from this submission before the collection manager approved it for deposit. Now the work is stuck in H2 with this error message showing in Argo:

"Error: update-doi : Conflict (Item requested a DOI be updated, but it doesn't meet all the preconditions. Datacite requires that this object have creators and a datacite extension with resourceTypeGeneral)"

https://argo.stanford.edu/view/druid:fy040rv4004

hannahfrost commented 1 year ago

The H2 work number is 7094

peetucket commented 1 year ago

Step errored out:

[Screenshot: Screen Shot 2023-09-08 at 3 36 15 PM]

I just set that step to complete manually in argo and re-indexed the item, and it now has a deposited state in H2.

Separately we need to sort out why that failed, what we can possibly do about it, and then trigger that step's action again manually. If the investigation reveals there is a metadata/mapping mismatch issue (i.e. datacite's requirements are stricter than ours) we may just need to skip in some cases without erroring.

peetucket commented 1 year ago

Code path:

  1. Common-accessioning robot update-doi makes a dor-services-client gem call.
  2. Dor-services-client calls dor-services-app update_doi_metadata endpoint
  3. This endpoint checks if the object can be mapped and discovers it cannot, throwing the error, which makes its way back up to the robot and puts it in an error state.

We need two things to map the metadata for datacite:

  1. resource type general listed
  2. at least one "creator" contributor

This object satisfies 1 but not 2.

On DSA rails console:

@cocina_object = CocinaObjectStore.find('druid:fy040rv4004')
attributes = Cocina::ToDatacite::Attributes.new(@cocina_object)
attributes.exportable?
=> false

attributes.send(:types_attributes)&.fetch(:resourceTypeGeneral).present?
=> true # GOOD!

attributes.send(:creators).present?
=> false # BAD!

# Look at all the contributors
attributes.send(:creator_contributor_funder_attributes)
=>
{:creators=>[],
 :contributors=>
  [{:name=>"Polyakova, Maria", :givenName=>"Maria", :familyName=>"Polyakova", :nameType=>"Personal", :contributorType=>"Other"},
   {:name=>"Rosston, Greg", :givenName=>"Greg", :familyName=>"Rosston", :nameType=>"Personal", :contributorType=>"Other"},
   {:name=>"Stanford University", :nameType=>"Organizational", :contributorType=>"Other"},
   {:name=>"Public Policy", :nameType=>"Organizational", :contributorType=>"Other"}],
 :fundingReferences=>[]}
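
The two preconditions checked in the console session above can be summarised in a small standalone sketch. This is not the dor-services-app implementation: the `exportable?` helper below, the shape of the hash, and the `'Text'` resource type are assumptions made purely for illustration.

```ruby
# Simplified stand-in for the DataCite attributes; NOT the DSA code.
# An object is treated as exportable only if it has at least one creator
# and a resourceTypeGeneral value.
def exportable?(attributes)
  has_creator = !attributes.fetch(:creators, []).empty?
  has_resource_type = !attributes.dig(:types, :resourceTypeGeneral).to_s.empty?
  has_creator && has_resource_type
end

without_creator = {
  creators: [],
  types: { resourceTypeGeneral: 'Text' } # placeholder value, not taken from the object
}
with_creator = without_creator.merge(creators: [{ name: 'Rosen, Gil' }])

puts exportable?(without_creator) # false
puts exportable?(with_creator)    # true
```

With an empty creators list the check fails even though the resource type is present, which matches the state of this object.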

If you dig into the class that fetches the creators above (https://github.com/sul-dlss/dor-services-app/blob/main/app/services/cocina/to_datacite/creator_contributor_funder.rb), it looks like either the logic there is not correctly dealing with the contributor types we have for this H2 record, or we need at least one "author" or "creator" role type? The two person contributors both have a role of "advisor" in the Cocina for this object (and also as seen in H2): https://sdr.stanford.edu/works/7094

peetucket commented 1 year ago

Something in this class is deciding we have no creators in the metadata. Trying to understand the Escher-like logic:

contrib = Cocina::ToDatacite::CreatorContributorFunder.new(@cocina_object.description)
contrib.send(:cocina_creators)
 => []
contrib.send(:datacite_creators)
 => []

peetucket commented 1 year ago

Ahh, I think it is because the only "person" type contributors we have in this object are "advisor", which appears to set a note of "citation status" to "false" in Cocina. Anything with citation status set to "false" is not counted as a creator for the purposes of mapping to DataCite. So this object is not considered to have any creators, and is thus not mapped to DataCite.
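
A rough sketch of that filtering, assuming contributors are plain hashes carrying a `note` entry of type "citation status". The helper names (`cited?`, `creators_for_datacite`), the hash layout, and the "author" role given to Rosen, Gil are hypothetical simplifications, not the actual Cocina model or mapping code.

```ruby
# A contributor counts as a creator unless it carries a
# "citation status" note with the value "false".
def cited?(contributor)
  contributor.fetch(:note, []).none? do |note|
    note[:type] == 'citation status' && note[:value] == 'false'
  end
end

def creators_for_datacite(contributors)
  contributors.select { |contributor| cited?(contributor) }
end

not_cited = [{ type: 'citation status', value: 'false' }]
advisors = [
  { name: 'Polyakova, Maria', role: 'advisor', note: not_cited },
  { name: 'Rosston, Greg', role: 'advisor', note: not_cited }
]
author = { name: 'Rosen, Gil', role: 'author' } # assumed role, no "false" note

p creators_for_datacite(advisors).map { |c| c[:name] }            # []
p creators_for_datacite(advisors + [author]).map { |c| c[:name] } # ["Rosen, Gil"]
```

With only the advisors present the creators list is empty, which is the situation that blocked the DOI update here.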

peetucket commented 1 year ago

So we have a few options:

  1. Change how we map to data-cite to add some more logic to the already tangled web to deal with a situation like this (if it really needs to be dealt with).
  2. Force the user to go back and change their contributor types in H2. This adds some wrinkles (the object is now stuck and requires some remediation to allow editing again in H2).
  3. Do nothing with the logic but change how we handle this case so that instead of throwing an error in DSA, it simply alerts HB (as an FYI) and then returns a 200, allowing accessioning to proceed without mapping to datacite. Result: object is not mapped to datacite but does not error.

Unless Arcadia thinks we should change the mapping, to me 3 seems like a better solution than 2.
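
To make option 3 concrete, here is a minimal sketch of "alert and return 200 instead of erroring". Everything in it is hypothetical: `update_doi_metadata` below is a plain method, not the DSA endpoint, and `notify` is a lambda standing in for whatever alerting call would be used.

```ruby
# Hypothetical handler: when the object cannot be mapped, send an FYI alert
# and report success without updating DataCite, instead of raising a conflict.
def update_doi_metadata(exportable:, notify:)
  unless exportable
    notify.call('DOI metadata not sent to DataCite: preconditions not met')
    return { status: 200, datacite_updated: false }
  end
  { status: 200, datacite_updated: true }
end

alerts = []
result = update_doi_metadata(exportable: false, notify: ->(message) { alerts << message })

p result        # {:status=>200, :datacite_updated=>false} (exact format depends on Ruby version)
p alerts.length # 1
```

The trade-off is that accessioning proceeds, but the object silently lacks DataCite metadata unless someone acts on the alert.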

andrewjbtw commented 1 year ago

Do we validate that the DataCite metadata is complete on the H2 form itself when an object is to be assigned a DOI? It seems like this shouldn't have been submit-able without the required metadata.

peetucket commented 1 year ago

I am fairly confident we do not validate the contributors entered in H2 to ensure there is at least one mappable contributor that has a citation-status of true (e.g. ensure that all entered contributors are not something like "advisor"). But yes, that would be another approach (presuming there isn't a valid reason for a depositor to do this).

amyehodge commented 1 year ago

It looks to me like this item failed because the advisors are listed in the section of contributors not to be included in the citation. And it seems completely right to me that we should not be mapping those to DataCite.

However, I don't think we can require an entry in the "authors included in the citation" section all the time, largely due to staff workflows. But considering the staff use cases without an author will never (or almost never) overlap with the need for a DOI, I do think it's appropriate that we require an "author included in the citation" every time we issue a DOI.

Because Hannah added the author back to this particular deposit, it now has a functional DOI, so there is nothing further that needs to be done for this item.

I am going to close this ticket and open another one to request that we 1) require an author when creating a DOI, 2) notify an admin that this has happened so we can check with the depositor about whether they want to make a change so they can get a DOI, and 3) make sure that not creating a DOI when a DOI has been requested doesn't break the UI somehow (e.g. will not being able to display a DOI in that field on the item view page when it thinks there should be one keep the page from loading? we might want to sub in another message like "DOI could not be generated" or something in that case).
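
The first of those requests could look roughly like the check below. It is only a sketch: `doi_request_errors` and its keyword arguments are hypothetical and do not correspond to existing H2 code or form fields.

```ruby
# Hypothetical form-level check: an author included in the citation is
# required only when the depositor has asked for a DOI.
def doi_request_errors(doi_requested:, cited_authors:)
  return [] unless doi_requested
  return [] unless cited_authors.empty?

  ['A DOI requires at least one author included in the citation']
end

p doi_request_errors(doi_requested: true, cited_authors: [])
# ["A DOI requires at least one author included in the citation"]
p doi_request_errors(doi_requested: false, cited_authors: [])             # []
p doi_request_errors(doi_requested: true, cited_authors: ['Rosen, Gil']) # []
```

Keeping the rule conditional on the DOI request leaves the staff workflows without an author unaffected.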

peetucket commented 1 year ago

Note: I did verify this object has a DOI. Was it created in v1, before this error originally occurred?

amyehodge commented 1 year ago

This object was deposited initially on Sep 7 and updated to add the missing author on Sep 12:

2 (2.0.0) adding author which was missing

  Opened      | 2023-09-12 00:37:51 UTC
  Submitted   | 2023-09-12 00:37:51 UTC
  Described   | 2023-09-12 00:37:51 UTC
  Published   | 2023-09-12 00:37:53 UTC
  Deposited   | 2023-09-12 00:37:58 UTC
  Accessioned | 2023-09-12 00:37:59 UTC

The DOI was created at the same time the author was added: September 12, 2023 at 00:37:54 UTC.