sul-dlss / common-accessioning

Suite of robots that handle the tasks of accessioning digital objects

goobi-notify step in goobiWF is hitting errors more frequently for no apparent reason #1043

Closed andrewjbtw closed 1 year ago

andrewjbtw commented 1 year ago

Describe the bug

When Goobi users register items in Argo, they set the items to use the goobiWF. This workflow takes some of the SDR metadata (druid, tags, etc.) and transmits it to Goobi. This makes it possible to associate SDR druids with files being processed in Goobi.

The last step in this workflow is goobi-notify. This step has been hitting more errors in recent weeks than it did before. The puzzling thing about it is that there's no clear indication that something is wrong.

In the past, when this error occurred there was usually a clear reason for that, such as a missing tag. Goobi expects certain data to come over from SDR and if that data is not received, it reports an error. When that happens, users can add the missing information in Argo and then retry the step.

In the recent errors, there hasn't been missing information. When users have retried the step, it's worked right away with no further changes. We have not been able to determine what is going wrong.

The Goobi error message is pretty generic:

Error: goobi-notify : Bad Request: 400 (<?xml version="1.0" encoding="UTF-8" standalone="yes"?><creationResponse><errorText>Process template Example_Workflow does not exist.</errorText><processId>0</processId><processName>sul:M1132_s2_b38_f9</processName><result>error</result></creationResponse>) 

The error text suggests that a template couldn't be found. It might be the case that SDR is sending incomplete information to Goobi the first time and then sending the correct information after a retry.
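To compare failures across items, it may help to pull the fields out of the response body rather than reading the raw XML. A minimal sketch, assuming the body always has the shape quoted above; `parse_goobi_response` is a hypothetical helper name, not something that exists in the robots or in dor-services-app:

```ruby
require 'rexml/document'

# Hypothetical helper: extracts the fields of a Goobi creationResponse
# body like the one quoted in the error above. Missing elements come
# back as nil.
def parse_goobi_response(xml)
  root = REXML::Document.new(xml).root
  {
    result: root.elements['result']&.text,
    error_text: root.elements['errorText']&.text,
    process_name: root.elements['processName']&.text,
    process_id: root.elements['processId']&.text
  }
end

# Response body copied from the error message in this issue.
body = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' \
       '<creationResponse>' \
       '<errorText>Process template Example_Workflow does not exist.</errorText>' \
       '<processId>0</processId>' \
       '<processName>sul:M1132_s2_b38_f9</processName>' \
       '<result>error</result>' \
       '</creationResponse>'

parsed = parse_goobi_response(body)
puts parsed[:error_text]
```

Logging the template name from `errorText` alongside the druid would show whether every failure names the same template.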

User Impact

It disrupts Goobi users' work. They have to dig around in Argo to figure out why their items are not appearing in Goobi. There doesn't seem to be a pattern to when it fails.

To Reproduce

I attempted to reproduce this issue on stage but haven't been successful. I registered 100 items to use Goobi in a relatively big batch on the theory that larger batches might be more likely to generate errors.

This item currently has the error and in the short term it's being left alone to provide an example: https://argo.stanford.edu/view/druid:gt114mr9927

Expected behavior

If all required information is provided during registration, goobi-notify should succeed.

peetucket commented 1 year ago

This is the exact error we'd expect if the DPG : Workflow : tag is missing from the object. But that doesn't explain why a retry would fix it. The example object above (https://argo.stanford.edu/view/druid:gt114mr9927) is no longer in the error state.

Here is the point in the code where we get it: https://github.com/sul-dlss/dor-services-app/blob/main/app/services/goobi_service.rb#L88-L94

The default, which is used if no tag is found, is the Example_Workflow template that Goobi is complaining about. I don't see any obvious logical problems with our code, but having another example in progress may help.
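For anyone following along without the code open, the fallback behaves roughly like the sketch below. This is an illustration only, not the code in goobi_service.rb: the method name, the constant names, the plain array of tag strings, and the `Book_Workflow` value are all assumptions.

```ruby
# Assumed names for illustration; only the tag prefix and the default
# template name come from this thread.
WORKFLOW_TAG_PREFIX = 'DPG : Workflow : '
DEFAULT_WORKFLOW = 'Example_Workflow'

# Returns the workflow name from the first matching tag, or the default
# when no tag with the prefix is present.
def goobi_workflow_name(tags)
  tag = tags.find { |t| t.start_with?(WORKFLOW_TAG_PREFIX) }
  return DEFAULT_WORKFLOW if tag.nil?

  tag.delete_prefix(WORKFLOW_TAG_PREFIX).strip
end

puts goobi_workflow_name(['DPG : Workflow : Book_Workflow', 'Project : Example'])
puts goobi_workflow_name([])
```

In this sketch an empty tag list produces the same request as a missing tag. If the tags were not yet visible when the step first ran, that would be consistent with a retry succeeding, but nothing in this thread confirms that is what happens.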

jmartin-sul commented 1 year ago

see e.g. https://app.honeybadger.io/projects/52894/faults/94132373

peetucket commented 1 year ago

Made an attempt here: https://github.com/sul-dlss/dor-services-app/pull/4467 but it didn't seem to make a difference. No other ideas at the moment.

mjgiarlo commented 1 year ago

Closed by https://github.com/sul-dlss/argo/pull/4018