sul-dlss / common-accessioning

Suite of robots that handle the tasks of accessioning digital objects

rescue for the correct exception for a retry #1321

Closed peetucket closed 3 months ago

peetucket commented 3 months ago

Why was this change made? 🤔

To correctly retry when start-ocr cannot open the object because it is reported as still being accessioned, we need to rescue the correct exception.

Here is an example showing what is raised when this happens: https://app.honeybadger.io/projects/52894/faults/108881704

Dor::Services::Client::UnexpectedResponse: Unable to open version (Object net yet accessioned)

          exception_class = EXCEPTION_CLASS.fetch(response.status, UnexpectedResponse)
          raise exception_class.new(response: response,
                                    object_identifier: object_identifier,
                                    errors: data.fetch('errors', []))

This changes the rescue clause to catch the correct error, which is raised from https://github.com/sul-dlss/dor-services-client/blob/main/lib/dor/services/client.rb#L43-L55
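A minimal sketch of what rescuing this exception for a retry could look like. It is not the actual robot code: `with_open_retries`, `max_tries`, and `sleep_seconds` are hypothetical names, and the exception class is a simplified stand-in defined here so the example runs without the dor-services-client gem (the real class is constructed with `response:`, `object_identifier:`, and `errors:`, as in the snippet above).

```ruby
# Simplified stand-in for Dor::Services::Client::UnexpectedResponse so this
# sketch runs without the gem (assumption: plain StandardError subclass).
module Dor
  module Services
    class Client
      class UnexpectedResponse < StandardError; end
    end
  end
end

# Hypothetical helper: run the block, retrying only when the client raises
# UnexpectedResponse, for at most max_tries attempts in total.
def with_open_retries(max_tries: 3, sleep_seconds: 0)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue Dor::Services::Client::UnexpectedResponse
    raise if tries >= max_tries # give up and surface the original error

    sleep(sleep_seconds)
    retry
  end
end

# Simulate an open call that fails twice before succeeding.
result = with_open_retries do |attempt|
  if attempt < 3
    raise Dor::Services::Client::UnexpectedResponse,
          'Unable to open version (Object net yet accessioned)'
  end
  "opened on attempt #{attempt}"
end
puts result
```

Rescuing only this one class means any other failure still propagates immediately rather than being retried.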

There is still no explanation for why this happens intermittently. It happened for a very large object for which a number of steps ran slowly, but I don't think that should matter: https://argo-qa.stanford.edu/view/druid:xh838rk3862

I believe this is where the source exception comes from: dor-services-client calls DSA and asks it to open the object, and .open calls ensure_openable! first, which raises the exception.

https://github.com/sul-dlss/dor-services-app/blob/main/app/services/version_service.rb#L157-L158

This calls https://github.com/sul-dlss/dor-services-app/blob/main/app/services/workflow_state_service.rb#L55-L59 which looks up the accessioned lifecycle via the workflow service. My guess is that the lifecycle comes back as missing at first. Since all of the calls go to workflow-server-rails, it should come back once a completed end-accession step is there.
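The call chain described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the DSA implementation: `FakeWorkflowLifecycle`, `SketchVersionService`, and `VersioningError` are hypothetical stand-ins, and only the order of calls (open, then ensure_openable!, then the accessioned lifecycle lookup) is taken from the description.

```ruby
# Hypothetical error class for this sketch.
class VersioningError < StandardError; end

# Hypothetical in-memory stand-in for the workflow service lifecycle lookup.
class FakeWorkflowLifecycle
  def initialize
    @milestones = Hash.new { |hash, druid| hash[druid] = [] }
  end

  # Record that a milestone (e.g. 'accessioned') has been reached.
  def complete(druid, milestone)
    @milestones[druid] << milestone
  end

  # True once the milestone has been recorded for the druid.
  def lifecycle?(druid, milestone)
    @milestones[druid].include?(milestone)
  end
end

# Hypothetical stand-in for the version service: open checks openability first.
class SketchVersionService
  def initialize(druid:, workflow:)
    @druid = druid
    @workflow = workflow
  end

  def open
    ensure_openable!
    :opened
  end

  private

  def accessioned?
    @workflow.lifecycle?(@druid, 'accessioned')
  end

  def ensure_openable!
    raise VersioningError, 'Object net yet accessioned' unless accessioned?
  end
end

workflow = FakeWorkflowLifecycle.new
service = SketchVersionService.new(druid: 'druid:xh838rk3862', workflow: workflow)

first_attempt = begin
  service.open
rescue VersioningError => e
  e.message
end

# Once the accessioned milestone is recorded, the same call succeeds.
workflow.complete('druid:xh838rk3862', 'accessioned')
second_attempt = service.open
```

If the lifecycle lookup briefly lags behind the completed step, the first call fails and a later one succeeds, which is the situation the retry is meant to cover.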

This looks potentially suspicious, but it has to do with Solr, which I don't think should matter in our case: https://github.com/sul-dlss/workflow-server-rails/blob/main/app/controllers/steps_controller.rb#L57-L61