sul-dlss / common-accessioning

Suite of robots that handle the tasks of accessioning digital objects

New OCR files are not being accessioned #1281

Closed peetucket closed 3 weeks ago

peetucket commented 4 weeks ago

I discovered this on https://argo-qa.stanford.edu/view/druid:hj614hq2225 after I changed how page-level XML is created. Different page-level XML files were being written to the abbyy output folder and then copied to the workspace, but the new page-level XML files were not present in the new version of the object, nor did they make it to preservation. Only the new PDF was there.

If you dig around the Moabs on disk on preservation-catalog-web-qa-01, you will see only the updated PDF between versions, not the page-level XML:

/services-disk-qa/storage-root-2/sdr2objects/hj/614/hq/2225/hj614hq2225/
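For anyone who wants to repeat the check, here is a rough sketch of how the per-version contents could be listed. It assumes the usual Moab layout, where each version lives in a `vNNNN` directory with a `data/` subfolder holding only the files added or changed in that version. The function name `files_by_version` is made up for illustration and is not part of any SDR tooling.

```python
import re
from pathlib import Path

# Assumption: Moab version directories are named v0001, v0002, ...
VERSION_DIR = re.compile(r"^v\d{4}$")


def files_by_version(object_root):
    """Map each version directory name to the files found under its data/ folder.

    Hypothetical helper for eyeballing what changed between versions.
    """
    root = Path(object_root)
    version_dirs = sorted(
        p for p in root.iterdir() if p.is_dir() and VERSION_DIR.match(p.name)
    )
    result = {}
    for version_dir in version_dirs:
        data_dir = version_dir / "data"
        if data_dir.is_dir():
            # Paths are reported relative to data/, with forward slashes.
            files = sorted(
                f.relative_to(data_dir).as_posix()
                for f in data_dir.rglob("*")
                if f.is_file()
            )
        else:
            files = []
        result[version_dir.name] = files
    return result
```

Calling `files_by_version` with the object path above should return one entry per version. If the accessioning worked, the newest version's list would include the page-level XML files alongside the PDF.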