Start with an object that already has accessioned OCR files from a previous accessioning run (e.g. full PDF and page-level XML files).
Start ocrWF again via Argo using the "Text extraction" button.
The workflow runs to completion, but the new files do not appear to replace the existing files.
I discovered this on this object: https://argo-qa.stanford.edu/view/druid:hj614hq2225 after I changed how page-level XML is created. Different page-level XML files were being written to the abbyy output folder and then copied to the workspace, but the new page-level XML files were not present in the new version of the object, nor did they make it to preservation. Only the new PDF was there.
Go digging around the Moabs on disk on preservation-catalog-web-qa-01 and you will only see the updated PDF between versions, not the page-level XML:
/services-disk-qa/storage-root-2/sdr2objects/hj/614/hq/2225/hj614hq2225/
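To make the on-disk check repeatable, here is a minimal sketch that lists which content files each version directory of a Moab holds. It assumes the usual Moab layout, with version directories named like `v0001` and files under `data/content` inside each; the function name `list_version_content` is made up for illustration and is not part of any existing tool.

```python
import sys
from pathlib import Path


def list_version_content(object_root):
    """Map each version directory name to the files under its data/content.

    Assumption: version directories are named v0001, v0002, ... and keep
    their files under data/content.
    """
    result = {}
    for version_dir in sorted(Path(object_root).glob("v[0-9]*")):
        if not version_dir.is_dir():
            continue
        content = version_dir / "data" / "content"
        if content.is_dir():
            files = sorted(
                p.relative_to(content).as_posix()
                for p in content.rglob("*")
                if p.is_file()
            )
        else:
            # A version with no content directory holds no content files.
            files = []
        result[version_dir.name] = files
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one line per version so differences are easy to eyeball.
    for version, files in list_version_content(sys.argv[1]).items():
        print(version, files)
```

Run against the object root above, a version produced by the second ocrWF run that lists only the PDF and no page-level XML would match what is described here.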