Closed christinklez closed 6 months ago
@christinklez I realized that the change we're making to not add items if they already exist (https://github.com/ucldc/rikolti/issues/926) affects this issue. These 12 records will likewise not be created because they have already been added to the index at some point.
@christinklez Can you try rerunning the create_stage_index task for this collection? It should give "WARNING - document already exists; not creating." messages for those 12 records, but the task should succeed (assuming there aren't any other problems). Then you can look at the collection on stage and see where those 12 records ended up, i.e. are they part of the right object.
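To make the expected behavior concrete, here is a rough sketch of how a bulk-indexing response could be split into "already exists" warnings and real failures. This is not Rikolti's actual code: the function name `split_bulk_errors` and the logger name are hypothetical, and the response shape is assumed to follow the OpenSearch/Elasticsearch bulk API format, which the error wording in this issue suggests.

```python
import logging

logger = logging.getLogger("create_stage_index_sketch")  # hypothetical logger name


def split_bulk_errors(bulk_response: dict) -> tuple[list[str], list[dict]]:
    """Separate 'document already exists' conflicts from other bulk errors.

    Assumes a bulk-API-style response: {"items": [{"create": {...}}, ...]}.
    Returns (ids_that_already_exist, other_error_results).
    """
    already_exists: list[str] = []
    other_errors: list[dict] = []
    for item in bulk_response.get("items", []):
        # Each item is keyed by its action name, e.g. {"create": {...}}.
        for action, result in item.items():
            if not result.get("error"):
                continue  # this record was indexed without problems
            if action == "create" and result.get("status") == 409:
                # A 409 on a create action means the ID is already in the index.
                logger.warning(
                    "document already exists; not creating: %s", result.get("_id")
                )
                already_exists.append(result.get("_id"))
            else:
                other_errors.append(result)
    return already_exists, other_errors
```

With this kind of split, the task would only need to fail when `other_errors` is non-empty, which matches the "warn but succeed" behavior described above.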
Yes! It's actually still churning through content_harvesting right now. I'll update you once it makes it to create_stage_index. Thank you!!
@barbarahui This collection finished through and is now on -stage: https://calisphere-stage.cdlib.org/collections/26713/
Looking at the create_stage_index log, it doesn't have any conflicting ID messages anymore: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/log?dag_id=harvest_collection&task_id=create_stage_index&execution_date=2024-05-09T21%3A10%3A23%2B00%3A00
I think this is fine, since I didn't find any ARK ID conflicts in the Nuxeo records themselves.
Brief Summary
Rikolti is saying there are 12 ARK conflicts, but when I look into the records in Nuxeo, I don't observe duplicate ARKs.
When I searched Nuxeo for each of these ARKs, they seem to only be assigned to one record.
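The manual check described above could also be scripted once the ARK and Nuxeo UID pairs are exported. The sketch below assumes such a list of pairs is already available; `find_duplicate_arks` is a hypothetical helper, not part of Rikolti or Nuxeo, and the two sample pairs are taken from the list further down in this issue.

```python
from collections import defaultdict


def find_duplicate_arks(records):
    """Return {ark: [uids]} for every ARK assigned to more than one Nuxeo record.

    `records` is an iterable of (ark, nuxeo_uid) pairs; how those pairs are
    exported from Nuxeo is outside the scope of this sketch.
    """
    uids_by_ark = defaultdict(set)
    for ark, uid in records:
        uids_by_ark[ark].add(uid)
    # Keep only ARKs that map to two or more distinct records.
    return {ark: sorted(uids) for ark, uids in uids_by_ark.items() if len(uids) > 1}


# Two pairs from this issue; an empty result means no ARK is shared.
pairs = [
    ("ark:/81235/d8gt5ff52", "018c9681-93ba-4083-9e37-ce9d878f20ba"),
    ("ark:/81235/d8b27q040", "b6192adb-10ad-4b3b-af56-6f604bd9a669"),
]
duplicates = find_duplicate_arks(pairs)
```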
All ARK conflicts are found within the project folder /S01185: https://nuxeo.cdlib.org/nuxeo/nxdoc/default/a89b0165-74e7-451a-a04f-3f7430ccf792/view_documents
By the way, in a prior harvesting attempt, Rikolti wasn't picking up all the objects due to the deeply nested folder structure. Some changes were made.
PS: This error predates the mega merged index, so the ARK errors are being reported from within collection 26713.
Harvesting Details
create_stage_index: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/log?dag_id=harvest_collection&task_id=create_stage_index&execution_date=2024-05-02T01%3A38%3A00%2B00%3A00
Here is the list of ARKs and the corresponding Nuxeo object:
ark:/81235/d8gt5ff52
https://nuxeo.cdlib.org/nuxeo/nxdoc/default/018c9681-93ba-4083-9e37-ce9d878f20ba/view_documents
ark:/81235/d8b27q040
https://nuxeo.cdlib.org/nuxeo/nxdoc/default/b6192adb-10ad-4b3b-af56-6f604bd9a669/view_documents
ark:/81235/d8qb9vd6x
https://nuxeo.cdlib.org/nuxeo/nxdoc/default/627ae451-28cb-4b73-92cb-dddf4fe2f1ba/view_documents
ark:/81235/d8kk94m3n
https://nuxeo.cdlib.org/nuxeo/nxdoc/default/d9ca4271-8c37-445f-be86-5e9bbf0d96ea/view_documents
ark:/81235/d8t43jb00
https://nuxeo.cdlib.org/nuxeo/nxdoc/default/0b2800c5-7849-4b24-8b94-034ccc1f2077/view_documents
ark:/81235/d8ft8ds9w
https://nuxeo.cdlib.org/nuxeo/nxdoc/default/ee6e4fce-cccb-4a73-8c8e-ccd953773264/view_documents
ark:/81235/d8v11vt8c
https://nuxeo.cdlib.org/nuxeo/nxdoc/default/168413e6-47e0-4f05-9d63-fae75fa616ea/view_documents
ark:/81235/d8pc2th7q
https://nuxeo.cdlib.org/nuxeo/nxdoc/default/7334a594-28e6-4ee6-898b-579fb13ce5ce/view_documents
ark:/81235/d86970607
https://nuxeo.cdlib.org/nuxeo/nxdoc/default/fbc85600-fabc-4f14-881a-28af0dfed5fb/view_documents
ark:/81235/d8xs5jr2k
https://nuxeo.cdlib.org/nuxeo/nxdoc/default/0904382c-bc1b-445f-9053-f251671085f5/view_documents
ark:/81235/d8d50fx2t
https://nuxeo.cdlib.org/nuxeo/nxdoc/default/5fe8cab7-384b-4788-acdc-6fba97eb3d07/view_documents
ark:/81235/d82j68c6g
https://nuxeo.cdlib.org/nuxeo/nxdoc/default/a48ce5af-1489-4391-b549-6377c7056f5a/view_documents
Here is the error log from create_stage_index:
Exception: 12 errors in bulk indexing 12 records: ['[ark:/81235/d8gt5ff52]: version conflict, document already exists (current version [1])', '[ark:/81235/d8b27q040]: version conflict, document already exists (current version [1])', '[ark:/81235/d8qb9vd6x]: version conflict, document already exists (current version [1])', '[ark:/81235/d8kk94m3n]: version conflict, document already exists (current version [1])', '[ark:/81235/d8t43jb00]: version conflict, document already exists (current version [1])', '[ark:/81235/d8ft8ds9w]: version conflict, document already exists (current version [1])', '[ark:/81235/d8v11vt8c]: version conflict, document already exists (current version [1])', '[ark:/81235/d8pc2th7q]: version conflict, document already exists (current version [1])', '[ark:/81235/d86970607]: version conflict, document already exists (current version [1])', '[ark:/81235/d8xs5jr2k]: version conflict, document already exists (current version [1])', '[ark:/81235/d8d50fx2t]: version conflict, document already exists (current version [1])', '[ark:/81235/d82j68c6g]: version conflict, document already exists (current version [1])']
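For anyone who wants to cross-check the exception against the ARK list, a small sketch like the following can pull the conflicting ARKs out of the message text. The regular expression is based only on the message format shown in this log, and `conflicting_arks` is a hypothetical helper name; the sample string is a shortened copy of the exception above.

```python
import re

# Matches entries such as:
# [ark:/81235/d8gt5ff52]: version conflict, document already exists (...)
CONFLICT_RE = re.compile(
    r"\[(ark:/[^\]]+)\]: version conflict, document already exists"
)


def conflicting_arks(message: str) -> list[str]:
    """Return the ARKs reported as version conflicts, in order of appearance."""
    return CONFLICT_RE.findall(message)


# Shortened copy of the exception message from this issue (first two entries).
sample_message = (
    "Exception: 12 errors in bulk indexing 12 records: "
    "['[ark:/81235/d8gt5ff52]: version conflict, document already exists "
    "(current version [1])', "
    "'[ark:/81235/d8b27q040]: version conflict, document already exists "
    "(current version [1])']"
)
arks = conflicting_arks(sample_message)
```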