pulibrary / aspace_helpers

methods and reports to support common SC activities in ArchivesSpace

follow up on the discrepancy between orphaned do's in API and DB results #613

Closed regineheberlein closed 1 week ago

regineheberlein commented 1 week ago

When querying for orphaned digital objects via the API and via the DB, we got slightly divergent counts (see #604).

Most of the discrepancy is explained by inadvertent duplicates in the API result set. However, the following divergent results remain:

  1. returned via API, but not via DB:
     - 00674e10-0408-456c-a385-675ca7124ae1
     - 4d6d5732-1b33-4713-b033-19f2ae8d0807
     - 84d62080-3bd2-457b-9fac-189f8b7fd57b
     - e052db26-db8b-4016-ba65-1436ca60237a
     - f2f296c3-8917-4446-ac03-dbe86995813b

These are false positives: the collection array comes back empty even though each of these digital objects is linked to an ao. I'm not sure of the mechanism, but I suspect it is an artifact of the sequel_to_json mediation layer in ArchivesSpace.

  2. returned via DB, but not via API:
     - 32ccc343-9d7a-48f3-a798-a86aee61e0b3
     - ark:/88435/8g84mw89p

These are true positives in that they have no file version records. However, the ark, which currently populates the identifier field, resolves, so this record can be corrected and removed from the data set.

After accounting for those discrepancies, both queries return 54,115 results that may be considered true orphans.
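The comparison described above (de-duplicating the API result set, then looking at what each source returns that the other does not) can be sketched as a plain set reconciliation. This is only an illustrative Python sketch, not the script used for this issue: the function name `reconcile`, the dictionary keys, and the sample identifiers are all hypothetical, and it assumes both result sets have already been exported as lists of identifier strings.

```python
from collections import Counter


def reconcile(api_ids, db_ids):
    """Compare two lists of digital object identifiers (hypothetical helper).

    Returns the identifiers duplicated in the API list, the identifiers
    found in only one source, and the identifiers found in both.
    """
    api_counts = Counter(api_ids)
    # Identifiers that appear more than once in the API result set.
    duplicates = sorted(i for i, n in api_counts.items() if n > 1)

    api_set = set(api_counts)
    db_set = set(db_ids)
    return {
        "api_duplicates": duplicates,
        "api_only": sorted(api_set - db_set),  # candidates for false positives
        "db_only": sorted(db_set - api_set),   # missed by the API query
        "both": sorted(api_set & db_set),      # agreed on by both queries
    }


# Made-up sample data, for illustration only.
api_ids = ["a", "b", "b", "c"]
db_ids = ["b", "c", "d"]
result = reconcile(api_ids, db_ids)
print(result)
```

Using sets makes the two "only in one source" lists cheap to compute even for tens of thousands of identifiers, and counting before converting to a set keeps the duplicates visible instead of silently dropping them.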

regineheberlein commented 1 week ago

This is a high enough number that I want to do some thorough spot-checking before I believe it.

regineheberlein commented 1 week ago

Checked against 57,346 figgy objects; only 584 matches found on object id.
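A check of this kind can be sketched as a lookup of each orphan identifier in the set of figgy object ids. The Python below is a hypothetical illustration, not the check that produced the numbers above: the function name `match_on_id` and the sample values are made up, and trimming whitespace and lowercasing before comparison is an assumption about how the ids might need to be normalized.

```python
def match_on_id(orphan_ids, figgy_ids):
    """Split orphan ids into those found in figgy and those not found.

    Assumption: ids are compared after trimming whitespace and lowercasing.
    """
    def normalize(identifier):
        return identifier.strip().lower()

    figgy_lookup = {normalize(i) for i in figgy_ids}
    matched = [i for i in orphan_ids if normalize(i) in figgy_lookup]
    unmatched = [i for i in orphan_ids if normalize(i) not in figgy_lookup]
    return matched, unmatched


# Made-up sample data, for illustration only.
orphans = ["ABC-1", "abc-2 ", "abc-3"]
figgy = ["abc-1", "abc-2", "xyz-9"]
matched, unmatched = match_on_id(orphans, figgy)
print(len(matched), "matched;", len(unmatched), "unmatched")
```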