Closed jmartin-sul closed 1 year ago
so far i have looked for:
# ended up being faster for me to write this as a plain sql query, because i was pressed for time, and i'm less facile w/ AR than plain SQL
SELECT *
FROM "complete_moabs"
WHERE NOT EXISTS (
SELECT 1
FROM "zipped_moab_versions"
WHERE (complete_moabs.version = zipped_moab_versions.version)
LIMIT 1
)
# LIMIT 1
# no results as of 2019-12-20:
# makes sense, because AR hooks automatically call `create_zipped_moab_versions!` on create and update of CompleteMoab.
# it's an `after_create` hook on ZMV that triggers replication work
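one caveat on the query above: its subquery correlates on `version` alone, so a CompleteMoab counts as covered whenever any ZMV anywhere has the same version number. a minimal ruby sketch against in-memory rows shows how that can hide gaps. the structs and sample data are hypothetical, and `complete_moab_id` is an assumed foreign key on `zipped_moab_versions` (the query above doesn't show one).

```ruby
# hypothetical stand-ins for rows from complete_moabs and zipped_moab_versions
CompleteMoabRow = Struct.new(:id, :version)
ZmvRow = Struct.new(:complete_moab_id, :version)

# anti-join keyed on version only, mirroring the correlation in the SQL above
def missing_by_version(moabs, zmvs)
  seen_versions = zmvs.map(&:version).uniq
  moabs.reject { |cm| seen_versions.include?(cm.version) }
end

# anti-join keyed on (complete_moab_id, version); complete_moab_id is an assumed column
def missing_by_moab_and_version(moabs, zmvs)
  seen_pairs = zmvs.map { |z| [z.complete_moab_id, z.version] }.uniq
  moabs.reject { |cm| seen_pairs.include?([cm.id, cm.version]) }
end

moabs = [CompleteMoabRow.new(1, 1), CompleteMoabRow.new(2, 1)]
zmvs  = [ZmvRow.new(1, 1)] # moab 2 has no ZMV at all

puts missing_by_version(moabs, zmvs).map(&:id).inspect          # version-only match
puts missing_by_moab_and_version(moabs, zmvs).map(&:id).inspect # per-moab match
```

in this toy data the version-only match reports nothing missing, while the per-moab match flags moab 2. whether the real tables behave the same way depends on the actual schema.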
# ended up being faster for me to write this as a plain sql query, because i was pressed for time, and i'm less facile w/ AR than plain SQL
SELECT *
FROM "zipped_moab_versions"
WHERE NOT EXISTS (
SELECT 1
FROM "zip_parts"
WHERE zip_parts.id = zipped_moab_versions.id
LIMIT 1
)
# LIMIT 1
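# lots of results as of 2019-12-20
# caveat: the subquery compares primary keys (zip_parts.id = zipped_moab_versions.id). if zip_parts has a zipped_moab_version_id foreign key, correlating on that instead would likely give a different count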
ZipPart.unreplicated.count
SELECT COUNT(*) FROM "zip_parts" WHERE "zip_parts"."status" = $1 [["status", 1]]
# 5911 results as of 2019-12-20
once we're back from break, i'll plan to finish up this ticket by:
i can also try to write ActiveRecord versions of the first two queries, if people would find that useful. otherwise, i'll likely dump the IDs they generate to a file, and work off of that for the remediation steps i've listed above.
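for the dump-to-a-file approach, a rough sketch using only ruby's standard library could look like this. the helper names, the one-ID-per-line format, and the sample IDs are all hypothetical. the real IDs would come from the queries above.

```ruby
require 'tempfile'

# write one ID per line so the file is easy to diff, grep, and resume from
def dump_ids(path, ids)
  File.open(path, 'w') { |f| ids.each { |id| f.puts(id) } }
end

# yield the dumped IDs back in fixed-size batches, skipping blank lines
def each_id_batch(path, batch_size)
  File.foreach(path)
      .map(&:strip)
      .reject(&:empty?)
      .map { |line| Integer(line) }
      .each_slice(batch_size) { |batch| yield batch }
end

Tempfile.create('moab_ids') do |file|
  dump_ids(file.path, [101, 102, 103, 104, 105]) # hypothetical IDs
  each_id_batch(file.path, 2) { |batch| puts batch.inspect }
end
```

working in batches keeps each remediation pass small enough to re-run if it gets interrupted.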
It's 5907 now ... but also as far as I can tell, there's nowhere near that many zip parts missing from all the s3 endpoints. In fact, it's pretty much zero. So I think there's an issue here with PresCat not correctly detecting valid zip parts on endpoints (or holding onto bad data).
Edit: Yeah, that's exactly what's happening. Spot-checking some of these 'status' = '1' zip_parts and I'm seeing them all on endpoints. Looks like zip_parts either didn't get correctly updated after a successful upload, or an audit that finds/fixes these parts didn't work right.
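The spot check described here can be sketched as a set comparison between the keys an endpoint actually holds and the zip parts the catalog still marks unreplicated. This is only an illustration: the `ZipPartRow` struct, its `s3_key` field, and the sample keys are hypothetical, and a real check would list keys from each endpoint rather than use a hard-coded set.

```ruby
require 'set'

# Hypothetical row: a zip part's catalog status and the key it would have on an endpoint.
ZipPartRow = Struct.new(:id, :s3_key, :status)

# Split unreplicated parts into those found on the endpoint (stale catalog status)
# and those not found there (still in need of replication).
def partition_unreplicated(parts, keys_on_endpoint)
  parts.select { |part| part.status == :unreplicated }
       .partition { |part| keys_on_endpoint.include?(part.s3_key) }
end

parts = [
  ZipPartRow.new(1, 'aa/111/bb/2222/aa111bb2222.v0001.zip', :unreplicated),
  ZipPartRow.new(2, 'cc/333/dd/4444/cc333dd4444.v0001.zip', :unreplicated),
  ZipPartRow.new(3, 'ee/555/ff/6666/ee555ff6666.v0001.zip', :ok)
]
keys_on_endpoint = Set['aa/111/bb/2222/aa111bb2222.v0001.zip',
                       'ee/555/ff/6666/ee555ff6666.v0001.zip']

stale, missing = partition_unreplicated(parts, keys_on_endpoint)
puts "stale status: #{stale.map(&:id).inspect}"
puts "still missing: #{missing.map(&:id).inspect}"
```

Parts in the first group are the ones whose catalog status needs correcting. Only the second group would need to be replicated again.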
This ticket is now 3 years old.
The new dashboard code plus the CatalogToArchive audit code seems adequate to find replication errors. I'm closing this ticket; if that's the wrong thing to do, please re-open.
the rails 6 upgrade (#1270) inadvertently changed pres cat's queue adapter from `:resque` to rails' default (`:async`) adapter. this led to two problems:
see https://github.com/sul-dlss/preservation_catalog/blob/master/app/jobs/README.md for an illustration of the replication pipeline.
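for reference, pinning the Active Job adapter explicitly in a rails app generally looks like the fragment below. where pres cat actually sets this isn't shown in this thread, so the file location and module name are assumptions for illustration.

```ruby
# config/application.rb (fragment, not runnable on its own)
module PreservationCatalog # module name assumed for illustration
  class Application < Rails::Application
    # without an explicit setting, Active Job falls back to the in-process :async adapter
    config.active_job.queue_adapter = :resque
  end
end
```

with `:async`, jobs run in a thread pool inside the web or console process and are lost when that process exits, which is why the setting matters for long-running replication work.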
we should just find all of the `PreservedObject`s/`CompleteMoab`s which aren't yet properly replicated, and trigger zip making for them. @julianmorley has an audit script that hits S3 to do this, but we could likely also identify such objects by querying pres cat.
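identifying such objects from the catalog alone might look like the following sketch, written against in-memory rows so it runs without the database. the structs, the `complete_moab_id` and `zipped_moab_version_id` foreign keys, and the rule that a version counts as replicated only when every endpoint has a ZMV whose parts are all `:ok` are assumptions for illustration, not pres cat's actual logic.

```ruby
# hypothetical stand-ins for catalog rows
Moab = Struct.new(:id, :version)
Zmv  = Struct.new(:id, :complete_moab_id, :version, :zip_endpoint_id)
Part = Struct.new(:zipped_moab_version_id, :status)

# a ZMV counts as replicated when it has at least one part and every part is :ok
def zmv_replicated?(zmv, parts)
  own = parts.select { |part| part.zipped_moab_version_id == zmv.id }
  !own.empty? && own.all? { |part| part.status == :ok }
end

# return [moab_id, version] pairs lacking a replicated ZMV on at least one endpoint
def versions_needing_zips(moabs, zmvs, parts, endpoint_ids)
  moabs.flat_map do |moab|
    needing = (1..moab.version).select do |version|
      endpoint_ids.any? do |endpoint_id|
        zmv = zmvs.find do |z|
          z.complete_moab_id == moab.id && z.version == version && z.zip_endpoint_id == endpoint_id
        end
        zmv.nil? || !zmv_replicated?(zmv, parts)
      end
    end
    needing.map { |version| [moab.id, version] }
  end
end

moabs = [Moab.new(1, 2)]
zmvs  = [Zmv.new(10, 1, 1, 1), Zmv.new(11, 1, 1, 2), Zmv.new(12, 1, 2, 1)]
parts = [Part.new(10, :ok), Part.new(11, :ok), Part.new(12, :unreplicated)]

puts versions_needing_zips(moabs, zmvs, parts, [1, 2]).inspect
```

the resulting pairs are the candidates for re-triggering zip making. a real version would query the tables directly and should be cross-checked against what is actually on the endpoints.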