yalelibrary / YUL-DC

Preliminary issue tracking for Yale University Libraries Digital Collections project

Batch job for validating migrated images #2962

Open sshetenhelm opened 2 weeks ago

sshetenhelm commented 2 weeks ago

Story

For migrated image files, we have checksums pulled from Ladybird (see #2900). We would like to use these checksums to validate migrated image files in DCS. This validation should occur via a nightly batch job. Ideally, it will be combined with the current validation job. On the individual batch process page, there should be an error message when an image does not pass validation.

Will likely need to complete #2963 prior to this ticket.

Acceptance

Message for checksum mismatch: The child oid # 007 has a checksum mismatch. The checksum of the image file saved to this child oid does not match the checksum of the image file in the database. This may mean that the image has been corrupted. Please verify integrity of image for child oid # 007 by manually comparing the checksum values and update record as necessary.

Message for file size=0: The child oid # 007 has a file size of 0. Please verify image for child oid # 007.
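For discussion, here is a rough Python sketch of the per-file check that the two messages above imply. It is only an illustration, not the project's implementation: the function names, the `algorithm` parameter, and the choice to check file size before the checksum are all assumptions.

```python
import hashlib
import os
from typing import Optional

# Message texts taken from the acceptance criteria in this issue.
MISMATCH_MESSAGE = (
    "The child oid # {oid} has a checksum mismatch. The checksum of the image "
    "file saved to this child oid does not match the checksum of the image "
    "file in the database. This may mean that the image has been corrupted. "
    "Please verify integrity of image for child oid # {oid} by manually "
    "comparing the checksum values and update record as necessary."
)
ZERO_SIZE_MESSAGE = (
    "The child oid # {oid} has a file size of 0. "
    "Please verify image for child oid # {oid}."
)


def file_checksum(path: str, algorithm: str = "sha256",
                  chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so large images are not loaded into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_child(oid: str, path: str, expected_checksum: str,
                   algorithm: str = "sha256") -> Optional[str]:
    """Return an error message for the batch process page, or None if valid.

    Hypothetical helper: the size check runs first (an assumption), so an
    empty file reports the size message rather than a mismatch.
    """
    if os.path.getsize(path) == 0:
        return ZERO_SIZE_MESSAGE.format(oid=oid)
    actual = file_checksum(path, algorithm)
    if actual.lower() != expected_checksum.strip().lower():
        return MISMATCH_MESSAGE.format(oid=oid)
    return None
```

A nightly job could call `validate_child` for each migrated child oid and attach any non-`None` result to the batch process as an error.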

sshetenhelm commented 2 weeks ago

I think the sample message you've provided will work. Thank you!

K8Sewell commented 13 hours ago

Which checksum attribute will be the 'source of truth'? Options: checksum, sha512_checksum, sha256_checksum, md5_checksum

Current handling of a child object checksum mismatch: no action is currently taken on that child object. An error message is relayed to the user, but no changes are made to the object. What if there were a default behavior or a user-initiated action to update the child object's checksum data when incorrect data is found, as is the case with many Ladybird objects whose md5_checksum is known to have incorrect values? The child object's checksum could be updated from the actual file checksum, since we would already need to compute that value for the comparison.
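To make the two open questions concrete, here is a small Python sketch of one possible approach. Everything in it is hypothetical: the preference order (strongest hash first), the `reconcile` helper, and the opt-in `update` flag are assumptions for discussion, not decisions. The generic `checksum` attribute is left out because the thread does not say which algorithm it holds.

```python
import hashlib
from typing import Optional, Tuple

# Assumed preference order; the "source of truth" is still an open question.
# Each entry pairs an attribute name from this issue with its hash algorithm.
PREFERRED = [
    ("sha512_checksum", "sha512"),
    ("sha256_checksum", "sha256"),
    ("md5_checksum", "md5"),
]


def pick_source_of_truth(record: dict) -> Optional[Tuple[str, str]]:
    """Return (attribute, algorithm) for the first populated attribute."""
    for attribute, algorithm in PREFERRED:
        if record.get(attribute):
            return attribute, algorithm
    return None


def reconcile(record: dict, data: bytes, update: bool = False) -> bool:
    """Return True if the stored checksum matches the file contents.

    On a mismatch, the stored value is overwritten with the actual checksum
    only when `update` is True (the user-initiated action floated above).
    """
    choice = pick_source_of_truth(record)
    if choice is None:
        raise ValueError("record has no checksum attribute to compare against")
    attribute, algorithm = choice
    actual = hashlib.new(algorithm, data).hexdigest()
    if record[attribute].lower() == actual:
        return True
    if update:
        record[attribute] = actual
    return False
```

Keeping `update` off by default matches the current behavior, where a mismatch is only reported. Note that overwriting the stored value also discards the evidence of a possibly corrupted file, so it may be worth limiting updates to attributes already known to be unreliable.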