Closed laurenb33 closed 2 weeks ago
I found a spreadsheet that lists the field in Ladybird as:
MDfive Checksum [fdid=306]
Screenshot of File Checksums in LB SQL:
Need to manually check a couple to see if they match the originals rather than the derivatives
@martinlovell Were you able to check out these checksums?
Not yet, I have a reminder but haven't gotten to it.
Here are the md5s and sometimes the sha256 from the c#_file tables. Some have "PRIMARY" and some "DERIVE" for the images. Depending on where we got the image, either may be a match. (Guess: If it's from Fedora, then it seems like it might be DERIVE. If we got the original image, then it might be PRIMARY.)
The columns in the CSV are collection, OID, label, _md5, _sha256 from the c#_file tables.
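For anyone picking these up later, here is a rough sketch of how the CSVs could be read into a lookup keyed by OID and label. It assumes the files have no header row and that the columns appear in exactly the order listed above; `load_checksums` and the `FIELDS` names are made up for illustration, and the sample rows are dummy values, not real Ladybird data.

```python
import csv
import io

# Assumed column order, taken from the comment above; the files are assumed
# to have no header row.
FIELDS = ["collection", "oid", "label", "md5", "sha256"]


def _clean(value):
    """Strip whitespace and treat missing or empty cells as None."""
    value = (value or "").strip()
    return value or None


def load_checksums(lines):
    """Index rows by (oid, label), where label is e.g. PRIMARY or DERIVE."""
    index = {}
    for row in csv.DictReader(lines, fieldnames=FIELDS):
        oid = _clean(row["oid"])
        label = _clean(row["label"])
        if oid is None or label is None:
            continue  # skip blank or malformed rows
        md5 = _clean(row["md5"])
        sha256 = _clean(row["sha256"])
        index[(oid, label.upper())] = {
            "collection": _clean(row["collection"]),
            "md5": md5.lower() if md5 else None,
            "sha256": sha256.lower() if sha256 else None,
        }
    return index


# Dummy rows for illustration only (not real OIDs or checksums).
sample = io.StringIO("1,123,PRIMARY,ABCDEF,\n16,456,derive,0123,FFEE\n")
checksums = load_checksums(sample)
```

Keying on both OID and label keeps the PRIMARY and DERIVE values separate, since either one may turn out to be the match.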
Collection1.csv Collection2.csv Collection3.csv Collection4.csv Collection6.csv Collection7.csv Collection9.csv Collection10.csv Collection11.csv Collection12.csv Collection13.csv Collection14.csv Collection15.csv Collection16.csv
Checked a couple random oids:
c1_file._md5
1000151.tif does not match
1000234.tif does not match
1102347.tif matches (8070d7a362dba2e3250907a82647002d) md5
17171234.tif matches (a1fa1ceb6508d2c4136180a819bd4f6a) md5
c4_file._md5
14795346.tif matches (6fdd9c105be034e8ff4c5350f2aec760)
14795349.tif matches (cc4ad072a6e9915752c88174b6778d8a)
c9_file._md5
15479454.tif matches (54807b7d9dc34dcde655556b0f7bcc9b)
15479474.tif matches (c089cab30ac90b8a1b25c09f22ee8426)
c16_file._md5 and _sha256
11400908.tif matches both (30e22d740b0db98dc94b0d49aaceb41c)
11400912.tif matches both
So... it started out a little discouraging, but after that everything matched. For the ones that didn't match, the file size differs. (File size is also stored in Ladybird.)
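If we want to repeat this spot check on more files, something like the following could work. It is only a sketch: `file_digests` and `check_file` are hypothetical helper names, and the expected values would have to come from the c#_file tables (or the CSVs above). It also compares file size when one is supplied, since the mismatches so far coincided with a size difference.

```python
import hashlib
import os


def file_digests(path, chunk_size=1024 * 1024):
    """Compute md5, sha256, and byte size of a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {
        "md5": md5.hexdigest(),
        "sha256": sha256.hexdigest(),
        "size": os.path.getsize(path),
    }


def check_file(path, expected_md5, expected_sha256=None, expected_size=None):
    """Compare a local file against stored values; skip values not supplied."""
    actual = file_digests(path)
    result = {"md5": actual["md5"] == expected_md5.strip().lower()}
    if expected_sha256:
        result["sha256"] = actual["sha256"] == expected_sha256.strip().lower()
    if expected_size is not None:
        result["size"] = actual["size"] == int(expected_size)
    return result
```

For example, `check_file("1102347.tif", "8070d7a362dba2e3250907a82647002d")` would return `{"md5": True}` if the local copy is the same file that was checked above.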
Given that matching/ingesting all migrated content with/into Preservica in the most optimized way for DCS will likely be a long-term goal, we would like to move forward with using these checksums to verify migrated files.
I'll make new tickets for future work, since this ticket was technically just to "investigate."
While working on the Nightly Job Integrity check dev work, it was discovered that our migrated children in DCS don't have checksums! However, @mikeapp mentioned during standup on Friday, 7/19 that there may be checksums somewhere in Ladybird. This ticket is to investigate whether there are checksums in Ladybird. If there are, where are they and how could they be imported to DCS? These questions are also listed below in the acceptance criteria.
Acceptance
Please investigate/answer the following questions: