Story: Create a fixity verification process

LynnMcRae commented 6 years ago

This is the second level of audit, a fixity check against storage copies, comparing their checksums against those in the Trusted Checksum Repository.

Audits will need to report findings:

- as current status information in the PC Inventory
- as a finding in Provenance information

Negative findings will need to trigger a recovery process.

julianmorley commented 6 years ago

Fixity is called by three different processes, under three different conditions.

The Audit process calls Fixity to periodically check the validity of online Moabs.
The Archive process calls Fixity to validate an online Moab before it is sent to a replication endpoint.
The Recovery process calls Fixity to validate a recovered archive Moab before it is moved into the online Moab object store.

julianmorley commented 6 years ago

When we extend this code to include Ingest/Deposit work, Fixity may be called by that process to verify that the incoming content is valid.

ndushay commented 6 years ago

Audit service consumer: checks OIS for Moabs, and runs fixity checks on those Moabs

"runs fixity checks"

look in manifest files
pull up file checksums ....

ndushay commented 6 years ago

When an "archive object"** is bagged/zipped, it is checksummed; the checksums are generated and need to be put in TCR.

(A later fixity check against archived objects will regenerate the checksum and compare it to TCR)

**"archived object" will be "containerized" (e.g. tarred, gzipped, compacted) and there will be a checksum in the TCR for the actual "archived object" as well as for the goodies inside it.

Thus: for a single id

there will be a checksum for each "archived object" (note: each specific object version should have the same checksum regardless of replication location)
- "archived objects" (and their checksums) will be different if they are for different versions ~~- there will be checksums for goodies inside the moab object (which is NOT the "archived object")~~

our fixtures: exploded moab object

ndushay commented 6 years ago

Moab is NOT containerized.

Every version directory should have a manifest directory; one of the files in here will have checksums (manifest_inventory.xml)

manifest_inventory.xml contains all the checksums for all the other files in the manifest directory for that version. (5-6 diff files).

one of these files is signatureCatalog.xml - contains the checksums for all the files ...

... there is info somewhere about where all the checksums live in the Moab object ...

In order to perform fixity, we need to compute checksums on all files in the Moab object and compare the computed checksums against stored checksums.
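
A minimal sketch of that compute-and-compare step, assuming the stored checksums have already been loaded into a plain Hash of relative path to MD5 hex digest. The helper names `compute_checksums` and `compare_checksums` are hypothetical, not part of the Moab gem, and MD5 is used only because it is the algorithm named in this thread.

```ruby
require 'digest'
require 'find'
require 'pathname'

# Hypothetical helper: compute an MD5 hex digest for every regular file
# under root, keyed by the file's path relative to root.
def compute_checksums(root)
  root_path = Pathname.new(root)
  checksums = {}
  Find.find(root) do |path|
    next unless File.file?(path)
    relative = Pathname.new(path).relative_path_from(root_path).to_s
    checksums[relative] = Digest::MD5.file(path).hexdigest
  end
  checksums
end

# Hypothetical helper: compare computed checksums against stored ones.
# Returns three sorted lists of relative paths; all empty means fixity is good.
def compare_checksums(computed, stored)
  {
    mismatched: stored.keys.select { |p| computed.key?(p) && computed[p] != stored[p] }.sort,
    missing: (stored.keys - computed.keys).sort,
    unexpected: (computed.keys - stored.keys).sort
  }
end
```

Reporting missing and unexpected files separately from mismatches is a design choice for this sketch; a caller such as the Audit process could decide which of the three should count as a negative finding.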

ndushay commented 6 years ago

Work chunks:

Fixity Checking:

compute checksums (md5, ...) of files from Moab (or "archive object")
compare checksums to something (first pass: what is in the Moab files)

ndushay commented 6 years ago

Can temporarily punt on "where is the fixity info" by loading fixture data into test db, or into spec files, or ...

Can initially implement fixity as:

a gem that can be called by Audit process, by Archive process, by Recovery process
gem that runs fixity checks on fixture data (?) or maybe it can just be part of gem.

ndushay commented 6 years ago

Question: should fixity "gem" actually be code in Moab gem?

is it useful to have a gem do checksum verification separate from Moab objects?
- note that "archive objects" are not moab objects

Question: is there already a gem that does the fixity checks that we need?

ndushay commented 6 years ago

We will need to parse XML files from Moab object in order to get out checksum information. Perhaps the moab object traversal and pulling checksums out of its xml files should be part of moab gem.
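
A rough sketch of what pulling checksums out of a manifest file could look like, using REXML. The XML layout here is an assumption for illustration, not something confirmed in this thread: each `file` element is assumed to hold a `fileSignature` child with the checksum attributes and a `fileInstance` child with a `path` attribute. The sample document and the `stored_checksums` helper are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical sample data; the element and attribute names are assumptions
# about the manifest layout and should be checked against real Moab files.
SAMPLE_MANIFEST = <<~XML
  <fileInventory type="manifests">
    <fileGroup groupId="manifests">
      <file>
        <fileSignature size="5" md5="5d41402abc4b2a76b9719d911017c592"/>
        <fileInstance path="signatureCatalog.xml"/>
      </file>
      <file>
        <fileSignature size="5" md5="7d793037a0760186574b0282f2f435e7"/>
        <fileInstance path="versionMetadata.xml"/>
      </file>
    </fileGroup>
  </fileInventory>
XML

# Hypothetical helper: return a Hash of file path => stored checksum
# for the requested algorithm attribute (md5 by default).
def stored_checksums(xml, algorithm = 'md5')
  doc = REXML::Document.new(xml)
  checksums = {}
  doc.elements.each('//file') do |file|
    signature = file.elements['fileSignature']
    instance = file.elements['fileInstance']
    next unless signature && instance
    checksums[instance.attributes['path']] = signature.attributes[algorithm]
  end
  checksums
end
```

If this traversal ends up living in the moab gem, the fixity code would only need the resulting Hash and would not have to know anything about the XML.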

ndushay commented 6 years ago

How much fixity checking do we actually need? How much for Moab object vs. archive object?

Can we trust the internal fixity of a Moab object? that is, if the moab object's individual file fixity is good ... can we just trust the overall checksum of the whole object without having to further go after individual files for verification?

ndushay commented 6 years ago

For online moab - initial

fixity check against manifest_inventory.xml only

For archived moab - initial

fixity check for single file of bag/tar/foo
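
For the archived case, the initial check could be as small as the sketch below: one checksum for the single containerized file, looked up by id and version. The TCR is stood in for by a plain Hash keyed on `[id, version]`, and `check_archive_fixity` and its status symbols are hypothetical names; the real TCR interface is not described in this thread.

```ruby
require 'digest'

# Hypothetical helper: verify one containerized archive file against the
# checksum recorded for that object id and version.
# tcr is a stand-in Hash of [id, version] => expected MD5 hex digest.
def check_archive_fixity(tcr, id, version, archive_path)
  expected = tcr[[id, version]]
  return :no_tcr_entry if expected.nil?
  return :missing_file unless File.file?(archive_path)

  # Digest::MD5.file reads the file in chunks, so large archives
  # are not loaded into memory all at once.
  Digest::MD5.file(archive_path).hexdigest == expected ? :ok : :checksum_mismatch
end
```

Because each object version should have the same checksum regardless of replication location, the same TCR entry could be reused to check the copy at every endpoint.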

ndushay commented 6 years ago

How is Moab generating checksums now? Perhaps that code is relevant.

Should it be split out so archive object fixity checks can use it? Is it already split out?

sul-dlss / preservation2017

Story: Create a fixity verification process #27