sul-dlss / preservation2017

Story repo for preservation core work done summer/fall 2017

Story: Create a fixity verification process #27

Open LynnMcRae opened 6 years ago

LynnMcRae commented 6 years ago

This is the second level of audit: a fixity check against storage copies, comparing their checksums against those in the Trusted Checksum Repository (TCR).

Audits will need to report findings.

Negative findings will need to trigger a recovery process.

julianmorley commented 6 years ago

Fixity is called by three different processes, under three different conditions.

julianmorley commented 6 years ago

When we extend this code to include Ingest/Deposit work, Fixity may be called by that process to verify that the incoming content is valid.

ndushay commented 6 years ago

Audit service consumer: checks OIS for Moabs, and runs fixity checks on those Moabs

"runs fixity checks"

ndushay commented 6 years ago

When an "archive object"** is bagged/zipped, it is checksummed; the checksums are generated and need to be put in the TCR.

(A later fixity check against archived objects will regenerate the checksum and compare it to the TCR.)

**"archived object" will be "containerized" (e.g. tarred, gzipped, compacted) and there will be a checksum in the TCR for the actual "archived object" as well as for the goodies inside it.

Thus: for a single id

our fixtures: exploded moab object
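A minimal sketch of the two levels of checksum described in the footnote above: one for the container itself and one per file inside it. It assumes the container is a tar archive (optionally compressed) and that SHA-256 is the algorithm; neither is decided in this thread, and `archive_checksums` is a hypothetical helper name.

```python
import hashlib
import tarfile


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def archive_checksums(archive_path):
    """Return (container_checksum, {member_name: checksum}) for a tar archive.

    Assumption: the archived object is a tar file; SHA-256 is an illustrative choice.
    """
    # Checksum of the container file as it sits on storage.
    with open(archive_path, "rb") as handle:
        container = sha256_of_stream(handle)

    # Checksums of the regular files inside the container.
    members = {}
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                members[member.name] = sha256_of_stream(tar.extractfile(member))
    return container, members
```

Both values would be stored in the TCR at archive time; a later fixity check would call the same function again and compare the results.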

ndushay commented 6 years ago

Moab is NOT containerized.

Every version directory should have a manifest directory; one of the files in here will have checksums (manifest_inventory.xml)

manifest_inventory.xml contains the checksums for all the other files in the manifest directory for that version (5-6 different files).

... there is info somewhere about where all the checksums live in the Moab object ...

In order to perform fixity, we need to compute checksums on all files in the Moab object and compare the computed checksums against stored checksums.
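As a rough sketch of that compute-and-compare step: the code below walks an object directory, computes a checksum per file, and compares against a mapping of stored checksums. The function names, the shape of `stored_checksums` (relative path to hex digest), and the choice of SHA-256 are assumptions for illustration, not anything the Moab gem defines.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Stream a file through hashlib so large files are not read into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(object_root, stored_checksums, algorithm="sha256"):
    """Compare computed checksums with stored ones.

    stored_checksums: hypothetical mapping of POSIX-style paths (relative to
    object_root) to hex digests, e.g. loaded from fixtures or a test db.
    """
    root = Path(object_root)
    computed = {
        p.relative_to(root).as_posix(): file_checksum(p, algorithm)
        for p in root.rglob("*")
        if p.is_file()
    }
    shared = computed.keys() & stored_checksums.keys()
    return {
        # Present on disk and in the stored list, but the digests differ.
        "mismatched": sorted(k for k in shared if computed[k] != stored_checksums[k]),
        # Listed in the stored checksums but not found on disk.
        "missing": sorted(stored_checksums.keys() - computed.keys()),
        # Found on disk but absent from the stored checksums.
        "unexpected": sorted(computed.keys() - stored_checksums.keys()),
    }
```

An empty result in all three lists would be a positive finding; anything else is what the audit would need to report. Note that a file holding the stored checksums cannot list its own checksum, so it would show up under "unexpected" unless it is excluded.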

ndushay commented 6 years ago

Work chunks:

Fixity Checking:

ndushay commented 6 years ago

Can temporarily punt on "where is the fixity info" by loading fixture data into test db, or into spec files, or ...

Can initially implement fixity as:

ndushay commented 6 years ago

Question: should fixity "gem" actually be code in Moab gem?

Question: is there already a gem that does the fixity checks that we need?

ndushay commented 6 years ago

We will need to parse XML files from the Moab object in order to extract checksum information. Perhaps traversing the Moab object and pulling checksums out of its XML files should be part of the Moab gem.
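A minimal sketch of what pulling checksums out of a manifest XML file could look like. The element and attribute names in `SAMPLE_MANIFEST` (`file`, `fileSignature`, `fileInstance`, `path`, `sha256`) and the digest values are assumptions made up for illustration; the real Moab manifest files need to be checked before relying on this layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest content; names and digests are illustrative only.
SAMPLE_MANIFEST = """<fileInventory type="manifests" versionId="1">
  <fileGroup groupId="manifests">
    <file>
      <fileSignature size="5" md5="aaa111" sha256="bbb222"/>
      <fileInstance path="versionInventory.xml"/>
    </file>
    <file>
      <fileSignature size="7" md5="ccc333" sha256="ddd444"/>
      <fileInstance path="versionAdditions.xml"/>
    </file>
  </fileGroup>
</fileInventory>"""


def stored_checksums(xml_text, algorithm="sha256"):
    """Return {path: checksum} for every file entry that has the given algorithm."""
    root = ET.fromstring(xml_text)
    checksums = {}
    for file_el in root.iter("file"):
        signature = file_el.find("fileSignature")
        if signature is None or algorithm not in signature.attrib:
            continue  # no checksum of this type recorded for the entry
        for instance in file_el.findall("fileInstance"):
            checksums[instance.get("path")] = signature.get(algorithm)
    return checksums


print(stored_checksums(SAMPLE_MANIFEST))
```

The resulting mapping is the "stored checksums" side of a fixity comparison; whether this parsing lives in the Moab gem or in a separate fixity gem is the open question above.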

ndushay commented 6 years ago

How much fixity checking do we actually need? How much for Moab object vs. archive object?

Can we trust the internal fixity of a Moab object? That is, if the Moab object's individual file fixity is good, can we just trust the overall checksum of the whole object without having to go after individual files for further verification?

ndushay commented 6 years ago

For online moab - initial

For archived moab - initial

ndushay commented 6 years ago

How is Moab generating checksums now? Perhaps that code is relevant.

Should it be split out so archive object fixity checks can use it? Is it already split out?