sul-dlss / preservation2017

Story repo for preservation core work done summer/fall 2017

Story: General considerations for the nature of Preservation Core objects #21

Open LynnMcRae opened 6 years ago

LynnMcRae commented 6 years ago

Confirm the continued use of the Moab specification as the principal schema for storing a Preservation Core object (see http://journal.code4lib.org/articles/8482).

Confirm the use of BagIt bags as the container strategy for Archive objects.
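For reference, here is a minimal Python sketch of what a BagIt bag looks like on disk, following the BagIt 1.0 layout in RFC 8493: a `bagit.txt` declaration, a `data/` payload directory, and a `manifest-<algorithm>.txt` payload manifest. The function name `make_minimal_bag` is hypothetical, and a production bag would normally be built with an established BagIt tool rather than this code.

```python
import hashlib
import shutil
from pathlib import Path


def make_minimal_bag(src_dir: str, bag_dir: str) -> Path:
    """Hypothetical helper: copy src_dir into bag_dir/data and write a minimal BagIt 1.0 bag."""
    bag = Path(bag_dir)
    data = bag / "data"
    # The payload lives under data/; the bag does not care what the payload is.
    shutil.copytree(src_dir, data)
    # Bag declaration required by the BagIt specification.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    # Payload manifest: one "<digest> <path relative to the bag>" line per payload file.
    lines = []
    for path in sorted(p for p in data.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest} {path.relative_to(bag).as_posix()}")
    (bag / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return bag
```

A bag may carry one manifest per algorithm, so the checksum trio discussed in this issue could map to `manifest-md5.txt`, `manifest-sha1.txt`, and `manifest-sha256.txt`; the sketch writes only the SHA-256 manifest to stay short.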

Confirm the checksum policy: the MD5, SHA-1, and SHA-256 trio, all of which are present because all are checked. In the past we have also required that file size not change.
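A minimal Python sketch of that policy, assuming each file is checked against a stored record of its size plus all three digests. The `Fixity` record and the `compute_fixity`/`verify_fixity` names are hypothetical, not taken from any existing SDR code. The file is read once, and every chunk is fed to all three hashers.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path

# The three algorithms named in the checksum policy; all are computed and all are checked.
ALGORITHMS = ("md5", "sha1", "sha256")


@dataclass(frozen=True)
class Fixity:
    size: int
    md5: str
    sha1: str
    sha256: str


def compute_fixity(path: Path, chunk_size: int = 1 << 20) -> Fixity:
    """Read the file once, updating the size and all three digests per chunk."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    size = 0
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            size += len(chunk)
            for h in hashers.values():
                h.update(chunk)
    return Fixity(size, **{name: h.hexdigest() for name, h in hashers.items()})


def verify_fixity(path: Path, expected: Fixity) -> list[str]:
    """Return the names of mismatched properties; an empty list means the file is intact."""
    actual = compute_fixity(path)
    return [
        field
        for field in ("size", *ALGORITHMS)
        if getattr(actual, field) != getattr(expected, field)
    ]
```

Reporting every mismatched property, rather than stopping at the first, keeps the "all are checked" part of the policy visible in audit results.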

julianmorley commented 6 years ago

I think ditching Moab is definitely out of scope for this sprint. :-)

But can we make design decisions that would enable SDR-PC to support other object types? Can we abstract the code enough to say that an online object is something in a directory with a globally unique name, and an archive object is something in a zip file with a globally unique name? (Right now, those somethings are "Moabs" and "BagIts of Moabs" or "zips of Moabs".)
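One way to picture that abstraction is a minimal Python sketch, assuming only what the comment states: an online object is a directory named by a globally unique id, and an archive object is a zip file named by one. All class and method names here are hypothetical, and the `<id>.zip` naming convention is an assumption for illustration.

```python
import zipfile
from abc import ABC, abstractmethod
from pathlib import Path


class PreservationObject(ABC):
    """Hypothetical abstraction: something with a globally unique name, stored somewhere."""

    def __init__(self, object_id: str, location: Path):
        self.object_id = object_id  # the globally unique name
        self.location = location    # parent directory holding the stored object

    @abstractmethod
    def list_files(self) -> list[str]:
        """Relative paths of every file the object contains."""


class OnlineObject(PreservationObject):
    """Online object: a directory named after the object id (today, a Moab)."""

    def list_files(self) -> list[str]:
        root = self.location / self.object_id
        return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


class ArchiveObject(PreservationObject):
    """Archive object: a zip file named after the object id (today, a zipped or bagged Moab)."""

    def list_files(self) -> list[str]:
        with zipfile.ZipFile(self.location / f"{self.object_id}.zip") as zf:
            # Skip directory entries; keep only file members.
            return sorted(n for n in zf.namelist() if not n.endswith("/"))
```

Code that only calls `list_files` (or similar methods) would not need to know whether the object behind it is a Moab, a bag, or something else.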

LynnMcRae commented 6 years ago

Agreed that considering any changes to Moab is out of scope. I mainly placed it here so we remember to mention it as part of the enduring architecture. Bags are opaque containers whose content can vary over time and by source. I wouldn't call them bags of Moab (sounds biblical); we can distinguish them from other bags we might archive by talking about, say, PC vs DPN bags?

I'm not sure what you mean about abstracting Moab in the code, since it needs direct and specific code to implement the Moab versioning strategy and object reconstruction via its forward-delta diff files, plus the handful of services that interpret specific metadata for DOR and accessioning/ingest. I hope/assume these are well articulated in the code as Moab implementations. I can see abstracted vocabulary in the architecture picture -- the online copy, the master, the original, the primary PC representation, the raw object, the unwrapped, the unsullied? -- so that we can say Moab is the current implementation of that thing. Maybe a layer in the code in the same vein could have benefits there too? I haven't found a good abstract term that sells it yet.
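To make the forward-delta idea concrete, here is a minimal Python sketch under a simplified model that is an assumption, not the Moab gem's API or on-disk layout: each version stores only content whose digest has not appeared in an earlier version, plus a manifest of that version's full state. Reconstructing any version is then a lookup of each file's earliest stored copy. The class, its methods, and the `vNNNN/` path format are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VersionedObject:
    """Hypothetical, simplified forward-delta store (not Moab's actual layout)."""

    # digest -> (version where the content was first stored, path inside that version)
    stored: dict[str, tuple[int, str]]
    # version -> {logical path: digest}, describing the complete state of that version
    manifests: dict[int, dict[str, str]]

    def add_version(self, files: dict[str, str]) -> int:
        """Record a new version; only digests never stored before count as new content."""
        version = max(self.manifests, default=0) + 1
        for path, digest in files.items():
            if digest not in self.stored:
                self.stored[digest] = (version, path)
        self.manifests[version] = dict(files)
        return version

    def reconstruct(self, version: int) -> dict[str, str]:
        """Map each logical path in `version` to the stored location holding its bytes."""
        return {
            path: "v{:04d}/{}".format(*self.stored[digest])
            for path, digest in self.manifests[version].items()
        }
```

Unchanged files are never copied forward, which is what keeps each new version small, and it is also why reconstruction needs version-aware code rather than a plain directory listing.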