A whole bunch of updates that are complete enough that they should be merged into main, since we're now at the point of running this on-site and the deletion deployments are mostly ready to be automated. Summary of all the changes:
DataPackage and cleanup_level2
datapkg_completion module has a DataPackage class I built to work with G3tSmurf, G3tHK, and Imprinter at the same time and run the checks needed to make sure we're ready to delete data from each timecode.
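As a rough illustration of the pattern (one object bundling the per-timecode checks from each database), here is a minimal sketch; the method and attribute names are hypothetical and do not match the real datapkg_completion API:

```python
# Hypothetical sketch only -- names do not match the real datapkg_completion API.
from dataclasses import dataclass, field

@dataclass
class DataPackage:
    """Bundle the per-timecode checks needed before deleting level 2 data."""
    timecode: str                        # e.g. "16854", a 5-digit ctime prefix
    checks: dict = field(default_factory=dict)

    def check_books_bound(self):
        # Placeholder: ask the Imprinter whether every expected book exists.
        self.checks["books_bound"] = True
        return self.checks["books_bound"]

    def check_hk_archived(self):
        # Placeholder: ask G3tHK whether the housekeeping files are accounted for.
        self.checks["hk_archived"] = True
        return self.checks["hk_archived"]

    def ready_to_delete(self):
        # Only allow deletion for a timecode once every individual check has passed.
        return all([self.check_books_bound(), self.check_hk_archived()])

if __name__ == "__main__":
    pkg = DataPackage(timecode="16854")
    print(pkg.timecode, "ready:", pkg.ready_to_delete())
```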
re-write cleanup_level2 to use the DataPackage functions and run deletion in three different phases (a rough sketch follows the phase list below)
Phase 1 - Completion: go through a timecode folder and make sure every single book that should exist does (planned for a 14 day lag)
Phase 2 - delete staged files as long as completion is True (planned for a 14 day lag)
Phase 3 - delete level 2 files once one month has passed and there are at least two book copies (on-site or not) with the same checksum (planned for a 28 day lag)
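A rough sketch of that staging, using the lags above; every function name here is made up and only illustrates the phase logic:

```python
# Rough sketch of the staged deletion logic; the function names are made up.
import time

SECONDS_PER_DAY = 86400

def timecode_age_days(timecode, now=None):
    """A 5-digit timecode covers ctimes [timecode*1e5, (timecode+1)*1e5)."""
    now = time.time() if now is None else now
    return (now - int(timecode) * 1e5) / SECONDS_PER_DAY

def cleanup_timecode(timecode, complete, n_matching_copies):
    """Return the deletion actions currently allowed for this timecode."""
    actions = []
    age = timecode_age_days(timecode)
    # Phases 1 and 2: after a 14 day lag, and only once the completion
    # check has passed, the staged files can go.
    if age >= 14 and complete:
        actions.append("delete staged files")
    # Phase 3: after a 28 day lag, also require two checksum-matched book
    # copies before the level 2 files themselves are removed.
    if age >= 28 and complete and n_matching_copies >= 2:
        actions.append("delete level 2 files")
    return actions

print(cleanup_timecode("16854", complete=True, n_matching_copies=2))
```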
Bookbinder
Add ability to bind books without HK files if flags are set correctly
Create a TimeCodeBinder in bookbinder.py to bind the timecode books. This replaces a smaller fake binder that was being used in imprinter. This binder has some compression capability that is used for smurf books (but not enough to be worth going through and changing operations books). This also required moving where some of the book metadata functions are defined / called. No changes were made to the output obs/oper book metadata.
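For illustration, a minimal stand-in for what a timecode binder does (copy a timecode's files into a book directory, compressing only when asked); the class below is not the real bookbinder.TimeCodeBinder interface, and plain gzip stands in for whatever compression the real binder applies:

```python
# Illustrative sketch only; not the real bookbinder.TimeCodeBinder interface.
import gzip
import shutil
from pathlib import Path

class TimeCodeBinder:
    """Copy a timecode's level 2 files into a book directory, optionally compressed."""

    def __init__(self, source_dir, book_dir, compress=False):
        self.source_dir = Path(source_dir)
        self.book_dir = Path(book_dir)
        self.compress = compress  # e.g. True for smurf books, False for operations books

    def bind(self):
        self.book_dir.mkdir(parents=True, exist_ok=True)
        for src in sorted(self.source_dir.rglob("*")):
            if not src.is_file():
                continue
            dest = self.book_dir / src.relative_to(self.source_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            if self.compress:
                # Plain gzip stands in here for whatever compression the real binder uses.
                with src.open("rb") as f_in, gzip.open(str(dest) + ".gz", "wb") as f_out:
                    shutil.copyfileobj(f_in, f_out)
            else:
                shutil.copy2(src, dest)
```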
Imprinter
updates to Imprinter to check with the librarian about the status of off-site books
new functionality for finding what belongs in stray books, but the resulting books should still be the same
Add a new schema column to the book table to track the schema of the different books. This is now used for smurf books since we have a new compressed version that is schema=1
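Adding a versioned column like that typically looks like the SQLAlchemy sketch below; the table and column layout here are assumptions, not the actual Imprinter models:

```python
# Hypothetical sketch of a schema-version column on a books table (not the real Imprinter model).
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Book(Base):
    __tablename__ = "books"
    bid = Column(String, primary_key=True)   # book id
    type = Column(String)                    # "obs", "oper", "smurf", ...
    # New column tracking the book layout version; compressed smurf books
    # would be written with schema=1, everything older stays at 0.
    schema = Column(Integer, default=0)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(Book(bid="smurf_16854_sat1", type="smurf", schema=1))
    session.commit()
```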
get_files_for_book now works for all book types
function to find if there are any level 2 observations that are not registered into books
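Conceptually this is a set difference between the level 2 observation ids and the ids already registered in books; the helper below is only an illustration, not the actual Imprinter function:

```python
# Illustration only: unregistered observations as a set difference.
def find_unregistered_obs(level2_obs_ids, book_obs_ids):
    """Return level 2 observations that are not in any book, sorted."""
    return sorted(set(level2_obs_ids) - set(book_obs_ids))

print(find_unregistered_obs(["obs_a", "obs_b", "obs_c"], ["obs_a", "obs_c"]))
```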
load_smurf
tweaks to how file and observation deletion works
G3tSmurf now tracks its instance of G3tHK
functions to find files that are on disk but not in the database, and files that are in the database but not linked to level 2 observations.
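A conceptual sketch of those two lookups (the real G3tSmurf helpers and file patterns will differ):

```python
# Conceptual sketch only; the real G3tSmurf helpers will differ.
from pathlib import Path

def files_not_in_db(data_root, db_paths):
    """Files present on disk that the database has never seen."""
    on_disk = {str(p) for p in Path(data_root).rglob("*.g3")}
    return sorted(on_disk - set(db_paths))

def files_without_observation(db_paths, obs_linked_paths):
    """Files the database knows about but that are not tied to a level 2 observation."""
    return sorted(set(db_paths) - set(obs_linked_paths))
```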
G3tHK
Significantly reduces the size of the g3thk databases: we still have file entries, but only the necessary smurf-related agents get added to hkagents and hkfields
Databases on-site were purged of all the non-UFM related fields.
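The agent filtering amounts to something like the sketch below; the agent names listed here are illustrative only:

```python
# Illustrative filter only; the real G3tHK agent selection lives in sotodlib.
SMURF_RELATED_AGENTS = {"pysmurf-monitor", "pysmurf-controller", "smurf-stream"}

def agents_to_index(agents_in_file):
    """Keep only the smurf-related agents so hkagents/hkfields stay small."""
    return [a for a in agents_in_file if a in SMURF_RELATED_AGENTS]

print(agents_to_index(["pysmurf-monitor", "acu", "labjack"]))
```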
Imprinter CLI