guiguem closed this issue 8 years ago
Hornet can run nearline analysis jobs, though that feature is not currently used. That would be the appropriate way to run this duplicate-checking script.
Check out the example config file at line 79. Hornet has a pool of "workers" that are available to run nearline analysis jobs on files. These workers operate in parallel, so multiple files can be analyzed at the same time.
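To illustrate the worker-pool idea (this is not Hornet's own code, just a minimal Python sketch), the snippet below runs one command per file with a fixed number of parallel workers. The names `run_job`, `run_pool`, and `n_workers` are hypothetical and chosen only for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_job(command, path):
    """Run one analysis command on one file; return (path, exit code)."""
    # The file path is appended as the last argument of the command.
    result = subprocess.run(command + [path], capture_output=True, text=True)
    return path, result.returncode


def run_pool(command, paths, n_workers=2):
    """Run the command on every file, with at most n_workers jobs at once."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order; each worker blocks on its subprocess.
        return dict(pool.map(lambda p: run_job(command, p), paths))
```

With more workers, more files are analyzed at once, at the cost of more load on the machine that is also handling the data flow.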
You'll need to specify a job for this script. The file-type argument should be one of the file types mentioned in the classifier section (line 43). The command will be run on ignatius by the project8 user.
You'll definitely want to do some testing to make sure you have the right number of workers, so that the analysis doesn't slow down the data flow.
I'm going to close this issue because the actual changes need to be made to the Hornet config file for ignatius.
Hornet should be able to distinguish unique new RSA files from duplicates (which occur from time to time) before they are copied to warm and pushed away. Luiz made a script that checks for these duplicates individually: scripts/rsamat/CheckDuplicateRSAMAT.py. If there are duplicates, the duplicate files should simply be removed without being transferred, and a message should be posted in Slack.
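The requested behavior could look roughly like the sketch below. It is not the logic of CheckDuplicateRSAMAT.py, whose comparison method is not described here; it assumes that a duplicate is a file with byte-identical content, detected with a SHA-256 digest. The function names and the `notify` callback (a stand-in for whatever posts to Slack) are hypothetical.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def remove_duplicates(paths, seen=None, notify=print):
    """Delete files whose content was already seen; return the removed paths.

    `seen` maps digest -> first path with that content (assumption: the
    first file encountered is the one to keep). `notify` is a placeholder
    for a Slack hook and is called once per removed file.
    """
    seen = {} if seen is None else seen
    removed = []
    for path in paths:
        digest = file_digest(path)
        if digest in seen:
            os.remove(path)  # duplicate: drop it without transferring
            removed.append(path)
            notify(f"Removed duplicate {path} (same content as {seen[digest]})")
        else:
            seen[digest] = path
    return removed
```

Passing the same `seen` dictionary across calls lets the check cover files that arrived earlier, not only those in the current batch.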