sciencehistory / chf-sufia

sufia-based hydra app

Processing jobs should clean up temp files when they're finished #350

Closed: hackartisan closed this issue 6 years ago

hackartisan commented 7 years ago

We could be uploading several hundred gigabytes per day. Our cleanup script can't keep up, especially since it has no knowledge of how the files are being used. If processing completes without error, the temp files should be removed.

Some discussion of this in sufia included the possibility of creating an event that could be used for this purpose.
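To make the idea concrete, here is a minimal sketch of what such an event hook could look like using Rails' `ActiveSupport::Notifications`. This is only an illustration, not whatever API the sufia discussion had in mind; the event name, payload keys, and `create_derivatives` call are made up.

```ruby
# Hypothetical sketch, not sufia's actual API: the job announces which temp
# files it touched, and a subscriber removes them once processing has
# finished without raising.
require 'active_support/notifications'
require 'fileutils'

ActiveSupport::Notifications.subscribe('chf.derivatives_created') do |_name, _start, _finish, _id, payload|
  # instrument() sets payload[:exception] when the block raised, so skip
  # cleanup in that case and leave the files around for debugging.
  next if payload[:exception]

  Array(payload[:temp_files]).each { |path| FileUtils.rm_f(path) }
end

# Inside the processing job (method and argument names are illustrative):
def perform(source_path, working_path)
  ActiveSupport::Notifications.instrument('chf.derivatives_created',
                                          temp_files: [source_path, working_path]) do
    create_derivatives(source_path, working_path) # hypothetical helper
  end
end
```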

jrochkind commented 7 years ago

Am I correct that the processing mentioned here is NOT only about chf:create_derivatives, but also about general import to fedora? (I still don't personally understand how that import happens.)

hackartisan commented 7 years ago

I don't think this is related to fedora import; really just derivatives creation.

There are two derivatives creation cases, though:

1. When a file is uploaded through the UI, it is saved in CarrierWave's tmp location, and then the ImageMagick intermediaries are saved in tmp. CarrierWave cleans up after itself, but the derivatives creation script does not.
2. When derivatives are generated for objects that are already in fedora (i.e., re-running the derivatives), the objects are pulled out of fedora into a different tmp location, and the intermediaries are again saved in tmp.

Neither of these operations is cleaning up after itself.
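For what it's worth, here is a rough sketch of how case 2 could clean up after itself, assuming the work happens in a plain Ruby method. The method name and the block the caller supplies are invented for illustration; this is not the actual derivatives script.

```ruby
# Rough sketch, not the actual script: keep the ImageMagick intermediaries in
# a Dir.mktmpdir scratch directory, which Ruby deletes when the block exits
# (success or failure), and delete the file pulled out of fedora only after
# everything has succeeded.
require 'tmpdir'
require 'fileutils'

def regenerate_derivatives(fedora_source_path)
  Dir.mktmpdir('chf-derivatives') do |working_dir|
    intermediate = File.join(working_dir, 'intermediate.tiff')
    system('convert', fedora_source_path, intermediate) or raise 'imagemagick convert failed'

    # Caller's block writes the finished derivatives to their permanent home.
    yield intermediate
  end

  # Only reached when the block above completed without raising, matching
  # "if processing completes without error, temp files should be removed".
  FileUtils.rm_f(fedora_source_path)
end
```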

Either way, I think this has been de-prioritized. Just wanted to note here to resolve the thread.

jrochkind commented 7 years ago

Cool, we should be able to fix that easily, since the derivatives generation script is in Ruby. (Famous last words.)

hackartisan commented 7 years ago

related: https://github.com/samvera/hyrax/issues/1571

jrochkind commented 6 years ago

Fixed for derivatives. It may still be happening for other jobs, but that's not a priority right now.