populationgenomics / analysis-runner

MIT License
Untar files update #676

Open EddieLF opened 8 months ago

EddieLF commented 8 months ago

Automatically moves the tar file to a /completed folder upon successfully extracting its contents.

This means that if the batch has to be re-run on the same bucket path due to one or more job failures, the tarballs that were successfully extracted will no longer be picked up and queued for re-extraction.
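The behaviour described above can be sketched roughly as follows. This is a minimal local-filesystem illustration, not the code in this PR: the function name `extract_pending_tarballs`, the `*.tar*` pattern, and the use of `pathlib` in place of a bucket client are all assumptions made for the example.

```python
import tarfile
from pathlib import Path


def extract_pending_tarballs(source_dir: Path, dest_dir: Path) -> list[str]:
    """Extract each tarball directly under source_dir, then move it to source_dir/completed.

    Hypothetical helper: a local directory stands in for the bucket path.
    Returns the names of the tarballs that were extracted on this run.
    """
    completed_dir = source_dir / "completed"
    completed_dir.mkdir(exist_ok=True)
    dest_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    # Non-recursive glob, so anything already under completed/ is never queued again.
    for tarball in sorted(source_dir.glob("*.tar*")):
        if not tarball.is_file():
            continue
        try:
            with tarfile.open(tarball) as archive:
                archive.extractall(dest_dir / tarball.name.split(".tar")[0])
        except (tarfile.TarError, OSError):
            # Failed tarballs stay where they are, so a re-run picks them up again.
            continue
        # Move only after a successful extraction.
        tarball.rename(completed_dir / tarball.name)
        extracted.append(tarball.name)
    return extracted
```

Running this twice on the same directory extracts nothing the second time unless an earlier extraction failed, which is the re-run behaviour the description is after.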

jmarshall commented 8 months ago

Does it need to also look in the …/completed/… path when it's first downloading the tarball, so that it can stop cleanly if this tarball has already been done? At the moment, presumably on re-run it would throw an exception and fail. (ETA: Oh, I guess completed is a signal to the operator to not re-run it on that one! :smile:)

AIUI renaming — even within the same bucket — is not a free operation and is really a copy+delete. I'm probably overthinking this, but could we instead attach some metadata saying it's been done? (I haven't used this API so don't know if this is even feasible…)
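For what it's worth, a rough sketch of the metadata idea, assuming the objects are `google.cloud.storage.Blob` instances (which expose a `metadata` dict and a `patch()` method that persists changed properties). The key and value names here are made up for illustration, and nothing below has been run against a real bucket.

```python
from typing import Any

# Hypothetical custom-metadata key/value used to mark a tarball as done.
EXTRACTED_KEY = "untar-status"
EXTRACTED_VALUE = "extracted"


def is_extracted(blob: Any) -> bool:
    """Return True if the blob already carries the 'extracted' marker."""
    # metadata may be None when no custom metadata has been set.
    metadata = blob.metadata or {}
    return metadata.get(EXTRACTED_KEY) == EXTRACTED_VALUE


def mark_extracted(blob: Any) -> None:
    """Attach the marker to the blob, keeping any existing custom metadata."""
    metadata = dict(blob.metadata or {})
    metadata[EXTRACTED_KEY] = EXTRACTED_VALUE
    blob.metadata = metadata
    # Persist the metadata change; the object data itself is not copied.
    blob.patch()
```

The listing step would then skip any blob for which `is_extracted(blob)` is true, instead of relying on the tarball having been moved. One caveat: a blob created with `bucket.blob(name)` has no metadata loaded until `reload()` is called, whereas blobs returned by a listing should already have it.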