populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

GCP Housekeeping jobs? #617

Open · MattWellie opened this issue 8 months ago

MattWellie commented 8 months ago

We have a number of jobs that nominate a temporary data directory and write potentially huge amounts of data to it (e.g. AnnotateCohort, which generates multiple checkpoints, or GATK-SV in general, which is a storage-hungry beast).

I would like to consider follow-on jobs that would run only if the main job completes successfully, clearing the temporary storage directory that was used.

We'd need to poke at the numbers and see if this is worth the effort.
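A minimal sketch of what such a follow-on cleanup job could look like with plain Hail Batch, assuming a hypothetical temp prefix, image names, and commands (none of these are the repo's real names or configuration, and the repo's own batch/config wrappers are not shown):

```python
import hailtop.batch as hb

# Hypothetical placeholder: not a real CPG bucket or run ID.
TMP_PREFIX = 'gs://cpg-example-tmp/annotate_cohort/run-1234'

# ServiceBackend picks up billing project / remote tmpdir from hailctl config.
b = hb.Batch(name='annotate-cohort-with-cleanup', backend=hb.ServiceBackend())

# Main job: writes its checkpoints under TMP_PREFIX.
main = b.new_job(name='annotate_cohort')
main.image('example.registry/annotate-cohort:latest')  # placeholder image
main.command(f'annotate_cohort --tmp-dir {TMP_PREFIX}')  # placeholder command

# Follow-on housekeeping job: it depends on the main job and is not marked
# always_run, so Hail Batch only schedules it once the main job succeeds;
# if the main job fails, the cleanup is cancelled and the temp data stays.
cleanup = b.new_job(name='cleanup_tmp')
cleanup.image('google/cloud-sdk:slim')
cleanup.depends_on(main)
cleanup.command(f'gcloud storage rm -r {TMP_PREFIX}')

b.run()
```

The key property is that the cleanup is an ordinary dependent job rather than an `always_run` one, so a failed run keeps its temporary directory around for debugging.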

vivbak commented 6 months ago

> We'd need to poke at the numbers and see if this is worth the effort.

I agree with this. I'm scared of deleting things in the pipeline :)

Tag this under: Investigate?

vivbak commented 6 months ago

Also tag this under cost optimisation