populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

Add HIGHMEM workers config option for GenotypeGVCFs jobs #786

Closed EddieLF closed 3 weeks ago

EddieLF commented 3 weeks ago

Added a config option to use HIGHMEM machines for the GenotypeGVCFs jobs.

High memory workers should not be necessary if the correct scatter count is chosen from the start of the workflow. However, if you are part-way through a joint callset and some jobs are failing and need more memory, this might be needed to get the failing jobs to succeed on a re-run.
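As a rough illustration of how such a switch could work, the sketch below maps a boolean config flag to a Hail Batch memory tier. The key name `genotype_gvcfs_highmem` and the function `genotype_gvcfs_memory` are hypothetical and are not taken from this PR. The strings `'standard'` and `'highmem'` are values that Hail Batch's `Job.memory()` accepts.

```python
def genotype_gvcfs_memory(config: dict) -> str:
    """Pick a Hail Batch memory tier for GenotypeGVCFs jobs.

    Assumption: the flag lives at config['workflow']['genotype_gvcfs_highmem'].
    That key name is hypothetical; the real option name may differ.
    """
    workflow_cfg = config.get('workflow', {})
    # Default to standard workers: with a well-chosen scatter count,
    # high-memory machines should not be needed.
    use_highmem = bool(workflow_cfg.get('genotype_gvcfs_highmem', False))
    return 'highmem' if use_highmem else 'standard'
```

On a re-run of a partially completed callset, one would flip the flag on and pass the returned string to the job, for example `job.memory(genotype_gvcfs_memory(config))`.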

Changed the scatter count values in resources.py.

For the most part, the scatter count has been set with the workflow.scatter_count config option, so these defaults have not been used. But because they serve as a reference, I've updated them and added some comments about our experiences with sharding the joint genotyping jobs at scale.
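The relationship between the config override and the defaults could look roughly like the sketch below. Only the `workflow.scatter_count` option comes from this PR. The function name `resolve_scatter_count`, the threshold table, and every number in it are illustrative placeholders, not the values in resources.py.

```python
# Illustrative (max sequencing groups, scatter count) pairs.
# These numbers are placeholders, NOT the defaults from resources.py.
_ILLUSTRATIVE_DEFAULTS = [(1000, 50), (5000, 200)]
_ILLUSTRATIVE_FALLBACK = 1000


def resolve_scatter_count(config: dict, n_sequencing_groups: int) -> int:
    """Return the scatter count for joint genotyping.

    An explicit workflow.scatter_count in the config always wins; the
    size-based defaults are used only when no override is set.
    """
    override = config.get('workflow', {}).get('scatter_count')
    if override is not None:
        return int(override)
    # No override: fall back to the reference table, smallest tier first.
    for max_groups, scatter_count in _ILLUSTRATIVE_DEFAULTS:
        if n_sequencing_groups <= max_groups:
            return scatter_count
    return _ILLUSTRATIVE_FALLBACK
```

Because the override short-circuits the lookup, changing the defaults affects only runs that do not set `workflow.scatter_count`.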