Added a config option to use HIGHMEM machines for the GenotypeGVCFs jobs.
High-memory workers should not be necessary if the correct scatter count is chosen from the start of the workflow.
However, if you are part-way through a joint callset and some jobs are failing because they need more memory, enabling this option may be what gets them to succeed on a re-run.
Changed the scatter count values in resources.py.
For the most part, the scatter count has been set with the workflow.scatter_count config option, so these defaults have not been used. But because they serve as a reference, I've updated them and added some comments about our experiences with sharding the joint genotyping jobs at scale.
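
For reviewers less familiar with how the override interacts with the defaults, here is a rough sketch of the precedence described above. It is not the code in resources.py: the function name `resolve_scatter_count`, the nested-dict config shape, and the placeholder default value are all assumptions made for illustration. Only the workflow.scatter_count option name comes from this PR.

```python
# Placeholder only: NOT the real default from resources.py.
ILLUSTRATIVE_DEFAULT_SCATTER_COUNT = 100


def resolve_scatter_count(config: dict, default: int = ILLUSTRATIVE_DEFAULT_SCATTER_COUNT) -> int:
    """Return workflow.scatter_count when it is set, else the fallback default.

    `config` is assumed to be a nested dict, e.g. parsed from a TOML file.
    """
    value = config.get("workflow", {}).get("scatter_count")
    if value is None:
        # No explicit override, so the reference default applies.
        return default
    count = int(value)
    if count < 1:
        raise ValueError(f"scatter_count must be >= 1, got {count}")
    return count
```

With an explicit override in the config the default is never consulted, which is why the values in resources.py have mostly acted as documentation rather than as live settings.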