populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

Adjusting Driver Cores and Memory Allocation for Improved Performance and Cost Reduction #724

Closed michael-harper closed 1 month ago

michael-harper commented 2 months ago

This PR proposes adjusting the driver core count and memory allocation to prevent out-of-memory (OOM) failures in driver jobs.

Slack thread

Background: The investigation began with a post on Zulip, which made the distinction between worker jobs and driver jobs clear. Examining the failed job confirmed that it was a driver job. Driver memory and cores can also be assigned independently of worker memory.

Proposal:

The proposed changes were tested in the batches linked below. Driver jobs are now allotted 27.9 GB of memory. Batch link here, with worker jobs here.

Should the default large_cohort.toml be changed to include the following defaults, in addition to highmem_workers = true?

[workflow]
highmem_driver = true
driver_cores = 4
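
To illustrate what changing the defaults would mean in practice, here is a minimal sketch. It assumes the configuration is layered, so that later values take precedence over earlier ones. The `merge_config` helper and the `user_override` values are hypothetical and only model that behaviour with plain dictionaries; this is not the pipeline's actual config loader.

```python
from copy import deepcopy


def merge_config(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` take precedence over `base`.

    Hypothetical helper for illustration; nested tables are merged recursively.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Existing default mentioned in the PR description.
current_defaults = {"workflow": {"highmem_workers": True}}

# Defaults proposed in this PR (the [workflow] table above).
proposed_additions = {"workflow": {"highmem_driver": True, "driver_cores": 4}}

# Hypothetical per-run override, assumed here purely for illustration.
user_override = {"workflow": {"driver_cores": 2}}

new_defaults = merge_config(current_defaults, proposed_additions)
effective = merge_config(new_defaults, user_override)

print(new_defaults)
print(effective)
```

Under this assumption, the proposed defaults sit alongside `highmem_workers` rather than replacing it, and a run that needs a smaller driver could still override `driver_cores` in its own config.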