Background:
The investigation began with a post on Zulip, which clarified the distinction between worker jobs and driver jobs. Examining the failed job confirmed that it was a driver job. Importantly, driver memory and cores can be assigned independently of worker memory.
Proposal:
The worker memory setting is currently 'highmem', which unfortunately did not address the failure, although it did yield a notable cost reduction for the same execution time.
It is proposed to quadruple the number of driver cores and set driver memory to 'highmem', while retaining the highmem worker configuration.
This adjustment aims to align the driver configuration more closely with the settings used in Dataproc, specifically targeting n1-highmem-8 instances.
A run with the proposed changes can be seen at the following links; the driver jobs are now allotted 27.9 GB of memory.
Batch link here, with worker jobs here.
Should the default `large_cohort.toml` be changed to include the following defaults, in addition to `highmem_workers = true`?
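The defaults in question were not captured in this text. Based on the proposal above, they would presumably look something like the sketch below; only `highmem_workers = true` is confirmed by this PR, and the other key names are hypothetical placeholders, not taken from the source:

```toml
# Hypothetical large_cohort.toml sketch. Only highmem_workers = true is
# confirmed by this PR; the other key names are assumed, not verified.
[workflow]
highmem_workers = true   # existing default, retained
highmem_drivers = true   # assumed key: 'highmem' driver memory
# The driver core count would also be quadrupled per the proposal;
# the exact key and value are not stated in this text.
```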
Summary: this PR proposes the above adjustments to the driver cores and memory allocation to prevent OOM issues. The original discussion is in the linked Slack thread.