populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

Add resource override config option for genomics DB import memory #785

Closed · EddieLF closed this 3 weeks ago

EddieLF commented 3 weeks ago

Adds a config option to set the requested memory for GATK GenomicsDBImport jobs.
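For context, something like the following is one way such an override could be read inside the pipeline. This is a minimal sketch using cpg_utils config access; the section and key names (`workflow`, `genomicsdb_import_mem_gb`) are hypothetical stand-ins, not necessarily the names this PR actually introduces:

    from cpg_utils.config import get_config

    # Hypothetical config key; falls back to the current hard-coded 25 GiB heap.
    xmx_gb = get_config()['workflow'].get('genomicsdb_import_mem_gb', 25)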


NOTE

The current code has this comment about the memory request:

    # The Broad: The memory setting here is very important and must be several
    # GiB lower than the total memory allocated to the VM because this tool uses
    # a significant amount of non-heap memory for native libraries.
    xms_gb = 8
    xmx_gb = 25
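
These heap values typically end up as JVM flags on the GATK invocation via `--java-options`. A minimal sketch of how they might be wired in (the pipeline's actual command construction may differ):

    # The -Xmx heap ceiling must sit well below the VM's total memory,
    # since GenomicsDBImport also allocates native (non-heap) memory.
    java_options = f'-Xms{xms_gb}g -Xmx{xmx_gb}g'
    command = f'gatk --java-options "{java_options}" GenomicsDBImport ...'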

Then the memory is requested on a STANDARD machine type:

    STANDARD.set_resources(
        j,
        nthreads=nthreads,
        mem_gb=xmx_gb + 1,
        storage_gb=20,
    )
  1. Should this PR also add the option to request a HIGHMEM machine type instead of STANDARD?
  2. The memory requested for the VM is defined as xmx_gb + 1, so this config option actually tweaks the Java heap size rather than the VM memory directly. Is this fine? Would a value like xmx_gb = 63 make sense, meaning we request 64 GB of memory total for the job VM? (One way of combining both options is sketched below.)
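
One way the two questions could combine: keep the heap override from config (as sketched above) and also switch the resource helper when the request exceeds what a standard worker can serve. A rough sketch only, where HIGHMEM is assumed to mirror the existing STANDARD helper and `highmem_workers` is a hypothetical config flag:

    # Hypothetical flag; HIGHMEM is assumed analogous to the STANDARD helper.
    use_highmem = get_config()['workflow'].get('highmem_workers', False)
    machine = HIGHMEM if use_highmem else STANDARD

    machine.set_resources(
        j,
        nthreads=nthreads,
        mem_gb=xmx_gb + 1,  # VM memory: Java heap plus headroom for native libraries
        storage_gb=20,
    )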
EddieLF commented 3 weeks ago

Also had to add the HIGHMEM worker option, because when requesting 64 GB of memory with the STANDARD machine type, the driver fails to create the batch: https://batch.hail.populationgenomics.org.au/batches/455413/jobs/1

    ValueError: Requesting more cores than available on standard machine: 18>16

Adding the HIGHMEM worker option resolved this: https://batch.hail.populationgenomics.org.au/batches/455415
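
The 18 > 16 error follows from Hail Batch converting a memory request into a core count. Assuming standard workers provide about 3.75 GiB per core (the n1-standard profile) with at most 16 cores, a 64 GiB request needs ceil(64 / 3.75) = 18 cores, while a highmem worker at roughly 6.5 GiB per core fits the same request in 10. A quick check of that arithmetic; the per-core figures are assumptions about the underlying machine family, not taken from the Hail source:

    import math

    # Assumed GiB-per-core for the n1 machine family.
    MEM_PER_CORE = {'standard': 3.75, 'highmem': 6.5}
    MAX_CORES = 16

    for worker, per_core in MEM_PER_CORE.items():
        cores = math.ceil(64 / per_core)
        print(f'{worker}: needs {cores} cores, fits={cores <= MAX_CORES}')
    # standard: needs 18 cores, fits=False  -> the ValueError above
    # highmem: needs 10 cores, fits=True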