Open sitems opened 3 months ago
task.cpus is not set here: https://github.com/nf-core/modules/blob/3e403b703c04d4af6bddb4f0b03b772b7365ffc0/modules/nf-core/gatk4/haplotypecaller/main.nf#L42
Do you know which HaplotypeCaller parameter would enable that?
I have also tried not setting those config parameters, along with many other things, but the problem remains the same: I cannot get HaplotypeCaller to parallelise in any way. If I understand correctly, providing intervals should cause HaplotypeCaller to run on those intervals in parallel. Or am I wrong?
Depends on what you mean here. The intervals will allow sarek to spin up a bunch of independent haplotypecaller jobs. Then each of those could use one or more threads.
From your description I assumed the latter is not working as you expect. In general, each tool has a parameter that lets you specify the number of CPUs for that particular job. I can see that this is not set in the HaplotypeCaller module, and I am not sure which HaplotypeCaller parameter would correspond to it: https://gatk.broadinstitute.org/hc/en-us/articles/27007962724507-HaplotypeCaller
Thank you for the response. I meant the first thing, "to spin up a bunch of independent haplotypecaller jobs", but in htop I do not see any parallelisation. I first ran the pipeline on 40 samples without any --intervals. It took 5 days, and for most of that time it was running HaplotypeCaller on 1-2 cores. That is why I decided to experiment with many settings and alternatives, but I still cannot achieve any parallelisation. So how can I speed up the HaplotypeCaller part of the pipeline if I have 24 cores?
The process-level resource requests you showed apply on a per-job basis.
If you request 20 CPUs for one job, Nextflow reserves them for that single job, and another job requesting the same resources will not fit. The result is that one HaplotypeCaller job is submitted after the other. Have you tried requesting fewer?
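To make the arithmetic concrete, here is a hedged sketch for the 24-core, 128 GB machine from this report. The `executor` scope with `cpus` and `memory` is a standard Nextflow setting for the local executor. The per-job values `cpus = 2` and `memory = 8.GB` are illustrative assumptions, not tested recommendations for sarek.

```groovy
// Sketch only: resource caps for the local executor plus a smaller per-job request.
executor {
    name   = 'local'
    cpus   = 24        // total cores Nextflow may hand out to running jobs
    memory = '128 GB'  // total memory Nextflow may hand out to running jobs
}

process {
    withName: 'GATK4_HAPLOTYPECALLER' {
        // With 20 CPUs / 120 GB per job: floor(24/20) = 1 and floor(128/120) = 1,
        // so only one job fits at a time.
        // With 2 CPUs / 8 GB per job (assumed values): min(floor(24/2), floor(128/8))
        // = min(12, 16) = 12 jobs could run side by side.
        cpus   = 2
        memory = 8.GB
    }
}
```

Whichever limit is hit first, CPUs or memory, decides how many jobs run at once, so both requests need to be small enough.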
Yes, I have also tried using no custom nextflow.config at all (so only the defaults, plus the '--max_cpus 23' parameter on the CLI), but the problem is the same: no parallelisation.
How much memory have you been requesting for the jobs? If you want to do a small test, you could set:
withName: 'GATK4_HAPLOTYPECALLER' {
    cpus = 2
    memory = 2.GB
}
This will likely fail with an out-of-memory error, but that is not the point here. For testing, it might also make sense to remove all the other tools, which should reduce the overall number of jobs the pipeline submits. If other jobs are submitted and use up resources, it will also look as if things are running sequentially. You can also check the generated timeline to see when each job was running.
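If no timeline is being written, one way to get it is Nextflow's `timeline` config scope (the `-with-timeline` command-line option does the same). A minimal sketch follows; the output path is a hypothetical example, and nf-core pipelines usually already write a timeline under the `pipeline_info` folder of the output directory, so this may not be needed.

```groovy
// Sketch only: ask Nextflow to write an HTML timeline of all jobs.
timeline {
    enabled   = true
    file      = 'results/pipeline_info/timeline.html'  // hypothetical path
    overwrite = true                                   // replace the file on re-runs
}
```

In the resulting HTML, HaplotypeCaller bars that overlap in time indicate jobs running in parallel, while bars that line up end to end indicate sequential execution.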
Hey! Has this been resolved?
Hi Friederike, not yet, but I'm working on it, so I will let you know.
Finally, these parameters work best for me; HaplotypeCaller is using multiple cores now:
withName: 'GATK4_HAPLOTYPECALLER' {
    cpus = 1
    memory = 20.GB
    time = 30.h
    ext.args = { "--native-pair-hmm-threads 1 -ERC GVCF" }
}
We can close the issue.
Description of the bug
As a minimal example, I am running joint germline calling locally (on a system with 24 cores and 128 GB RAM) with just two WES samples and this nextflow.config file:
process {
    withName: 'FASTP' {
        cpus = 16
    }
    withName: 'BWAMEM1_MEM|BWAMEM2_MEM' {
        cpus = { cpus = 22 }
        memory = 100.GB
    }
    withName: 'GATK4_HAPLOTYPECALLER' {
        cpus = 20
        memory = 120.GB
    }
}
For simplicity, as --intervals I am using the igenomes WGS bed file wgs_calling_regions_noseconds.hg38.bed (the same file sarek uses). I know that I should use dedicated exon intervals, but if I understand correctly, these WGS intervals should also provide some parallelisation (and when I tried exon intervals before, the parallelisation problem was the same).
When checking htop, FASTP and BWA are utilizing multiple cores, but HaplotypeCaller is not (just 1-2 cores). Why?
Command used and terminal output
Relevant files
nextflow.zip
System information
Nextflow version: 24.04.4.5917
Hardware: Desktop PC, 24 cores, 128GB RAM
Executor: local
Container engine: Docker
OS: Ubuntu 22.04
Version of nf-core/sarek: 3.4.{2,3}