Issue closed by qt37t247.
Hi Qian,
Could you share some logs and your configuration files? Also, is it possible that what you are seeing is GATK HaplotypeCaller starting for those samples? HaplotypeCaller operates on individual samples, whereas joint genotyping (GenotypeGVCFs) has to wait for all of the HaplotypeCaller jobs to finish.
Cade
Hi Cade,
Thank you for your prompt reply.
Yes, I meant: could HaplotypeCaller, which operates on individual samples, start only after all the BAM files are done?
It seems HaplotypeCaller only uses 1 thread per sample on my system ("WARN IntelPairHmm - Using 1 available threads, but 4 were requested"). To make use of all the threads I requested, I'd like to try starting HaplotypeCaller for all the samples at the same time.
Please see my log and config files attached.
Best regards,
Hi Qian,
HaplotypeCaller runs on a sample as soon as that sample's BAM is finished. This step generates a single-sample gVCF file. Then, once all the gVCF files are finished, the GenotypeGVCFs step does joint calling across all individuals to produce the final VCF. This is expected behavior.
The HaplotypeCaller step is parallelized by splitting the genome into intervals and running the intervals concurrently, not by using multiple threads for each run. So again, the expectation is that each HaplotypeCaller job uses a single thread.
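To make the scheduling described above concrete, here is a toy Python model of the job graph. It is only a sketch, not the pipeline's actual workflow code: the job names (`align:<sample>`, `haplotypecaller:<sample>:<interval>`, `genotypegvcfs`) and the helpers `build_dag` and `ready_jobs` are hypothetical. Each per-interval HaplotypeCaller job depends only on its own sample's BAM, while the single joint-genotyping job depends on every HaplotypeCaller job.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    # Names of jobs that must finish before this one can start.
    deps: set[str] = field(default_factory=set)


def build_dag(samples: list[str], intervals: list[str]) -> dict[str, Job]:
    """Toy graph: align -> per-interval HaplotypeCaller -> joint genotyping."""
    jobs: dict[str, Job] = {}
    for sample in samples:
        align = f"align:{sample}"
        jobs[align] = Job(align)
        # One single-threaded HaplotypeCaller job per (sample, interval);
        # each one needs only its own sample's BAM.
        for interval in intervals:
            hc = f"haplotypecaller:{sample}:{interval}"
            jobs[hc] = Job(hc, {align})
    # Joint genotyping waits for every single-sample gVCF piece.
    hc_jobs = {name for name in jobs if name.startswith("haplotypecaller:")}
    jobs["genotypegvcfs"] = Job("genotypegvcfs", hc_jobs)
    return jobs


def ready_jobs(jobs: dict[str, Job], finished: set[str]) -> list[str]:
    """Return unfinished jobs whose dependencies have all finished."""
    return sorted(
        name
        for name, job in jobs.items()
        if name not in finished and job.deps <= finished
    )
```

For example, with samples A and B and intervals chr1 and chr2, `ready_jobs(jobs, {"align:A"})` returns `align:B` plus both of A's HaplotypeCaller jobs, while `genotypegvcfs` only becomes ready once every HaplotypeCaller job has finished. That is why HaplotypeCaller can start before all alignments are done. The model simplifies joint genotyping to a single job; a real workflow may scatter it by interval as well.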
Our manuscript has more details on the pipeline design.
Tim
Hi Tim,
Thank you very much for the clarification.
Best regards,
Qian
Dear developers,
I have 18 samples in my project, and it seems GATK starts as soon as a few alignments (BAM files) are ready.
Is it possible to start GATK (variant calling) only once all the alignments are done?
Many thanks and best regards,
Qian