harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

GATK starts before all bam files are ready #151

Closed qt37t247 closed 8 months ago

qt37t247 commented 8 months ago

Dear developers,

I have 18 samples in my project and it seems GATK starts once a few alignments (bam files) are ready.

Is it possible to start GATK (variant calling) only once all the alignments are done?

Many thanks and best regards,

Qian

cademirch commented 8 months ago

Hi Qian,

Could you share some logs and your configuration files? Also, is it possible that it is GATK HaplotypeCaller that is starting for those samples? HaplotypeCaller operates on individual samples, whereas joint genotyping (GenotypeGVCFs) has to wait for all of the HaplotypeCaller jobs.

Cade

qt37t247 commented 8 months ago

Hi Cade,

Thank you for your prompt reply.

Yes, what I meant was: could HaplotypeCaller, which operates on individual samples, start only after all the bam files are done?

It seems HaplotypeCaller only uses 1 thread per sample on my system ("WARN IntelPairHmm - Using 1 available threads, but 4 were requested"). To utilize all the threads I requested, I'd like to try starting HaplotypeCaller for all the samples at the same time.

Please kindly see my log and config files attached.

Best regards,

Qian

Attachments: 2024-01-08T152519.261691.snakemake.log, config.zip

tsackton commented 8 months ago

Hi Qian,

HaplotypeCaller starts for a sample as soon as that sample's bam is finished; it does not wait for the other samples. This step generates a single-sample gVCF file. Then, once all the gVCF files are finished, the GenotypeGVCFs step does joint calling across all individuals to produce the final vcf. This is expected behavior.
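To make the dependency structure concrete, here is a minimal Python sketch of the scheduling logic described above. It is not snpArcher or Snakemake code; the function name `ready_jobs` and the tuple layout are hypothetical, chosen only for illustration.

```python
def ready_jobs(samples, finished_bams, finished_gvcfs):
    """List the jobs that may start, given what has finished so far.

    Hypothetical helper for illustration only (not part of snpArcher).
    """
    jobs = []
    # Per-sample step: depends only on that sample's own bam.
    for sample in samples:
        if sample in finished_bams and sample not in finished_gvcfs:
            jobs.append(("HaplotypeCaller", sample))
    # Joint step: depends on the gVCFs of every sample.
    if set(samples) <= set(finished_gvcfs):
        jobs.append(("GenotypeGVCFs", "all"))
    return jobs
```

With three samples and only one bam finished, the sketch returns a single HaplotypeCaller job for that sample, while the GenotypeGVCFs job appears only once every sample has a gVCF.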

The HaplotypeCaller step is parallelized by splitting the genome into intervals and running each interval at the same time, not by using multiple threads for each run. So again the expectation is that each HaplotypeCaller job should use a single thread.
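The interval-based parallelism can be sketched in Python as follows. This is only an illustration under stated assumptions: the fixed-size window split and the helper names `split_into_intervals` and `haplotypecaller_jobs` are hypothetical, and the pipeline's own interval logic may differ. The interval strings follow the `contig:start-end` form that GATK's `-L` argument accepts.

```python
def split_into_intervals(contig_lengths, interval_size):
    """Split each contig into fixed-size, 1-based inclusive windows.

    Assumption for illustration: a simple fixed-size split.
    """
    intervals = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            end = min(start + interval_size - 1, length)
            intervals.append(f"{contig}:{start}-{end}")
            start = end + 1
    return intervals


def haplotypecaller_jobs(samples, intervals):
    """One single-threaded job per (sample, interval) pair."""
    return [(sample, interval) for sample in samples for interval in intervals]
```

In this sketch, 18 samples and 50 intervals would give 900 independent single-threaded jobs, so throughput comes from running many jobs at once rather than from giving each job more threads.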

Our manuscript has more details on the pipeline design.

Tim

qt37t247 commented 8 months ago

Hi Tim,

Thank you very much for the clarification.

Best regards,

Qian