Open ameynert opened 1 year ago
Hey! Just to clarify, do you want to use different germline_resource files for Mutect2 and GetPileupSummaries respectively? Otherwise it would be as simple as adding the file to igenomes and updating the germline_resource path.
Just trying to understand how much work is needed here :D
GetPileupSummaries: https://gatk.broadinstitute.org/hc/en-us/articles/9570416554907-GetPileupSummaries
The tool requires a common germline variant sites VCF, e.g. derived from the gnomAD resource, with population allele frequencies (AF) in the INFO field. This resource must contain only biallelic SNPs and can be an eight-column sites-only VCF.
Mutect2: https://gatk.broadinstitute.org/hc/en-us/articles/9570422171291-Mutect2
--germline-resource Population vcf of germline sequencing containing allele fractions. A resource, such as gnomAD, containing population allele frequencies of common and rare variants.
The descriptions above indicate that the two resources serve different purposes, so they should be different files.
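To make the GetPileupSummaries requirement quoted above concrete, here is a minimal Python sketch that checks whether a single VCF data line is a biallelic SNP with an `AF` value in INFO. The function names (`parse_info`, `is_biallelic_snp_with_af`) are made up for illustration and are not part of GATK or sarek. It is only a rough check on plain-text VCF lines, not a full VCF validator.

```python
VALID_BASES = {"A", "C", "G", "T"}


def parse_info(info_field):
    """Parse a VCF INFO string such as 'AC=3;AF=0.01;DB' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        if entry in ("", "."):
            continue
        key, _, value = entry.partition("=")
        info[key] = value  # flag-style entries (no '=') map to ""
    return info


def is_biallelic_snp_with_af(vcf_line):
    """Return True if a tab-separated VCF data line is a biallelic SNP with a numeric AF."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:  # a sites-only VCF has eight columns
        return False
    ref, alt = fields[3].upper(), fields[4].upper()
    # A single-base REF and a single-base ALT rules out indels and
    # multi-allelic records (whose ALT looks like "G,T").
    if ref not in VALID_BASES or alt not in VALID_BASES:
        return False
    try:
        float(parse_info(fields[7]).get("AF", ""))
    except ValueError:  # AF missing or not a single number
        return False
    return True


print(is_biallelic_snp_with_af("chr1\t100\t.\tA\tG\t.\tPASS\tAF=0.05"))
print(is_biallelic_snp_with_af("chr1\t200\t.\tA\tG,T\t.\tPASS\tAF=0.05,0.01"))
```

A resource intended for Mutect2's --germline-resource would not need to pass this check, since per the description above it may contain both common and rare variants.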
Description of feature
See https://github.com/broadinstitute/gatk/issues/7606 and #592.
In the GATK4 GetPileupSummaries code, the entire file passed via the -V option, which sarek populates from the germline_resource parameter, is read into memory. The current default for human hg38 is gnomad_af_only_hg38, which is huge and leads to Java heap out-of-memory errors. The request is to use the GATK small_exac_common_3 file for this purpose instead. It's a subset of common variants found in gnomAD (https://gatk.broadinstitute.org/hc/en-us/community/posts/360067310872-How-to-find-or-generate-common-germline-variant-sites-VCF-required-by-GetPileupSummaries).
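As a rough illustration of what "a subset of common variants" means here, the sketch below streams a VCF line by line and keeps only biallelic SNPs whose `AF` is at or above a threshold, so the output is much smaller than the full resource. The function name `filter_common_sites` and the `min_af=0.01` default are assumptions chosen for the example. This is not how small_exac_common_3 was actually produced.

```python
import io


def filter_common_sites(in_handle, out_handle, min_af=0.01):
    """Copy header lines and common biallelic SNPs (AF >= min_af); return the number kept."""
    kept = 0
    for line in in_handle:  # one line at a time, never the whole file in memory
        if line.startswith("#"):
            out_handle.write(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        ref, alt, info = fields[3], fields[4], fields[7]
        if len(ref) != 1 or len(alt) != 1:  # skips indels and multi-allelic ALT like "G,T"
            continue
        af_value = None
        for entry in info.split(";"):
            if entry.startswith("AF="):
                af_value = entry[3:]
                break
        try:
            af = float(af_value)
        except (TypeError, ValueError):  # AF absent or not a single number
            continue
        if af >= min_af:
            out_handle.write(line)
            kept += 1
    return kept


# Tiny in-memory example; a real run would open (possibly gzip-compressed) files instead.
source = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t.\tPASS\tAF=0.05\n"
    "chr1\t200\t.\tC\tT\t.\tPASS\tAF=0.0001\n"
)
result = io.StringIO()
n_kept = filter_common_sites(source, result)
print(n_kept)
```

For real data an established tool would be the safer choice. The point is only that a common-sites file can be a small fraction of the full gnomAD resource, which is what keeps GetPileupSummaries within its Java heap.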