nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

Use GATK small_exac_common_3.hg38.vcf.gz as default germline_resource #959

Open ameynert opened 1 year ago

ameynert commented 1 year ago

Description of feature

See https://github.com/broadinstitute/gatk/issues/7606 and #592.

In the GATK4 GetPileupSummaries code, the entire VCF passed to the -V option, which sarek populates from the germline_resource parameter, is read into memory. The current default for human hg38 is gnomad_af_only_hg38, which is huge and leads to Java heap out-of-memory errors. The request is to use the GATK small_exac_common_3 file for this purpose instead. It's a subset of common variants found in gnomAD (https://gatk.broadinstitute.org/hc/en-us/community/posts/360067310872-How-to-find-or-generate-common-germline-variant-sites-VCF-required-by-GetPileupSummaries).
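
For illustration only, here is a rough Python sketch of what "a subset of common variants" means in practice: it streams a sites-only VCF line by line and keeps records whose AF is at or above a cutoff. The function names and the 0.05 cutoff are assumptions made for this example, not how small_exac_common_3 was actually built and not part of GATK or sarek.

```python
import io


def parse_af(info):
    """Return the first AF value in a VCF INFO string, or None if absent or unparseable."""
    for field in info.split(";"):
        if field.startswith("AF="):
            try:
                return float(field[3:].split(",")[0])
            except ValueError:
                return None
    return None


def common_sites(lines, min_af=0.05):
    """Yield header lines unchanged, plus data lines with AF >= min_af.

    min_af=0.05 is a hypothetical cutoff chosen for this example.
    Lines are processed one at a time, so the input is never held in memory whole.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # skip malformed records
        af = parse_af(cols[7])
        if af is not None and af >= min_af:
            yield line


# Tiny made-up input standing in for a real population VCF.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t.\tPASS\tAF=0.21\n"
    "chr1\t200\t.\tC\tT\t.\tPASS\tAF=0.0003\n"
)
kept = list(common_sites(example))
```

Only the two header lines and the chr1:100 record survive here; the rare chr1:200 site is dropped.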

FriederikeHanssen commented 1 year ago

Hey! Just to clarify, do you want to use different germline_resource files for Mutect2 and GetPileupSummaries respectively? Otherwise it would be as simple as adding the file to igenomes and updating the germline_resource path. Just trying to understand how much work is needed here :D

ameynert commented 1 year ago

GetPileupSummaries: https://gatk.broadinstitute.org/hc/en-us/articles/9570416554907-GetPileupSummaries

The tool requires a common germline variant sites VCF, e.g. derived from the gnomAD resource, with population allele frequencies (AF) in the INFO field. This resource must contain only biallelic SNPs and can be an eight-column sites-only VCF.
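
As a rough sketch of those requirements (biallelic SNPs only, with AF in the INFO field), a per-record check could look like the Python below. The helper name is hypothetical and this is not GATK's own validation logic; it only illustrates the constraints quoted above on an eight-column sites-only record.

```python
BASES = {"A", "C", "G", "T"}


def is_biallelic_snp_with_af(record):
    """Return True if a tab-separated VCF data line is a biallelic SNP carrying an AF entry."""
    cols = record.rstrip("\n").split("\t")
    if len(cols) < 8:
        return False  # needs at least the eight fixed VCF columns
    ref, alt, info = cols[3].upper(), cols[4].upper(), cols[7]
    # A comma in ALT means multiple alternate alleles; a single base on both
    # sides means a SNP rather than an indel.
    is_snp = ref in BASES and alt in BASES
    has_af = any(field.startswith("AF=") for field in info.split(";"))
    return is_snp and has_af


# Made-up records for illustration.
snp = "chr1\t100\t.\tA\tG\t.\tPASS\tAF=0.21"
multiallelic = "chr1\t200\t.\tC\tT,G\t.\tPASS\tAF=0.1,0.02"
indel = "chr1\t300\t.\tAT\tA\t.\tPASS\tAF=0.3"
no_af = "chr1\t400\t.\tG\tC\t.\tPASS\tDP=50"

results = [is_biallelic_snp_with_af(r) for r in (snp, multiallelic, indel, no_af)]
```

Of the four example records, only the first satisfies all of the stated constraints.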

Mutect2: https://gatk.broadinstitute.org/hc/en-us/articles/9570422171291-Mutect2

--germline-resource Population vcf of germline sequencing containing allele fractions. A resource, such as gnomAD, containing population allele frequencies of common and rare variants.

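The descriptions above indicate that the two inputs serve different purposes, so they should be different files: a sites file of common biallelic SNPs for GetPileupSummaries, and a population resource covering both common and rare variants, such as gnomAD, for Mutect2.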