nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

[BUG] Mutect2 - Error with both 'intervals' and 'no-intervals' options #359

Closed: YussAb closed this issue 2 years ago

YussAb commented 3 years ago

Dear sarek community, I have used the pipeline several times and have consistently run into these bugs with Mutect2.

Description of the bug

GATK Mutect2

I found two different bugs in the pipeline when running Mutect2 on somatic tumor-normal samples.

1- Mutect2 variant calling with the intervals option enabled for parallelization: even though the pipeline runs correctly, I get the following error:

**The exit status of the task that caused the workflow execution to fail was: null.

The full error message was:

No such property: variantcaller for class: Script_a7aea67c**


2- Mutect2 variant calling without intervals: the script does not handle the intervals (-L) option, so GetPileupSummaries is called without it:

**Error executing process > 'PileupSummariesForMutect2 (NIST7035_vs_NIST7086-no_intervals)'

Caused by: Process PileupSummariesForMutect2 (NIST7035_vs_NIST7086-no_intervals) terminated with an error exit status (1)

Command executed: gatk --java-options "-Xmx25g" GetPileupSummaries
-I NIST7035.recal.bam -V somatic-hg38_af-only-gnomad.hg38.vcf.gz
-O no_intervals_NIST7035_pileupsummaries.table

Command exit status: 1

A USER ERROR has occurred: Argument intervals was missing: Argument 'intervals' is required.**

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line:

nextflow run main.nf --input /hpcshare/genomics/sarek_analyses/results/Preprocessing/TSV/recalibrated.tsv -profile base,singularity --step variantcalling --tools mutect2 --pon /homenfs/yabili/gatk_bundle/resources/somatic-hg38_1000g_pon.hg38.vcf.gz --pon_index /homenfs/yabili/gatk_bundle/resources/somatic-hg38_1000g_pon.hg38.vcf.gz.tbi --germline_resource /homenfs/yabili/gatk_bundle/resources/somatic-hg38_af-only-gnomad.hg38.vcf.gz --germline_resource_index /homenfs/yabili/gatk_bundle/resources/somatic-hg38_af-only-gnomad.hg38.vcf.gz.tbi

(To get the second error, I disabled the intervals option in nextflow.config.)


nservant commented 3 years ago

Hi, I have the same issue. I guess the -L option is mandatory.

Using the germline resource as the -L argument fixes the bug:

intervalsOptions = params.no_intervals ? "-L ${germlineResource}" : "-L ${intervalBed}"

Best N

nservant commented 3 years ago

A quick follow-up on that: it seems that providing -L ${germlineResource} requires a huge amount of RAM. At least for exome data analysis, I think it would be better to use:

intervalsOptions = params.no_intervals ? (params.target_bed ? "-L ${params.target_bed}" : "-L ${germlineResource}") : "-L ${intervalBed}"

https://gatk.broadinstitute.org/hc/en-us/community/posts/360074629231-GetPileupSummaries-following-Mutect2-what-inputs-are-suitable-for-intervals-L-
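To make the precedence of that nested expression explicit, here is a shell translation of it. It is only a sketch: `choose_intervals_option` and its argument names are hypothetical (not Sarek code), and it assumes the `no_intervals` flag arrives as the string "true" or "false" and that an unset target BED is passed as an empty string, which mirrors Groovy treating null or empty as false.

```shell
#!/usr/bin/env bash
# Sketch only: shell translation of the nested Groovy ternary above.
# choose_intervals_option is a hypothetical helper, not part of Sarek.
# Args: no_intervals ("true"/"false"), target_bed (may be empty),
#       germline_resource, interval_bed
choose_intervals_option() {
    local no_intervals="$1" target_bed="$2" germline="$3" interval_bed="$4"
    if [ "$no_intervals" = "true" ]; then
        if [ -n "$target_bed" ]; then
            # Targeted/exome data: pile up only within the capture targets
            printf '%s\n' "-L $target_bed"
        else
            # No targets given: fall back to the germline resource sites
            printf '%s\n' "-L $germline"
        fi
    else
        # Scattered mode: use the per-chunk interval BED
        printf '%s\n' "-L $interval_bed"
    fi
}

# Example: exome run with intervals disabled and a target BED supplied
choose_intervals_option true targets.bed af-only-gnomad.vcf.gz chunk_1.bed
```

With intervals disabled and no target BED, the function falls back to the germline resource, which is the configuration reported above as needing a lot of RAM.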

priesgo commented 3 years ago

This is related to #299

priesgo commented 3 years ago

The GATK bundle provides this file: ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/GetPileupSummaries/small_exac_common_3.hg38.vcf.gz.tbi

That one is just 1.3 MB. I need to test it, but I believe this may solve the memory issue.

priesgo commented 3 years ago

Using this small_exac_common_3.hg38.vcf.gz, I reran some samples (in another Mutect2 pipeline) that had been failing with a memory issue in GetPileupSummaries, and they went through. I am unsure, though, how this may affect the results.
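A dry-run sketch of that lighter-weight setup, assuming the small common-sites VCF (and its .tbi) has been downloaded locally under the name shown. It passes the same file as both `-V` and `-L`, which we believe matches how GATK's own Mutect2 tutorial uses its small ExAC common-sites file; whether results match those obtained with the full gnomAD resource is the open question raised above, so treat this as untested. The script prints the command rather than running GATK.

```shell
#!/usr/bin/env bash
# Dry-run sketch: GetPileupSummaries restricted to the small common-sites VCF.
# Local file names are assumptions; the command is printed, not executed.
set -euo pipefail

BAM="NIST7035.recal.bam"
COMMON_SITES="small_exac_common_3.hg38.vcf.gz"
OUT="NIST7035_pileupsummaries.table"

# The same sites file is the variant source (-V) and the interval list (-L),
# so pileups are only collected at those sites.
# The heap size is copied from the original report, not tuned.
cmd=(gatk --java-options "-Xmx25g" GetPileupSummaries
     -I "$BAM"
     -V "$COMMON_SITES"
     -L "$COMMON_SITES"
     -O "$OUT")

printf '%s\n' "${cmd[*]}"
```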

FriederikeHanssen commented 2 years ago

Closed by #592