nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License
357 stars 391 forks source link

Clarify how to use Mutect2 with custom bam files #1468

Closed sci-kai closed 2 weeks ago

sci-kai commented 3 months ago

Description of the bug

Hi, I run into the following error with running mutect2 via sarek:

INFO:    Converting SIF file to temporary sandbox...
Using GATK jar /usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx29491M -XX:-UsePerfData -jar /usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar Mutect2 --input patient1-10A.converted.cram --input patient1-01A.converted.cram --output patient1-01A_vs_patient1-10A.mutect2.vcf.gz --reference GRCh38.d1.vd1.fa --intervals chr17_7651780-7697538.bed --tmp-dir . --f1r2-tar-gz patient1-01A_vs_patient1-10A.mutect2.f1r2.tar.gz --normal-sample patient1_patient1-10A
15:23:13.503 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
15:23:13.529 INFO  Mutect2 - ------------------------------------------------------------
15:23:13.531 INFO  Mutect2 - The Genome Analysis Toolkit (GATK) v4.4.0.0
15:23:13.531 INFO  Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
15:23:13.531 INFO  Mutect2 - Executing as wij20zul@hilbert166 on Linux v3.10.0-1160.49.1.el7.x86_64 amd64
15:23:13.531 INFO  Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v17.0.3-internal+0-adhoc..src
15:23:13.532 INFO  Mutect2 - Start Date/Time: April 15, 2024 at 3:23:13 PM GMT
15:23:13.532 INFO  Mutect2 - ------------------------------------------------------------
15:23:13.532 INFO  Mutect2 - ------------------------------------------------------------
15:23:13.532 INFO  Mutect2 - HTSJDK Version: 3.0.5
15:23:13.532 INFO  Mutect2 - Picard Version: 3.0.0
15:23:13.533 INFO  Mutect2 - Built for Spark Version: 3.3.1
15:23:13.533 INFO  Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:23:13.533 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:23:13.533 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:23:13.533 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:23:13.533 INFO  Mutect2 - Deflater: IntelDeflater
15:23:13.533 INFO  Mutect2 - Inflater: IntelInflater
15:23:13.533 INFO  Mutect2 - GCS max retries/reopens: 20
15:23:13.533 INFO  Mutect2 - Requester pays: disabled
15:23:13.534 INFO  Mutect2 - Initializing engine
15:23:14.107 INFO  FeatureManager - Using codec BEDCodec to read file chr17_7651780-7697538.bed
15:23:14.111 INFO  IntervalArgumentCollection - Processing 129382 bp from intervals
15:23:14.119 INFO  Mutect2 - Done initializing engine
15:23:14.149 INFO  Mutect2 - Shutting down engine
[April 15, 2024 at 3:23:14 PM GMT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=285212672
***********************************************************************

A USER ERROR has occurred: Bad input: Sample patient1_patient1-10A is not in BAM header: [patient1-01A-31W-A062-09, patient1-10A-01W-A062-09]

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
INFO:    Cleaning up image...

It seems that there is something wrong with the mutect2 command in this configuration. It defines a normal sample which cannot be found by mutect2 (from the .command.sh):

#!/bin/bash -euo pipefail
gatk --java-options "-Xmx29491M -XX:-UsePerfData" \
    Mutect2 \
    --input patient1-10A.converted.cram --input patient1-01A.converted.cram \
    --output patient1-01A_vs_patient1-10A.mutect2.vcf.gz \
    --reference GRCh38.d1.vd1.fa \
     \
     \
    --intervals chr17_7651780-7697538.bed \
    --tmp-dir . \
    --f1r2-tar-gz patient1-01A_vs_patient1-10A.mutect2.f1r2.tar.gz --normal-sample patient1_patient1-10A

cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_MUTECT2:MUTECT2_PAIRED":
    gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
END_VERSIONS

Freebayes and strelka run fine without an error message.

I have the following configuration:

Can you help me fixing this problem?

Command used and terminal output

nf-params.json:

{
    "step": "variant_calling",
    "outdir": "results/",
    "wes": true,
    "intervals": "targets.bed",
    "tools": "msisensorpro,mutect2,freebayes,strelka",
    "genome": "GRCh38",
    "fasta": "GRCh38.d1.vd1.fa",
    "fasta_fai": "GRCh38.d1.vd1.fa.fai",
    "save_reference": true,
    "save_mapped": false,
    "igenomes_ignore": true,
}

samplesheet (example):

patient,status,sample,bam,bai
patient1,1,patient1-01A,sample1_wxs_gdc_realn.bam,sample1_wxs_gdc_realn.bai
patient1,0,patient1-10A,sample2_wxs_gdc_realn.bam,sample2_wxs_gdc_realn.bai

run command:

nextflow run nf-core-sarek_3.4.0/3_4_0 \
-profile singularity,hilbert \
-c config/run-sarek.config \
-params-file config/nf-params.json \
-plugins nf-validation@1.0.0 \
--input samplesheet.csv


### Relevant files

_No response_

### System information

nf-core sarek verison 3.4.0
cameroncordero commented 2 months ago

I had a similar issue on 3.4.1, where it gave me an error about not having a tumor-sample which was required by mutect2, I added all the possible columns (icluding sex) and it fixed the bug, it might not be parsing the sample table properly. Try with that and see if it fixes it for you. It is a different error, once I am back on my rig I can update this with the error log.

VeBeKay1 commented 1 month ago

Has this been resolved?

I have an almost identical issue, also starting with variant calling from aligned bam files, and Mutect2 is also complaining that a sample isn't in the bam header. Has anyone been able to resolve this?

My samplesheet contains patient,sex,status,sample,bam,bai

sci-kai commented 1 month ago

Hi, so I could resolve this error as following: The problem here is the mismatch between the sample names in the BAM header (when starting with BAM files) and the sample names specified in the mutect2 parameters defined in conf/modules/mutect2.config.

conf/modules/mutect2.config contains in the process scope:

        withName: 'MUTECT2_PAIRED' {
            ext.args   = { params.ignore_soft_clipped_bases ?
                                "--dont-use-soft-clipped-bases true --f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.patient}_${meta.normal_id}" :
                                "--f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.patient}_${meta.normal_id}" }
        }

I overwrote this in the process scope of a separate config file provided with -c so that is fits the sample names in my BAM files:

    withName: 'MUTECT2_PAIRED' {
            ext.args   = { params.ignore_soft_clipped_bases ?
                                "--dont-use-soft-clipped-bases true --f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.normal_id}" :
                                "--f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.normal_id}" }
    }

So you either change this config file to fit your BAM sample names or another solution would be to rename the BAM samplenames, e.g. with samtools, so it fits the fixed naming scheme in the mutect2.config file.

VeBeKay1 commented 1 month ago

Thank you so much! changing mutect2.config worked :-)

asp8200 commented 1 month ago

Is this also a problem with Sarek v3.4.2?

VeBeKay1 commented 1 month ago

Yes, I'm running nextflow run nf-core/sarek -r 3.4.2

FriederikeHanssen commented 2 weeks ago

Hi! YEs this is a known issue with mutect because it determines what the normal file is not by the file it is provided with but by this id parameter. I will make a note to update the docs to more clearly explain how to run mutect2 when files are not geenrated with sarek

FriederikeHanssen commented 2 weeks ago

There is a related issue #1455. I am closing this issue and keep tracking this feature there.