nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

Can tumor-only somatic variant calling be performed using Strelka? #803

Open ariadnaaterrades opened 1 year ago

ariadnaaterrades commented 1 year ago

Description of the bug

According to the parameter documentation for your Nextflow pipeline (https://nf-co.re/sarek/3.0.2/parameters), it should be possible to obtain tumor-only somatic variant calling results using Strelka (among other tools).

I ran the pipeline using Strelka with tumor-only samples and ended up with results for germline variants (variants.vcf.gz, genome.S${N}.vcf.gz) instead of somatic ones (somatic.snvs.vcf.gz, somatic.indels.vcf.gz); see https://github.com/Illumina/strelka/blob/v2.9.x/docs/userGuide/README.md.

I read that Strelka can perform somatic variant calling when you have tumor-normal sample pairs, but not when you have tumor-only samples (https://academic.oup.com/bioinformatics/article/28/14/1811/218573). Did I miss an extra parameter?

Sorry in advance if I misunderstood something.

Command used and terminal output

No response

Relevant files

No response

System information

No response

FriederikeHanssen commented 1 year ago

Hi @ariadnaaterrades ! Can you please post some more details: command run, samplesheet, any other configs/param files you are using, and the nextflow log?

ariadnaaterrades commented 1 year ago

Hi @FriederikeHanssen, thanks for your quick reply.

I've mainly been using the default parameters with the GRCh38 genome version. Here are the commands I ran:

cd /DNA/Batch_01
srun --mpi=none --mem=40G --nodes=4 --ntasks-per-node=2 --partition=long --pty bash -i

module load Java/13.0.2
module load singularity/3.7.3

./nextflow run nf-core/sarek --input ./samplesheet_10102022.csv --outdir /DNA/Batch_01/ --genome GATK.GRCh38 -profile singularity --wes --intervals /DNA/xgen-exome-hyb-panel-v2-targets-hg38.bed --tools mutect2,freebayes,cnvkit,strelka

The sample sheet looks like this:


patient | status | sample | lane | fastq_1 | fastq_2
-- | -- | -- | -- | -- | --
Sample1 | 1 | Sample1_Tumor | 1 | /DNA/Batch_01/Sample1_R1_001.fastq.gz | /DNA/Batch_01/Sample1_R2_001.fastq.gz
Sample2 | 1 | Sample2_Tumor | 1 | /DNA/Batch_01/Sample2_R1_001.fastq.gz | /DNA/Batch_01/Sample2_R2_001.fastq.gz
Sample3 | 1 | Sample3_Tumor | 1 | /DNA/Batch_01/Sample3_R1_001.fastq.gz | /DNA/Batch_01/Sample3_R2_001.fastq.gz
Sample4 | 1 | Sample4_Tumor | 1 | /DNA/Batch_01/Sample4_R1_001.fastq.gz | /DNA/Batch_01/Sample4_R2_001.fastq.gz
Sample5 | 1 | Sample5_Tumor | 1 | /DNA/Batch_01/Sample5_R1_001.fastq.gz | /DNA/Batch_01/Sample5_R2_001.fastq.gz
Sample6 | 1 | Sample6_Tumor | 1 | /DNA/Batch_01/Sample6_R1_001.fastq.gz | /DNA/Batch_01/Sample6_R2_001.fastq.gz
Sample7 | 1 | Sample7_Tumor | 1 | /DNA/Batch_01/Sample7_R1_001.fastq.gz | /DNA/Batch_01/Sample7_R2_001.fastq.gz

The thing is that there is no error in the execution_report_2022-10-11_13-00-30.html file; all the processes seem to have completed correctly. I did end up with output files, but not the somatic ones. For that reason I looked at the Strelka documentation and realized that it might not be possible to run tumor-only somatic variant calling.
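For readers checking their own input, here is a minimal sketch of how one might confirm that a samplesheet contains only tumor samples without matched normals. It assumes the samplesheet is a plain CSV with the `patient` and `status` columns shown above, and that status `1` means tumor and `0` means normal. The function name `classify_patients` and the two-row `example` are illustrative only and not part of sarek.

```python
import csv
import io
from collections import defaultdict


def classify_patients(samplesheet_text):
    """Label each patient by which status values appear in its rows.

    Assumption: status "0" = normal, "1" = tumor.
    """
    statuses = defaultdict(set)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        statuses[row["patient"].strip()].add(row["status"].strip())

    labels = {}
    for patient, seen in statuses.items():
        if {"0", "1"} <= seen:
            labels[patient] = "paired"       # tumor with a matched normal
        elif seen == {"1"}:
            labels[patient] = "tumor-only"   # no normal for this patient
        elif seen == {"0"}:
            labels[patient] = "normal-only"
        else:
            labels[patient] = "unknown"      # status value not covered here
    return labels


# Two rows in the same layout as the samplesheet in this thread.
example = (
    "patient,status,sample,lane,fastq_1,fastq_2\n"
    "Sample1,1,Sample1_Tumor,1,/DNA/Batch_01/Sample1_R1_001.fastq.gz,/DNA/Batch_01/Sample1_R2_001.fastq.gz\n"
    "Sample2,1,Sample2_Tumor,1,/DNA/Batch_01/Sample2_R1_001.fastq.gz,/DNA/Batch_01/Sample2_R2_001.fastq.gz\n"
)

if __name__ == "__main__":
    for patient, label in classify_patients(example).items():
        print(patient, label)
```

If every patient comes back as "tumor-only", as it would for the samplesheet above, no tumor-normal pair is available for paired somatic calling.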

FriederikeHanssen commented 1 year ago

Hey! Apologies for the late reply, I was OOO, but back now. So strelka is run with the following command for tumor-only samples:

https://github.com/nf-core/sarek/blob/master/modules/nf-core/modules/strelka/germline/main.nf

So it runs with the "germline" module, which is the only one we have available for tumor-only samples right now, as you correctly noted. However, the respective output should still be present for your sample names. If that is the case, then the output is as expected and we can close the issue. To my knowledge, the somatic variants can only be called with paired samples.
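The behaviour described here can be summarised as a small lookup: a rough sketch, not the pipeline's actual logic. The file names are the ones quoted earlier in this thread from the Strelka user guide; the function names `strelka_mode` and `expected_outputs` are hypothetical, and treating a normal-only sample as a germline run is an assumption.

```python
# Output file names as quoted earlier in this thread from the Strelka user guide.
STRELKA_OUTPUTS = {
    "germline": ["variants.vcf.gz", "genome.S${N}.vcf.gz"],
    "somatic": ["somatic.snvs.vcf.gz", "somatic.indels.vcf.gz"],
}


def strelka_mode(has_tumor, has_normal):
    """Return which Strelka workflow a patient's samples would go through."""
    if has_tumor and has_normal:
        return "somatic"    # paired tumor-normal calling
    if has_tumor or has_normal:
        return "germline"   # single sample: only the germline module applies
    raise ValueError("patient has no samples")


def expected_outputs(has_tumor, has_normal):
    """List the result files to look for, given the samples available."""
    return STRELKA_OUTPUTS[strelka_mode(has_tumor, has_normal)]


if __name__ == "__main__":
    # Tumor-only, as in this issue.
    print(expected_outputs(has_tumor=True, has_normal=False))
```

For the tumor-only run in this issue, the lookup points to the germline files, which matches what was found in the output directory.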

ariadnaaterrades commented 1 year ago

Hi!

As you mention, I ended up with germline results instead of somatic ones. However, I chose Strelka, among other tools, for tumor-only somatic variant calling because your website indicates that this is possible: https://nf-co.re/sarek/3.0.2/parameters


As you said, and according to the Strelka paper (https://academic.oup.com/bioinformatics/article/28/14/1811/218573), tumor-only somatic calling is not possible with Strelka. But when I saw the information on your website, I thought you might have implemented some changes to make it possible.

I think that updating the website information could prevent other people from misunderstanding it in the same way.

Thank you so much!