nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

The sample-sheet only contains tumor-samples #1260

Closed: bounlu closed this 3 months ago

bounlu commented 1 year ago

Description of the bug

I am getting a new type of error when running the same samplesheet that used to work before:

The sample-sheet only contains tumor-samples, but the following tools, which were requested by the option "tools", expect at least one normal-sample : ascat, msisensorpro
Files within --vep_cache invalid. Make sure there is a directory named homo_sapiens/110_GRCh38 in s3://annotation-cache/vep_cache/.
https://nf-co.re/sarek/dev/usage#how-to-customise-snpeff-and-vep-annotation
The sample-sheet only contains normal-samples, but the following tools, which were requested with "--tools", expect at least one tumor-sample : ascat, controlfreec, mutect2, msisensorpro

The sample sheet contains both normal and tumor samples though.

Command used and terminal output

nextflow run nf-core/sarek \
-latest \
-profile docker \
--step mapping \
--tools freebayes,mpileup,mutect2,strelka,manta,tiddit,ascat,cnvkit,controlfreec,msisensorpro,snpeff,vep,merge \
--input 'samplesheet_sarek.csv' \
--outdir '/nextflow/sarek/results/' \
-work-dir '/nextflow/sarek/work/' \
-c 'custom.config' \
-r master \
-resume

Relevant files

No response

System information

Nextflow version: 23.08.1-edge; executor: local; container engine: Docker; OS: Linux; pipeline: nf-core/sarek master v3.3.1

maxulysse commented 1 year ago

Can you share the .nextflow.log?

bounlu commented 1 year ago

Here you go:

nextflow.log

asp8200 commented 1 year ago

I guess it would also be relevant for us to inspect the sample sheet. Can you also share that with us?

maxulysse commented 1 year ago

Yeah, I think the issue is unrelated to that error. Can you try the same command and add --use_annotation_cache_keys?

acpicornell commented 1 year ago

Same error here using nextflow-23.04.4-all; --use_annotation_cache_keys makes no difference. I am using normal samples (marked as 0 in the samplesheet) and I get:

The sample-sheet only contains tumor-samples, but the following tools, which were requested by the option "tools", expect at least one normal-sample : deepvariant, haplotypecaller
Process 'CNNSCOREVARIANTS' has been already used -- If you need to reuse the same component, include it with a different name or include it in a different workflow context
maxulysse commented 1 year ago

Can you share the log file please?

berguner commented 1 year ago

I am also getting the same error even though my sample sheet contains only normal samples (0), not tumor samples.

maxulysse commented 1 year ago

Can you share the log file?

berguner commented 1 year ago

I also got the "Files within --vep_cache invalid" error and I fixed it by providing the VEP cache files for Ensembl 110. Strangely, this also fixed the "The sample-sheet only contains tumor-samples" error.
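
For reference, providing a local Ensembl 110 VEP cache can look roughly like this; the download URL follows Ensembl's usual FTP layout and the target directory is a placeholder, so double-check both for your release and assembly:

curl -O https://ftp.ensembl.org/pub/release-110/variation/indexed_vep_cache/homo_sapiens_vep_110_GRCh38.tar.gz
mkdir -p /path/to/vep_cache
tar -xzf homo_sapiens_vep_110_GRCh38.tar.gz -C /path/to/vep_cache
# this extracts homo_sapiens/110_GRCh38/, the layout the error message asks for; then pass --vep_cache /path/to/vep_cache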

maxulysse commented 1 year ago

@bounlu Can you try without snpeff,vep,merge to see if I can rule out my hypothesis?

maxulysse commented 1 year ago

I also got the "Files within --vep_cache invalid" error and I fixed it by providing the VEP cache files for Ensembl 110. Strangely, this also fixed the "The sample-sheet only contains tumor-samples" error.

Yeah, I think the two are fully unrelated, but I have no idea why "The sample-sheet only contains tumor-samples" gets triggered.

bounlu commented 1 year ago

Without snpeff,vep,merge I don't get the error either.

acpicornell commented 1 year ago

Can you share the log file please?

nextflow.log

This is my "launcher":

export JAVA_HOME="/home/acpicornell/.sdkman/candidates/java/17.0.6-amzn"
export NXF_VER=23.04.4
/mnt/storage/$(whoami)/bin/nextflow run /mnt/storage/$(whoami)/pipelines/nf-core-sarek_3.3.1/3_3_1/\
     --max_memory       100.0GB\
     --max_cpus         32\
     --input            /mnt/storage/$(whoami)/nfcore/sarek/230908/samplesheet.csv\
     --outdir           /mnt/storage/$(whoami)/nfcore/sarek/230908/output\
     --step             mapping\
     --genome           GATK.GRCh38\
     --igenomes_base    /mnt/storage/$(whoami)/references\
     --tools            deepvariant,freebayes,haplotypecaller,mpileup,strelka,sentieon_haplotyper\
     --aligner          bwa-mem2\
     --seq_platform     ILLUMINA\
     --wes              false\
     -profile           singularity

All my samples are germline and I am not annotating. This is what I get:

Process 'CNNSCOREVARIANTS' has been already used -- If you need to reuse the same component, include it with a different name or include it in a different workflow context

 -- Check script '/mnt/storage/acpicornell/pipelines/nf-core-sarek_3.3.1/3_3_1/./workflows/../subworkflows/local/bam_variant_calling_germline_all/../vcf_variant_filtering_gatk/main.nf' at line: 22 or see '.nextflow.log' file for more details
The sample-sheet only contains tumor-samples, but the following tools, which were requested by the option "tools", expect at least one normal-sample : deepvariant, haplotypecaller
maxulysse commented 1 year ago

OK, so https://github.com/nf-core/sarek/releases/tag/3.3.2 should fix these issues.

bounlu commented 1 year ago

I still get similar error with the new version:

The sample-sheet only contains normal-samples, but the following tools, which were requested with "--tools", expect at least one tumor-sample : ascat, controlfreec, mutect2, msisensorpro
Files within --vep_cache invalid. Make sure there is a directory named homo_sapiens/110_GRCh38 in s3://annotation-cache/vep_cache/.
https://nf-co.re/sarek/usage#how-to-customise-snpeff-and-vep-annotation
The sample-sheet only contains tumor-samples, but the following tools, which were requested by the option "tools", expect at least one normal-sample : ascat, msisensorpro
maxulysse commented 1 year ago

OK, so we get all the error messages at once, but I think it all comes down to this error:

Files within --vep_cache invalid. Make sure there is a directory named homo_sapiens/110_GRCh38 in s3://annotation-cache/vep_cache/.
https://nf-co.re/sarek/usage#how-to-customise-snpeff-and-vep-annotation
maxulysse commented 1 year ago

In your case @bounlu, I'd say that adding --use_annotation_cache_keys or using a local cache should solve your issue.
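
For reference, a sketch of what either suggestion could look like when added to the original command (the local cache paths are placeholders):

# keep the default annotation-cache on S3, but add the key flag
nextflow run nf-core/sarek -r 3.3.2 -profile docker --step mapping \
    --tools freebayes,mpileup,mutect2,strelka,manta,tiddit,ascat,cnvkit,controlfreec,msisensorpro,snpeff,vep,merge \
    --input samplesheet_sarek.csv --outdir /nextflow/sarek/results/ \
    --use_annotation_cache_keys

# or point VEP/snpEff at a local cache instead
nextflow run nf-core/sarek -r 3.3.2 -profile docker --step mapping \
    --tools freebayes,mpileup,mutect2,strelka,manta,tiddit,ascat,cnvkit,controlfreec,msisensorpro,snpeff,vep,merge \
    --input samplesheet_sarek.csv --outdir /nextflow/sarek/results/ \
    --vep_cache /path/to/local/vep_cache --snpeff_cache /path/to/local/snpeff_cache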

sisterdot commented 11 months ago

Hey!

I also got the error

Files within --vep_cache invalid. Make sure there is a directory named homo_sapiens/110_GRCh38 in s3://annotation-cache/vep_cache/.
https://nf-co.re/sarek/usage#how-to-customise-snpeff-and-vep-annotation
The sample-sheet only contains tumor-samples, but the following tools, which were requested by the option "tools", expect at least one normal-sample : deepvariant, haplotypecaller

when starting

nextflow run  nf-core/sarek -r 3.3.2  --outdir results_sarek_3.3.2  --input parameter.csv --genome GATK.GRCh38 --tools cnvkit,deepvariant,freebayes,haplotypecaller,manta,mpileup,snpeff,vep,strelka,tiddit,merge 

In my understanding the error is linked to the check if (params.vep_cache == "s3://annotation-cache/vep_cache") { which evaluates to false in my test case,

as does if (params.snpeff_cache == "s3://annotation-cache/snpeff_cache") {

If I explicitly define the parameters (before tools) when starting the pipeline, both comparisons will actually evaluate to true and no confusing error is raised (they should be the same value as the default, but maybe a different type; a Groovy expert might know):

nextflow run  nf-core/sarek -r 3.3.2 --snpeff_cache s3://annotation-cache/snpeff_cache --vep_cache s3://annotation-cache/vep_cache --outdir results_sarek_3.3.2  --input parameter.csv --genome GATK.GRCh38 --tools cnvkit,deepvariant,freebayes,haplotypecaller,manta,mpileup,snpeff,vep,strelka,tiddit,merge
  nf-core/sarek v3.3.2-gf034b73
  containerEngine           : singularity
  nextflow version 23.04.2.5871
  CentOS Linux release 7.9.2009
achakroun commented 11 months ago

Hello, it seems to be a conflict with the "--tools" setting. I had the following (same) error with nf-core/sarek v3.3.2-gf034b73:

The sample-sheet only contains tumor-samples, but the following tools, which were requested by the option "tools", expect at least one normal-sample : deepvariant, haplotypecaller
Process 'CNNSCOREVARIANTS' has been already used -- If you need to reuse the same component, include it with a different name or include it in a different workflow context

and it worked fine for me when I set:

--tools deepvariant,freebayes,sentieon_haplotyper,strelka,manta,tiddit,cnvkit

or

--tools deepvariant,haplotypecaller,freebayes,strelka,manta,tiddit,cnvkit

instead of:

--tools deepvariant,freebayes,haplotypecaller,sentieon_haplotyper,strelka,manta,tiddit,cnvkit

FriederikeHanssen commented 11 months ago

@achakroun can you send everything you used? Samplesheet, full command, the .nextflow.log, any custom configuration files?

achakroun commented 11 months ago

@achakroun can you send everything you used? Samplesheet, full command, the .nextflow.log, any custom configuration files?

Sure

1- Command:

nextflow run nf-core/sarek -profile singularity -resume --max_cpus 10 --max_memory 40.GB --input ./samplesheet.csv --trim_fastq --genome GATK.GRCh38 --save_reference --outdir ./results --wes --intervals Twist_Exome_RefSeq_targets_hg38_200-pad.bed --tools deepvariant,haplotypecaller,freebayes,strelka,manta,tiddit,cnvkit --concatenate_vcfs

2- Samplesheet was like:

patient,status,sample,lane,fastq_1,fastq_2
841-23,0,841-23,Lane2,fastq/841-23_1.fastq.gz,fastq/841-23_2.fastq.gz
842-23,0,842-23,Lane2,fastq/842-23_1.fastq.gz,fastq/842-23_2.fastq.gz

Nothing else.

FriederikeHanssen commented 11 months ago

Can you send the .nextflow.log file as well?

achakroun commented 11 months ago

Please note that the pipeline is still running.

.nextflow.log

FriederikeHanssen commented 11 months ago

Sorry, can you provide the log from the failed run?

FriederikeHanssen commented 11 months ago

Ah, pretty sure I know what is wrong. Opened a separate issue to fix it --> https://github.com/nf-core/sarek/issues/1314

sfilges commented 11 months ago

Hey!

I also got the error

Files within --vep_cache invalid. Make sure there is a directory named homo_sapiens/110_GRCh38 in s3://annotation-cache/vep_cache/.
https://nf-co.re/sarek/usage#how-to-customise-snpeff-and-vep-annotation
The sample-sheet only contains tumor-samples, but the following tools, which were requested by the option "tools", expect at least one normal-sample : deepvariant, haplotypecaller

when starting

nextflow run  nf-core/sarek -r 3.3.2  --outdir results_sarek_3.3.2  --input parameter.csv --genome GATK.GRCh38 --tools cnvkit,deepvariant,freebayes,haplotypecaller,manta,mpileup,snpeff,vep,strelka,tiddit,merge 

In my understanding the error is linked to the check if (params.vep_cache == "s3://annotation-cache/vep_cache") { which evaluates to false in my test case,

as does if (params.snpeff_cache == "s3://annotation-cache/snpeff_cache") {

If I explicitly define the parameters (before tools) when starting the pipeline, both comparisons will actually evaluate to true and no confusing error is raised (they should be the same value as the default, but maybe a different type; a Groovy expert might know):

nextflow run  nf-core/sarek -r 3.3.2 --snpeff_cache s3://annotation-cache/snpeff_cache --vep_cache s3://annotation-cache/vep_cache --outdir results_sarek_3.3.2  --input parameter.csv --genome GATK.GRCh38 --tools cnvkit,deepvariant,freebayes,haplotypecaller,manta,mpileup,snpeff,vep,strelka,tiddit,merge
  nf-core/sarek v3.3.2-gf034b73
  containerEngine           : singularity
  nextflow version 23.04.2.5871
  CentOS Linux release 7.9.2009

This fixed the error for me.

FriederikeHanssen commented 11 months ago

Hey! Yeah, those are two completely unrelated issues; for some reason nf-validation bubbles up "The sample-sheet only contains tumor-samples, but the following tools, which were requested by the option "tools", expect at least one normal-sample : deepvariant, haplotypecaller" both times. But the real hint is in the other half of the error message.

FriederikeHanssen commented 11 months ago

@achakroun your issue is fixed on dev

charlesdavid commented 3 months ago

Hey, this problem has not been entirely fixed... I am running with a local genome with igenomes_ignore set, and my sample sheet has both statuses entered (0 and 1). However, this error is still occurring, making the workflow unrunnable on nf-core/sarek v3.4.2.

nextflow.exception.WorkflowScriptErrorException: The sample-sheet only contains tumor-samples, but the following tools, which were requested by the option "tools", expect at least one normal-sample : haplotypecaller

COMMAND:

nextflow \
    main.nf \
    -c "pfr_profile.config" \
    -profile pfr,singularity \
    -params-file "pfr_params.json" \
    --input ${INPUT} \
    --outdir "${OUT_DIR}" \
    --genome null \
    --igenomes_ignore \
    --wes true \
    --skip_tools baserecalibrator \
    --tools "freebayes,haplotypecaller,mpileup" \
    --fasta ${FASTA} \
    --fasta_fai ${FAI} \
    --dict ${DICT} \
    --trim_fastq true \
    --save_trimmed true \
    --save_reference \
    --save_mapped true \
    --save_output_as_bam true \
    -resume

Note, the pfr profile only specifies SLURM as the executor.

pfr_params.json:

{ "genome": null, "igenomes_ignore": true, "save_reference": true, "split_fastq": 50000000, "trim_fastq": true, "save_trimmed": true, "aligner": "bwa-mem", "save_mapped": true, "save_output_as_bam": true }

Here is the error in the log file:

Jun-14 23:58:10.815 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
Jun-14 23:58:10.815 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
Jun-14 23:58:10.831 [main] DEBUG nextflow.Session - Config process names validation disabled as requested
Jun-14 23:58:10.833 [main] DEBUG nextflow.Session - Igniting dataflow network (184)
Jun-14 23:58:10.844 [Actor Thread 5] ERROR nextflow.extension.OperatorImpl - @unknown
java.lang.NullPointerException: Cannot get property 'baseName' on null object
  at org.codehaus.groovy.runtime.NullObject.getProperty(NullObject.java:60)
  at org.codehaus.groovy.runtime.InvokerHelper.getProperty(InvokerHelper.java:190)
  at org.codehaus.groovy.runtime.callsite.NullCallSite.getProperty(NullCallSite.java:46)
  at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGetProperty(AbstractCallSite.java:329)
  at Script_48a2d276$_runScript_closure1$_closure2$_closure6.doCall(Script_48a2d276:132)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:568)
  at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
  at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
  at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
  at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
  at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:38)
  at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
  at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:53)
  at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
  at nextflow.extension.MapOp$_apply_closure1.doCall(MapOp.groovy:56)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:568)
  at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
  at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
  at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
  at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
  at groovy.lang.Closure.call(Closure.java:412)
  at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
  at groovyx.gpars.dataflow.operator.DataflowOperatorActor.onMessage(DataflowOperatorActor.java:108)
  at groovyx.gpars.actor.impl.SDAClosure$1.call(SDAClosure.java:43)
  at groovyx.gpars.actor.AbstractLoopingActor.runEnhancedWithoutRepliesOnMessages(AbstractLoopingActor.java:293)
  at groovyx.gpars.actor.AbstractLoopingActor.access$400(AbstractLoopingActor.java:30)
  at groovyx.gpars.actor.AbstractLoopingActor$1.handleMessage(AbstractLoopingActor.java:93)
  at groovyx.gpars.util.AsyncMessagingCore.run(AsyncMessagingCore.java:132)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  at java.base/java.lang.Thread.run(Thread.java:833)

Here is the sample sheet:

patient,status,sample,lane,fastq_1,fastq_2
GC,1,GC_CAT13ANXX_TTAGGC,L001,/WGS/AGRF_CAGRF14129_CAT13ANXX/DH_A_GC_CAT13ANXX_TTAGGC_L001_R1.fastq.gz,/WGS/AGRF_CAGRF14129_CAT13ANXX/DH_A_GC_CAT13ANXX_TTAGGC_L001_R2.fastq.gz
GC,1,GC_CAT13ANXX_TTAGGC,L002,/WGS/AGRF_CAGRF14129_CAT13ANXX/DH_A_GC_CAT13ANXX_TTAGGC_L002_R1.fastq.gz,/WGS/AGRF_CAGRF14129_CAT13ANXX/DH_A_GC_CAT13ANXX_TTAGGC_L002_R2.fastq.gz
GC,1,GC_CAT13ANXX_TTAGGC,L003,/WGS/AGRF_CAGRF14129_CAT13ANXX/DH_A_GC_CAT13ANXX_TTAGGC_L003_R1.fastq.gz,/WGS/AGRF_CAGRF14129_CAT13ANXX/DH_A_GC_CAT13ANXX_TTAGGC_L003_R2.fastq.gz
SW,0,SW_CAT13ANXX_TGACCA,L001,/WGS/AGRF_CAGRF14129_CAT13ANXX/DH_B_SW_CAT13ANXX_TGACCA_L001_R1.fastq.gz,/WGS/AGRF_CAGRF14129_CAT13ANXX/DH_B_SW_CAT13ANXX_TGACCA_L001_R2.fastq.gz
SW,0,SW_CAT13ANXX_TGACCA,L002,/WGS/AGRF_CAGRF14129_CAT13ANXX/DH_B_SW_CAT13ANXX_TGACCA_L002_R1.fastq.gz,/WGS/AGRF_CAGRF14129_CAT13ANXX/DH_B_SW_CAT13ANXX_TGACCA_L002_R2.fastq.gz
SW,0,SW_CAT13ANXX_TGACCA,L003,/WGS/AGRF_CAGRF14129_CAT13ANXX/DH_B_SW_CAT13ANXX_TGACCA_L003_R1.fastq.gz,/WGS/AGRF_CAGRF14129_CAT13ANXX/DH_B_SW_CAT13ANXX_TGACCA_L003_R2.fastq.gz

maxulysse commented 3 months ago

@charlesdavid can you send the file for the samplesheet?

kenibrewer commented 3 months ago

@charlesdavid Your sample sheet appears to have two patients: one with the id GC that is all tumor samples, and another named SW that is all normal. The tools you're trying to run require samples with the same patient id, with at least one tumor and at least one normal.
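
For illustration, a paired tumor/normal design keeps both samples under the same patient id; the sample names and paths below are made up:

patient,status,sample,lane,fastq_1,fastq_2
PATIENT1,0,PATIENT1_normal,L001,/path/to/PATIENT1_normal_R1.fastq.gz,/path/to/PATIENT1_normal_R2.fastq.gz
PATIENT1,1,PATIENT1_tumor,L001,/path/to/PATIENT1_tumor_R1.fastq.gz,/path/to/PATIENT1_tumor_R2.fastq.gz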

@maxulysse It might be good to close this issue, as the most recent comments are just troubleshooting individual user issues. Leaving it open makes it seem that there is still something that needs to be fixed in sarek. Perhaps it would be good to improve the clarity of that error message though, as it still seems to cause confusion.

maxulysse commented 3 months ago

This issue has been fixed; @charlesdavid, your issue is now #1567. Thanks @kenibrewer for the idea.

bounlu commented 1 month ago

This bug emerged again after the latest merge of the dev version into master (3.4.3). This time it says the sample-sheet only contains normal-samples:

The sample-sheet only contains normal-samples, but the following tools, which were requested with "--tools", expect at least one tumor-sample : controlfreec, mutect2
Missing process or function Channel.empty([[]])

 -- Check script '/home/omeran/.nextflow/assets/nf-core/sarek/main.nf' at line: 342 or see '.nextflow.log' file for more details

This was a resume from a previous successful run with the same command line. It worked then and now it fails again.

FriederikeHanssen commented 1 month ago

I am guessing it's a false flag from the error message (for some reason it shows up for unrelated issues). What is in the .nextflow.log & samplesheet?

bounlu commented 1 month ago

What should be the correct format of the full path to the caches? There is also a discrepancy between the documentation help text and the actual cache files located on the annotation-cache S3 bucket:

https://nf-co.re/sarek/3.4.3/parameters/#vep_cache says: ${vep_species}/${vepgenome}${vep_cache_version}

S3 has: s3://annotation-cache/vep_cache/111_GRCh38/homo_sapiens/111_GRCh38/

I am using a local cache, and it seems I need to update the path format to make it work with 3.4.3.

asp8200 commented 1 month ago

I think I also encountered that problem. Fixed it locally with some symlinks.

bounlu commented 1 month ago

@asp8200

Can you please share your directory tree for the cache dirs?

asp8200 commented 1 month ago

We already had vep_cache/110_GRCh38/homo_sapiens/110_GRCh38 downloaded from previous (old) versions of Sarek, and I had to introduce a symlink from vep_cache/homo_sapiens/110_GRCh38 to vep_cache/110_GRCh38/homo_sapiens/110_GRCh38 for Sarek v3.4.2. (I haven't tried with Sarek v3.4.3 yet.)
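
In shell terms that symlink is roughly the following, using the paths from the description above (adjust the cache root to your own location):

cd /path/to/vep_cache
mkdir -p homo_sapiens
ln -s ../110_GRCh38/homo_sapiens/110_GRCh38 homo_sapiens/110_GRCh38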

I suspect that @maxulysse changed the folder structure for the vep-cache (and snpeff-cache), but perhaps he can comment on that?