nf-core / oncoanalyser

A comprehensive cancer DNA/RNA analysis and reporting pipeline
https://nf-co.re/oncoanalyser
MIT License

`samplesheet` format is not recognised #65

Closed bounlu closed 1 month ago

bounlu commented 1 month ago

Description of the bug

The input samplesheet format given in the documentation does not work for me. I tried deleting subject_id and sample_name individually and it still failed, but it worked when I deleted both for all samples.

Command used and terminal output

$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date
Launching `https://github.com/nf-core/oncoanalyser` [big_church] DSL2 - revision: 2f86f87702 [dev]

ERROR ~ missing 'library_id' info field for compass TUMOR/DNA

 -- Check '.nextflow.log' file for details
$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date
Launching `https://github.com/nf-core/oncoanalyser` [curious_maxwell] DSL2 - revision: 2f86f87702 [dev]

ERROR ~ got unexpected subject name for compass 220123565: 220324109

 -- Check '.nextflow.log' file for details
$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date
Launching `https://github.com/nf-core/oncoanalyser` [stoic_leakey] DSL2 - revision: 2f86f87702 [dev]

ERROR ~ got unexpected sample name for compass TUMOR/DNA: 220324109_3_umi

 -- Check '.nextflow.log' file for details
$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date
Launching `https://github.com/nf-core/oncoanalyser` [jolly_archimedes] DSL2 - revision: 2f86f87702 [dev]

ERROR ~ got unexpected sample name for compass TUMOR/DNA: 220324109

 -- Check '.nextflow.log' file for details
$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date

Relevant files

group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
my_project,220123565,220836103_3_umi,tumor,dna,fastq,library_id:12;lane:L02,/data/220836103_1.fq.gz;/data/220836103_2.fq.gz
my_project,220324109,220024109_3_umi,tumor,dna,fastq,library_id:13;lane:L02,/data/220024109_1.fq.gz;/data/220024109_2.fq.gz
my_project,220324466,220024466_3_umi,tumor,dna,fastq,library_id:14;lane:L02,/data/220024466_1.fq.gz;/data/220024466_2.fq.gz
my_project,220325489,220024489_2_umi,tumor,dna,fastq,library_id:15;lane:L02,/data/220024489_1.fq.gz;/data/220024489_2.fq.gz
my_project,220326755,220024755_2_umi,tumor,dna,fastq,library_id:16;lane:L02,/data/220024755_1.fq.gz;/data/220024755_2.fq.gz
my_project,220327052,220025052_2_umi,tumor,dna,fastq,library_id:17;lane:L02,/data/220025052_1.fq.gz;/data/220025052_2.fq.gz
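The first error in the terminal output above complains about a missing 'library_id' info field, and the info column in these rows packs key:value pairs separated by semicolons. A minimal sketch of a pre-flight check for that column follows. The helper names `parse_info` and `missing_info_keys` are hypothetical, and this is not oncoanalyser's own parser; which keys are actually required is an assumption passed in as a parameter.

```python
def parse_info(info):
    """Split a 'key:value;key:value' string into a dict (assumed format)."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"malformed info entry: {item!r}")
        fields[key] = value
    return fields


def missing_info_keys(info, required=("library_id",)):
    """Return the required keys that are absent from an info string."""
    present = parse_info(info)
    return [key for key in required if key not in present]


# The info value used in the rows above, and one with library_id removed.
print(parse_info("library_id:12;lane:L02"))
print(missing_info_keys("lane:L02"))
```

Running a check like this over every FASTQ row before launching would surface a missing key without waiting for the pipeline to start.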

System information

Nextflow version: 24.06.0-edge
Hardware: Server
Executor: local
Container engine: Docker
OS: Linux
Version of nf-core/oncoanalyser: dev

scwatts commented 1 month ago

Thanks for the report @bounlu. Can you confirm that the samplesheet is definitely being provided to oncoanalyser in your execution script as --input /path/to/samplesheet.csv?

If you can provide the exact full command used to invoke oncoanalyser (e.g. nextflow run nf-core/oncoanalyser ...) along with the referenced inputs (samplesheet, config), I'll be able to help further.

bounlu commented 1 month ago

Yes, the samplesheet is being provided to oncoanalyser: every time I change something in the samplesheet, I get a different error.

Here is the full command:

#!/bin/bash

nextflow run nf-core/oncoanalyser \
-latest \
-profile docker \
--mode 'targeted' \
--genome 'GRCh38_hmf' \
--panel 'tso500' \
--input '/home/github/nf-core/samplesheet_oncoanalyser.csv' \
--outdir '/data/nextflow/oncoanalyser/my_project/results/' \
-work-dir '/data/nextflow/oncoanalyser/my_project/work/' \
-c '/home/github/nf-core/custom_local.config' \
-r dev \
-resume

I already provided the samplesheet above, and the custom_local.config file has no issues, as I use the same one for all my runs.

scwatts commented 1 month ago

Thanks for the extra info. I noticed that the error messages above reference compass entries that aren't present in the samplesheet you provided. Putting that aside, I've now tested your samplesheet and can see what is going wrong.

The samplesheet isn't considered valid as there are multiple tumor DNA samples given for a single analysis group, which is determined by values in the group_id column and in this case is my_project.

Since all of your tumor DNA samples are singletons, you can fix your samplesheet by setting a unique group_id value for each, e.g.:

group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
220123565,220123565,220836103_3_umi,tumor,dna,fastq,library_id:12;lane:L02,/data/220836103_1.fq.gz;/data/220836103_2.fq.gz
220324109,220324109,220024109_3_umi,tumor,dna,fastq,library_id:13;lane:L02,/data/220024109_1.fq.gz;/data/220024109_2.fq.gz
220324466,220324466,220024466_3_umi,tumor,dna,fastq,library_id:14;lane:L02,/data/220024466_1.fq.gz;/data/220024466_2.fq.gz
220325489,220325489,220024489_2_umi,tumor,dna,fastq,library_id:15;lane:L02,/data/220024489_1.fq.gz;/data/220024489_2.fq.gz
220326755,220326755,220024755_2_umi,tumor,dna,fastq,library_id:16;lane:L02,/data/220024755_1.fq.gz;/data/220024755_2.fq.gz
220327052,220327052,220025052_2_umi,tumor,dna,fastq,library_id:17;lane:L02,/data/220025052_1.fq.gz;/data/220025052_2.fq.gz
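The rule described above can be sketched as a quick check that flags any group_id holding more than one tumor DNA sample, plus the fix of reusing subject_id as group_id. The function names are hypothetical and this is not the pipeline's own validation code. Counting distinct sample_id values (so that several FASTQ rows for one sample are not flagged) is an assumption.

```python
import csv
import io
from collections import defaultdict


def find_crowded_groups(rows):
    """Return {group_id: sorted sample_ids} for groups with >1 tumor DNA sample."""
    samples = defaultdict(set)
    for row in rows:
        if row["sample_type"] == "tumor" and row["sequence_type"] == "dna":
            samples[row["group_id"]].add(row["sample_id"])
    return {g: sorted(s) for g, s in samples.items() if len(s) > 1}


def use_subject_as_group(rows):
    """Copy each row, setting group_id to that row's subject_id."""
    return [{**row, "group_id": row["subject_id"]} for row in rows]


# Two rows from the original samplesheet, both under group_id 'my_project'.
SAMPLESHEET = """\
group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
my_project,220123565,220836103_3_umi,tumor,dna,fastq,library_id:12;lane:L02,/data/220836103_1.fq.gz;/data/220836103_2.fq.gz
my_project,220324109,220024109_3_umi,tumor,dna,fastq,library_id:13;lane:L02,/data/220024109_1.fq.gz;/data/220024109_2.fq.gz
"""

rows = list(csv.DictReader(io.StringIO(SAMPLESHEET)))
print(find_crowded_groups(rows))   # 'my_project' holds two tumor DNA samples
fixed = use_subject_as_group(rows)
print(find_crowded_groups(fixed))  # no group is flagged after the fix
```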

I also see that the entry with a subject_id of 220123565 has a different sample_id pattern from the others. I'm not sure whether this is intentional, but it seems worth pointing out just in case.

> Input samplesheet format provided on the Documentation does not work for me

Are you finding that the exact samplesheet given in the documentation isn't working, or that you haven't been able to use it successfully as a template to create your own?

bounlu commented 1 month ago

Thanks a lot for the quick reply. I intentionally changed the sample IDs and names to disambiguate the information, hence the naming irregularities you observed.

I think what I needed is this:

> The samplesheet isn't considered valid as there are multiple tumor DNA samples given for a single analysis group, which is determined by values in the group_id column and in this case is my_project.

This explains the error I got. I will try assigning a unique group_id to each sample.

Thanks for the help.

scwatts commented 1 month ago

No worries. To complete the explanation regarding grouping: in other cases you may want multiple samples to be part of the same analysis group. For example, a WGS tumor/normal pair must be provided under the same group_id value, otherwise the two samples would be treated separately as individual tumor-only and normal-only samples.
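To illustrate the grouping behaviour described here, the sketch below labels each group_id by the DNA sample types it contains. The group and sample names (P1, P1_T, P1_N) are made up for the example, and `classify_groups` is a hypothetical helper rather than oncoanalyser's actual grouping logic.

```python
from collections import defaultdict


def classify_groups(rows):
    """Map each group_id to 'tumor/normal', 'tumor-only' or 'normal-only'."""
    types = defaultdict(set)
    for row in rows:
        if row["sequence_type"] == "dna":
            types[row["group_id"]].add(row["sample_type"])
    labels = {}
    for group_id, found in types.items():
        if {"tumor", "normal"} <= found:
            labels[group_id] = "tumor/normal"
        elif "tumor" in found:
            labels[group_id] = "tumor-only"
        elif "normal" in found:
            labels[group_id] = "normal-only"
    return labels


# Hypothetical pair sharing one group_id: analysed together.
paired = [
    {"group_id": "P1", "sample_id": "P1_T", "sample_type": "tumor", "sequence_type": "dna"},
    {"group_id": "P1", "sample_id": "P1_N", "sample_type": "normal", "sequence_type": "dna"},
]
# The same samples under different group_id values: treated separately.
split = [
    {"group_id": "P1_T", "sample_id": "P1_T", "sample_type": "tumor", "sequence_type": "dna"},
    {"group_id": "P1_N", "sample_id": "P1_N", "sample_type": "normal", "sequence_type": "dna"},
]

print(classify_groups(paired))
print(classify_groups(split))
```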

scwatts commented 1 month ago

Closing this as resolved. Please reopen if needed.