samplesheet should have a suffix .tsv if it is a tab-separated file as described

nf-core / bacass

Simple bacterial assembly and annotation pipeline

https://nf-co.re/bacass

MIT License


samplesheet should have a suffix .tsv if it is a tab-separated file as described #64

Closed antunderwood closed 3 months ago

antunderwood commented 3 years ago

Description of the bug

The documentation describes the input sample sheet as a tab-separated file, but it is labelled csv in the example. The pipeline fails if the suffix is .tsv.

I suggest changing the example and the pattern match to ^\S+\.tsv$

d4straub commented 3 years ago

Agreed, I'll put it on the list for the next release!

d4straub commented 3 years ago

I was made aware that https://nf-co.re/bacass/2.0.0/usage#samplesheet does not specify that the file must be tab-separated (although the example is fine). That needs changing as well.

mtva0001 commented 2 years ago

Hi,

We are trying to run bacass pipeline 2.0.0 but it doesn't work ("argument of file function cannot be null" - I have no clue what it means). I'm pretty sure there is something wrong with the input samplesheet and that's how I found this conversation. However, I'm still confused which file type we should use, csv or tsv? When we use .tsv, we get this error: --input: string [ControlDSshorttab.tsv] does not match pattern ^\S+.csv$ (ControlDSshorttab.tsv)

Also, is it okay to list multiple fastq.gz files belonging to the same sample, or should we cat them before running the pipeline?

Many thanks for your answer and help in advance!

d4straub commented 2 years ago

Hi,

here hopefully helpful answers:

> I'm pretty sure there is something wrong with the input samplesheet and that's how I found this conversation. However, I'm still confused which file type we should use, csv or tsv?

tab-separated file with .csv as suffix, e.g. samplesheet.csv <- click so you can see the example
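As a minimal sketch of what "tab-separated with a .csv suffix" looks like in practice, the Python snippet below writes such a file. The column names R1, R2, LongFastQ, Fast5 and GenomeSize are the ones mentioned in this thread; the `ID` column, the column order, and the sample values are assumptions for illustration only, so check the linked usage page for the actual header.

```python
import csv

# Column names taken from this thread, except "ID", which is an assumption.
HEADER = ["ID", "R1", "R2", "LongFastQ", "Fast5", "GenomeSize"]

# Hypothetical long-read-only sample: unused columns are filled with a bare NA.
ROWS = [["sample1", "NA", "NA", "sample1_long.fastq.gz", "NA", "NA"]]


def write_samplesheet(path, header, rows):
    """Write a tab-separated sample sheet; the file name may still end in .csv."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    write_samplesheet("samplesheet.csv", HEADER, ROWS)
```

Writing the file programmatically also avoids stray spaces around values, which spreadsheet exports and manual editing can introduce.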

> When we use .tsv, we get this error: --input: string [ControlDSshorttab.tsv] does not match pattern ^\S+.csv$ (ControlDSshorttab.tsv)

Well, "does not match pattern ^\S+.csv$" means the pipeline expects a file name (technically a string of one or more non-whitespace characters, ^\S+) ending ($) with .csv. I understand that this is expressed in technical terms and is not generally understandable.
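To make the pattern concrete, here is a small sketch using Python's `re` module. It only illustrates how the regular expression behaves and is not how the pipeline performs its validation. The first pattern is copied as it appears in the error message quoted above, where the dot is unescaped and therefore matches any character; the second escapes the dot, in the same way as the ^\S+\.tsv$ suggested at the top of this thread. The helper name `matches` is made up for this example.

```python
import re

# Pattern exactly as quoted in the error message (unescaped dot = any character).
QUOTED = re.compile(r"^\S+.csv$")
# Stricter variant with an escaped dot, so a literal "." is required.
STRICT = re.compile(r"^\S+\.csv$")


def matches(pattern, name):
    """Return True if the file name satisfies the given pattern."""
    return pattern.search(name) is not None


if __name__ == "__main__":
    for name in ["samplesheet.csv", "ControlDSshorttab.tsv", "my sheet.csv"]:
        print(name, matches(QUOTED, name), matches(STRICT, name))
```

Only `samplesheet.csv` passes: the `.tsv` name fails on the suffix, and `my sheet.csv` fails because `\S+` does not allow whitespace in the name.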

> Also, is it okay to list multiple fastq.gz files belonging to the same sample, or should we cat them before running the pipeline?

cat them beforehand; the pipeline does not support multiple files per entry.
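For anyone who prefers a script over the shell, a rough Python equivalent of `cat part1.fastq.gz part2.fastq.gz > merged.fastq.gz` is sketched below. It relies on the fact that gzip files joined byte for byte form a valid multi-member gzip stream, so no decompression is needed. The function name and the file names in the usage comment are hypothetical.

```python
import shutil


def concat_files(parts, merged):
    """Append the raw bytes of each input file, in order, to one output file."""
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


# Hypothetical usage:
# concat_files(["s1_run1.fastq.gz", "s1_run2.fastq.gz"], "s1_merged.fastq.gz")
```

For paired-end data, the R1 and R2 parts should be concatenated in the same order so that read pairs stay in sync.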

mtva0001 commented 2 years ago

Thanks for your quick help!

Now we have a different error message: "Read 1 FastQ file does not exist!". Of course we don't have one, because we have Nanopore sequencing data, so we filled the R1 and R2 columns with NAs. We only specified the LongFastQ column values. NAs are used for the GenomeSize and Fast5 columns, too.

We even tried specifying the --singleEnd option, but it still resulted in the same error message.

This is the command we try to run: nextflow run nf-core/bacass --input samplesheet.csv -profile uppmax --project snicXXXX -bg --assembly_type 'long' --assembler 'miniasm' --skip_kraken2

d4straub commented 2 years ago

Could you check whether each entry is exactly NA, with no spaces or other extra characters? You could also upload the csv file here if you'd like.

mtva0001 commented 2 years ago

Oh yes, the "hidden" extra spaces were the issue... sorry and thanks!
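Hidden spaces like these are hard to spot in an editor. A short check along the following lines can flag them before a run. This is a sketch that assumes a tab-separated sample sheet; `find_padded_cells` is a made-up helper, not part of the pipeline.

```python
import csv


def find_padded_cells(path):
    """Return (line_number, column_number, raw_value) for every cell that
    has leading or trailing whitespace, e.g. "NA " instead of "NA"."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_no, row in enumerate(reader, start=1):
            for col_no, cell in enumerate(row, start=1):
                if cell != cell.strip():
                    problems.append((line_no, col_no, cell))
    return problems


if __name__ == "__main__":
    for line_no, col_no, cell in find_padded_cells("samplesheet.csv"):
        # repr() makes the otherwise invisible spaces visible
        print(f"line {line_no}, column {col_no}: {cell!r}")
```

An empty result means no cell carries surrounding whitespace; it does not check that the paths themselves exist.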

Daniel-VM commented 3 months ago

Closing this; please reopen it if needed.