Closed · zeehio closed this 3 months ago
Hi,

this issue should be fixed in the development version. You can give it a try with `nextflow run ... -r dev`. If it doesn't work, please let me know!
Hi @grst, I have now been able to test the dev pipeline. Thanks for the update. Unfortunately I am still facing validation issues:

I am using a single-end dataset, where there is a `fastq_1` but no `fastq_2`.
The `input.csv` file is similar to:

```csv
"sample","fastq_1","fastq_2","strandedness",...
"id1","/path/to/fastq/sample1.fastq.gz","","auto",...
```

Please note how the `fastq_2` column contains empty values.
I'm getting an error validating the 'input' again:

```
ERROR ~ ERROR: Validation of 'input' file failed!
    -- Check '.nextflow.log' file for details
The following errors have been detected:

* -- Entry 1: Missing required value: fastq_2
* -- Entry 2: Missing required value: fastq_2
```
Having an empty `fastq_2` seems correct to me when I check the code on the `master` branch. There, if `fastq_2` is empty then the `single_end` variable is set to `"1"`. You can see this below (specifically line 184, in the `not fastq_2` branch):
However, on the `dev` branch, the input schema used for validation requires `fastq_2` to exist and not be empty:
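For reference, one common way to express "a valid FastQ path or nothing" in a JSON-schema-based samplesheet definition (the style nf-core pipelines use in `assets/schema_input.json`) is an `anyOf` that also accepts the empty string. This is a hedged sketch, not the pipeline's actual schema:

```json
"fastq_2": {
  "anyOf": [
    { "type": "string", "pattern": "^\\S+\\.f(ast)?q\\.gz$" },
    { "type": "string", "maxLength": 0 }
  ],
  "errorMessage": "FastQ file for reads 2 must end in '.fastq.gz' or '.fq.gz', or be left empty for single-end data"
}
```

With `fastq_2` also dropped from the schema's `required` list, rows with an empty second read would pass validation.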
I'd like the scrnaseq pipeline to accept an input file with a `fastq_2` column filled with `""` (empty strings), since that is what the nf-core/fetchngs pipeline generates when downloading datasets.

Thanks, and sorry for the delay in the reply.
Just for further ideas, it may be good to check out the rnaseq pipeline:
The check is done on purpose. All protocols supported by this pipeline use paired-end data, where R1 contains the UMI/barcode and R2 the actual sequence.

What kind of single-cell data are you dealing with?
Description of feature
The nf-co.re/rnaseq pipeline accepts and ignores any extra column in `input.csv` that is not required by the pipeline. This is useful because I can reuse the `input.csv`, or include additional information I want to use in downstream analyses, without having to generate a specific `input.csv` just for running the pipeline.

This scrnaseq pipeline is much stricter, giving an error when any unknown column is found. I would rather the scrnaseq pipeline follow the rnaseq behaviour, in line with the robustness principle that one should "be conservative in what you send, be liberal in what you accept".

Is there any specific reason why you are not as liberal in accepting unknown columns in the `input.csv` file?

Thanks!
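The "liberal" behaviour requested above can be sketched as follows: validate that the required columns are present, then keep only those and silently drop any extras. This is a hypothetical illustration in Python, not code from either pipeline; the function name and `REQUIRED` list are assumptions based on the columns shown in this thread:

```python
import csv
import io

# Columns this sketch assumes the pipeline actually needs.
REQUIRED = ["sample", "fastq_1", "fastq_2", "strandedness"]

def read_liberal(text):
    """Parse a samplesheet, erroring only on missing required columns
    and ignoring any unknown extra columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Extra columns (e.g. downstream-analysis metadata) are simply dropped.
    return [{c: row[c] for c in REQUIRED} for row in reader]

sheet = (
    "sample,fastq_1,fastq_2,strandedness,my_extra_note\n"
    "id1,/path/to/fastq/sample1.fastq.gz,,auto,keep-for-later\n"
)
print(read_liberal(sheet)[0])
```

Here the unknown `my_extra_note` column is accepted without complaint but never reaches the pipeline, which is the rnaseq-style behaviour the request describes.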