Closed jfy133 closed 2 years ago
I would use as the input format what is spit out by https://github.com/nf-core/fetchngs so it generally has the same columns but different headers for it. I would drop the format column. If needed, that can be figured out from the filenames.
OK actually I agree, that's what I actually based this off of. Do you have an example of a fetchngs sheet?
I ran it once and the only samplesheet I got was filled with millions of columns which I didn't like
Nevermind, I saw this:
--nf_core_pipeline [string] Name of supported nf-core pipeline e.g. 'rnaseq'. A samplesheet for direct use with the pipeline will be created with the appropriate columns.
so we can customise it I guess
Yeah, it adds a lot of columns but we can pick the ones we need. I do think it's nice, though, if the pipeline keeps all input columns. This makes it easier for users to add any kind of meta information that they would like. The minimal information, in my opinion, is:
sample,fastq_1,fastq_2
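A minimal Python sketch of the "keep all input columns" idea, assuming a comma-separated sheet with a header row. The helper name `read_samplesheet`, the extra `collection_site` column, and the file names are made up for illustration; only the three required column names come from this thread.

```python
import csv
import io

# Minimal column set proposed in this thread
REQUIRED_COLUMNS = ("sample", "fastq_1", "fastq_2")


def read_samplesheet(handle):
    """Return every row as a dict, keeping any extra columns the user added."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"samplesheet is missing required columns: {missing}")
    return list(reader)


# Hypothetical sheet with one extra metadata column
example = io.StringIO(
    "sample,fastq_1,fastq_2,collection_site\n"
    "s1,s1_R1.fastq.gz,s1_R2.fastq.gz,lake\n"
)
rows = read_samplesheet(example)
```

Because each row is a plain dict, any user-supplied metadata column travels along with the sample without the parser needing to know about it.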
@jfy133 I think this is csv and not tsv
CSV seems to be the standard in nf-core pipelines. In Python it's quite easy to allow both but that's harder in nextflow I think.
Not at all, you have the splitCsv operator: https://www.nextflow.io/docs/latest/operator.html#splitcsv
Yes, but it cannot "sniff" if it's CSV or TSV by itself, so you either need to hard code it, look at the file extension, or let the user determine it.
Oh I see what you mean, then yes you're right. And as you said, csv is the standard in DSL2 nf-core pipelines.
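For reference, this is roughly what the "sniffing" looks like on the Python side. It is a sketch using the standard library's `csv.Sniffer`; the helper name `detect_delimiter` and the comma fallback are assumptions for illustration, and the sniffer is a heuristic that can fail on very small or irregular files.

```python
import csv


def detect_delimiter(text, default=","):
    """Guess whether text is comma- or tab-separated; fall back to default."""
    try:
        # Restrict the candidates to the two delimiters discussed here
        return csv.Sniffer().sniff(text, delimiters=",\t").delimiter
    except csv.Error:
        # The heuristic could not decide, so use the agreed standard
        return default


csv_text = "sample,fastq_1,fastq_2\ns1,s1_R1.fastq.gz,s1_R2.fastq.gz\n"
tsv_text = csv_text.replace(",", "\t")
csv_delimiter = detect_delimiter(csv_text)
tsv_delimiter = detect_delimiter(tsv_text)
```

Hard coding CSV, as agreed above, avoids relying on this heuristic at all.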
Sorry yup - eager is TSV :sweat_smile:
Sarek was TSV too, we're now csv
Don't abandon me!
Back on topic:
accept: fastq, fq, fasta, fna, fa + all with .gz
@maxibor and I decided to go for an explicit .fasta column, as this means fastq_1 and fastq_2 can be taken directly from fetchNGS
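A small Python sketch of the extension list above, in case it helps when writing the check. The function name `sequence_format` is hypothetical, and treating extensions case-insensitively is an assumption that was not discussed here.

```python
# Extensions listed in this thread; each may also carry a trailing .gz
FASTQ_EXTENSIONS = (".fastq", ".fq")
FASTA_EXTENSIONS = (".fasta", ".fna", ".fa")


def sequence_format(filename):
    """Return 'fastq', 'fasta', or None based on the file extension."""
    name = filename.lower()  # assumption: extensions are case-insensitive
    if name.endswith(".gz"):
        name = name[: -len(".gz")]  # strip the compression suffix first
    if name.endswith(FASTQ_EXTENSIONS):
        return "fastq"
    if name.endswith(FASTA_EXTENSIONS):
        return "fasta"
    return None
```

For example, `sequence_format("s1_R1.fq.gz")` gives `"fastq"` and `sequence_format("contigs.fna")` gives `"fasta"`, while an unrecognised extension gives `None`.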
should change platform to specific machine, as we need 2/4 colour chemistry info
Can you provide some more context, please, why this is needed?
@maxibor did you add a check that you can't supply FASTA and FASTQ in the same line?
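A sketch of what such a check could look like, written in Python for illustration rather than taken from the pipeline. The column names `fastq_1`, `fastq_2` and `fasta` follow the discussion above; the function name `validate_row` and the exact error messages are hypothetical.

```python
def validate_row(row):
    """Return 'fastq' or 'fasta' for a samplesheet row, or raise ValueError."""
    fastq_1 = (row.get("fastq_1") or "").strip()
    fastq_2 = (row.get("fastq_2") or "").strip()
    fasta = (row.get("fasta") or "").strip()

    if fasta and (fastq_1 or fastq_2):
        raise ValueError("a row may contain FASTQ or FASTA input, not both")
    if fastq_2 and not fastq_1:
        raise ValueError("fastq_2 was given without fastq_1")
    if not fasta and not fastq_1:
        raise ValueError("a row needs either fastq_1 or fasta")
    return "fasta" if fasta else "fastq"


# Hypothetical single-end FASTQ row
kind = validate_row(
    {"sample": "s1", "fastq_1": "s1.fq.gz", "fastq_2": "", "fasta": ""}
)
```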
I think this is set for now, can reopen if more issues crop up
Description of feature
We will need to support both input (paired/single) FASTQ and also FASTA files, as the latter seems common.
I propose something like this:
i.e.
Check instrument_platform against: https://www.ebi.ac.uk/ena/portal/api/controlledVocab?field=instrument_platform
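A rough Python sketch of that check, assuming the controlled vocabulary has already been loaded into a set. The helper name `check_platform` is hypothetical, and the three values in `EXAMPLE_VOCABULARY` are an illustrative subset written by hand, not the result of querying the ENA endpoint; the real list should come from the URL above.

```python
# Illustrative subset only; load the full vocabulary from the ENA endpoint
EXAMPLE_VOCABULARY = {"ILLUMINA", "OXFORD_NANOPORE", "PACBIO_SMRT"}


def check_platform(value, vocabulary):
    """Raise ValueError if value is not an allowed instrument_platform term."""
    if value not in vocabulary:
        allowed = ", ".join(sorted(vocabulary))
        raise ValueError(
            f"unknown instrument_platform '{value}'; expected one of: {allowed}"
        )
    return value


platform = check_platform("ILLUMINA", EXAMPLE_VOCABULARY)
```

Passing the vocabulary in as an argument keeps the check independent of how or when the list is fetched.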