Open alexhbnr opened 1 year ago
Isn't https://nf-co.re/mag/2.2.1/parameters#single_end doing the job?
I think the point is more about mixing single-end libraries in the same run as paired-end libraries (and possibly also accounting for singletons).
Yes, I am sorry for not clarifying it more, @d4straub. We have encountered a number of samples for which we have both single-end and paired-end data belonging to the same sample. Therefore, we would like to be able to assemble them together without treating the different sequencing data types as separate samples and having to perform a co-assembly.
Ah I see, yes, that's a different approach. However, I am opposed to feeding short reads into a dedicated long-read channel: that might cause too many problems further down, confuse other developers, and make the whole code less clear. Perhaps rather add a dedicated optional column for single-end short reads to the samplesheet? Or combine it with https://github.com/nf-core/mag/issues/358 to have multiple sequencing runs, including single-end and paired-end libraries, available?
Given that for the run merging suggested in #358 we would have to change the samplesheet anyway, I think that while it would be 'more work', it would be more beneficial to have a separate `singletons` column.
Some observations from the current code:
- `metaSPAdes` cannot do single-end assembly on its own, but we can include the single-end reads as 'orphaned' reads (via `-s`) alongside a paired library of some form (I'm not really sure if this is best practice but :shrug:):
def singletons = reads.size() > 2 ? "-s ${reads[2]}" : ""
- `megahit` does support single-end assembly, as it currently does in the pipeline. As far as I can tell, it also allows singletons alongside the pairs with `-r`, so we would need to update the condition
def input = params.single_end ? "-r \"" + reads1.join(",") + "\"" : "-1 \"" + reads1.join(",") + "\" -2 \"" + reads2.join(",") + "\""
to have all three if a `reads3` is present.
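As a rough illustration of the updated condition, here is a minimal Groovy sketch that builds the MEGAHIT input string for the three cases (single-end only, paired-end only, paired-end plus singletons). The helper name `megahitInput` is hypothetical, `reads3` is simply the name used in this thread, and deciding the mode from an empty `reads2` list instead of `params.single_end` is an assumption made to keep the example self-contained.

```groovy
// Hypothetical helper, not part of the pipeline: builds the MEGAHIT input
// arguments from up to three lists of FASTQ paths.
String megahitInput(List reads1, List reads2 = [], List reads3 = []) {
    if (!reads2) {
        // Single-end only: all reads are passed via -r
        return "-r \"${reads1.join(',')}\""
    }
    // Paired-end input: forward and reverse reads as comma-separated lists
    def args = "-1 \"${reads1.join(',')}\" -2 \"${reads2.join(',')}\""
    if (reads3) {
        // Singletons (assumed third read list) are appended via -r
        args += " -r \"${reads3.join(',')}\""
    }
    return args
}

println megahitInput(['a_R1.fq.gz'], ['a_R2.fq.gz'], ['a_se.fq.gz'])
```

In the actual module, the same three-way logic would presumably live where `def input = ...` is defined today.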
Some additional comments to yours, @jfy133:
- you are right, `metaSPAdes` does not accept single-end data alone; at most it allows adding this type of data alongside paired-end sequencing data. The logic that you implemented above makes sense to me. However, we might need to catch the case where someone wants to use `metaSPAdes` and doesn't provide paired-end data, in case this is not implemented yet
- regarding `fastp`/`adapterremoval`, do you need to associate the single-end library with a paired-end one after all? You could process each paired-end and single-end library separately and only merge them after the alignment step on the sample level. At this point, the sample ID is relevant but not the library ID. Or am I missing something here?
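One way the paired-end check for `metaSPAdes` could look is sketched below. This is only an assumption about how such a guard might be written: the helper name `spadesInput` is hypothetical, `reads` is assumed to hold forward, reverse, and optionally singleton reads in that order, and the plain `-1`/`-2`/`-s` SPAdes flags are used for illustration rather than the pipeline's actual command line.

```groovy
// Hypothetical guard, not existing pipeline code: refuses to build a
// metaSPAdes input string unless a read pair is present.
String spadesInput(List reads) {
    if (reads.size() < 2) {
        throw new IllegalArgumentException(
            'metaSPAdes requires paired-end reads; single-end data can only be added alongside them via -s'
        )
    }
    // Optional third entry is treated as singleton/unpaired reads
    def singletons = reads.size() > 2 ? " -s ${reads[2]}" : ''
    return "-1 ${reads[0]} -2 ${reads[1]}${singletons}"
}

println spadesInput(['s_R1.fq.gz', 's_R2.fq.gz', 's_se.fq.gz'])
```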
> Some additional comments to yours, @jfy133:
> - you are right, `metaSPAdes` does not accept single-end data alone; at most it allows adding this type of data alongside paired-end sequencing data. The logic that you implemented above makes sense to me. However, we might need to catch the case where someone wants to use `metaSPAdes` and doesn't provide paired-end data, in case this is not implemented yet

:+1:

> - regarding `fastp`/`adapterremoval`, do you need to associate the single-end library with a paired-end one after all? You could process each paired-end and single-end library separately and only merge them after the alignment step on the sample level. At this point, the sample ID is relevant but not the library ID. Or am I missing something here?

Uhhh good point. No, I think you might be right... I think I had something in my head about the groups, but then the groups can be associated anyway.
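To make the "merge on the sample level" idea concrete, here is a small plain-Groovy sketch that groups separately processed libraries by sample ID and splits them into paired reads and singletons. The map keys (`sample`, `library`, `single_end`, `reads`) and the file names are made up for illustration; in the pipeline this would presumably be done with channel operators rather than plain collections.

```groovy
// Hypothetical per-library records, as they might look after read processing
def libraries = [
    [sample: 'S1', library: 'lib1', single_end: false, reads: ['S1_lib1_R1.fq.gz', 'S1_lib1_R2.fq.gz']],
    [sample: 'S1', library: 'lib2', single_end: true,  reads: ['S1_lib2.fq.gz']],
    [sample: 'S2', library: 'lib1', single_end: false, reads: ['S2_lib1_R1.fq.gz', 'S2_lib1_R2.fq.gz']]
]

// Group by sample ID only; the library ID no longer matters at this point
def bySample = libraries.groupBy { it.sample }.collectEntries { sample, libs ->
    def paired = libs.findAll { !it.single_end }
    def single = libs.findAll { it.single_end }
    [(sample): [
        reads1    : paired.collect { it.reads[0] },
        reads2    : paired.collect { it.reads[1] },
        singletons: single.collect { it.reads[0] }
    ]]
}

bySample.each { sample, reads -> println "${sample}: ${reads}" }
```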
Description of feature
Due to the high fragmentation of ancient DNA data, there is not much gain in using long-read DNA sequencing methods for assembling ancient DNA samples. However, many ancient DNA samples have single-end short-read sequencing data, and the two de novo assemblers, MEGAHIT and metaSPAdes, have the option to use this type of sequence as input.
Therefore, it would be nice if one could provide this single-end short-read sequencing data in the `long_reads` column of the sample sheet and at the same time disable the long-read sequencing data steps.