marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

some canu contigs are in reverse #2290

Closed majogalarion closed 4 months ago

majogalarion commented 4 months ago

Some contigs (usually included in the unassembled file) are in the reverse direction. Is this even possible? Or is this affected by the quality of the reads?

skoren commented 4 months ago

What do you mean by reverse direction, and with respect to what? As in reverse-complement, or something else?

majogalarion commented 4 months ago

Sorry for the confusion -- I meant reverse complement: I have to take the contig's reverse complement to align it properly with an identical sequence. Thanks!
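
To make the operation concrete, here is a minimal Python sketch of taking a reverse complement and checking which strand of a reference a contig matches. The helper names (`reverse_complement`, `matching_strand`) and the toy sequences are made up for illustration and are not part of canu; the exact substring check is only a stand-in for a real alignment.

```python
from typing import Optional

# Complement table for DNA bases (plus N); other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def matching_strand(contig: str, reference: str) -> Optional[str]:
    """Return '+' if the contig occurs in the reference as given,
    '-' if only its reverse complement does, and None otherwise.

    Hypothetical helper: an exact substring test, not an aligner.
    """
    if contig in reference:
        return "+"
    if reverse_complement(contig) in reference:
        return "-"
    return None


# Toy data (invented): the contig is the reverse complement of reference[2:12].
reference = "ATGGCGTACGTTAGC"
contig = "AACGTACGCC"

print(reverse_complement(contig))          # sequence on the other strand
print(matching_strand(contig, reference))  # strand on which the contig matches
```

A contig that only matches on the `-` strand is the same sequence read from the opposite strand, so it is not an assembly error.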

skoren commented 4 months ago

The orientation of a contig in any assembly is arbitrary; it comes down to random chance, depending on which read was selected to start the contig. I'd expect identical read inputs to lead to identical outputs, but even the same reads in a different order could change the orientation. I wouldn't expect 100% identical sequences to be duplicated in the assembly, though; unassembled sequences are usually low-support and/or short sequences that were not included in the primary paths. What is the identical sequence you're aligning to?

majogalarion commented 4 months ago

I was trying to align it to a viral RefSeq from GenBank. But thanks for your reply, it makes more sense now. Thanks!