marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Looking for some help with interpreting/understanding Canu output #473

Closed. TinaH10 closed this issue 7 years ago

TinaH10 commented 7 years ago

Hello,

I got Canu to run successfully and have all my output files, but now I am really having trouble understanding what it all means. I am also very new to the terminology and don't know how to visualize the GFA files.

Can anybody point me to where I can start learning about the output files, their formats, and downstream programs for looking at my assemblies?

Any help would be much appreciated.

brianwalenz commented 7 years ago

The output documentation is a little buried and not entirely up to date, but the outputs are described near the top of http://canu.readthedocs.io/en/latest/quick-start.html

The difference between a unitig and a contig isn't mentioned there: A unitig will almost never be misassembled, while a contig can span repeats that could be misassembled. Unitigs are created by splitting contigs at any place there is even a hint of ambiguity - basically, any place a repeat isn't fully spanned by a read. This isn't to say contigs are full of errors, just that unitigs are the most conservatively assembled sequence.
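To make the "repeat isn't fully spanned by a read" rule concrete, here is a toy Python sketch. It is not Canu's actual algorithm: the function name `unspanned_repeats`, the coordinates, and the assumption that a read must extend past both ends of a repeat to span it are all illustrative.

```python
def unspanned_repeats(repeats, reads):
    """Return the repeats that no single read fully spans.

    repeats, reads: lists of (start, end) coordinates on one contig.
    Assumption for this sketch: a read spans a repeat only if it
    extends past both ends of the repeat.
    """
    flagged = []
    for rep_start, rep_end in repeats:
        spanned = any(r_start < rep_start and r_end > rep_end
                      for r_start, r_end in reads)
        if not spanned:
            flagged.append((rep_start, rep_end))
    return flagged


# Hypothetical toy data: two repeats on one contig, three reads.
repeats = [(1000, 3000), (8000, 9000)]
reads = [(500, 3500), (7000, 8500), (8200, 12000)]

# The first repeat is covered end to end by the read (500, 3500).
# The second is only partially covered by each nearby read, so it is
# the kind of place where a contig would be broken into unitigs.
print(unspanned_repeats(repeats, reads))
```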

Most people seem to be happy with just the contig fasta, and completely ignore all the other outputs.
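If you only look at the contig FASTA, a first sanity check is usually the number of contigs, the total length, and the N50. A minimal sketch in plain Python follows. The helper names `fasta_lengths` and `n50` and the example records are made up for illustration; in practice you would pass an open file handle for your own contig FASTA instead of the in-memory list.

```python
def fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the first word of the header as the record name.
            name = (line[1:].split() or [""])[0]
            length = 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length


def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Hypothetical example records; replace with open("your_contigs.fasta").
example = [">tig1 len=8", "ACGTACGT", ">tig2", "ACGT", "AC", ">tig3", "ACGT"]
lengths = [length for _, length in fasta_lengths(example)]
print(len(lengths), sum(lengths), n50(lengths))
```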

For GFA visualization, try Bandage: https://rrwick.github.io/Bandage/
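Before opening a graph in a viewer, it can help to see what the file contains. GFA version 1 is a tab-separated text format in which `S` lines are segments (sequences) and `L` lines are links between them. A segment's sequence may be given as `*`, with its length stored in an `LN:i:` tag. The sketch below counts both record types. `summarize_gfa` and the example lines are hypothetical, and it assumes a GFA 1 file.

```python
def summarize_gfa(lines):
    """Return ({segment_name: length_or_None}, link_count) for GFA 1 lines."""
    segments = {}
    links = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            name, seq = fields[1], fields[2]
            if seq != "*":
                segments[name] = len(seq)
            else:
                # Sequence omitted: fall back to the optional LN:i: tag.
                segments[name] = next(
                    (int(f[5:]) for f in fields[3:] if f.startswith("LN:i:")),
                    None,
                )
        elif fields[0] == "L":
            links += 1
    return segments, links


# Hypothetical example; replace with open("your_graph.gfa").
example = [
    "H\tVN:Z:1.0",
    "S\ttig1\t*\tLN:i:5000",
    "S\ttig2\tACGT",
    "L\ttig1\t+\ttig2\t-\t0M",
]
segments, links = summarize_gfa(example)
print(len(segments), "segments,", links, "links")
```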

TinaH10 commented 7 years ago

Thank you very much!