marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Very different results of plasmid assembly between canu and miniasm #724

Closed bioprojects closed 6 years ago

bioprojects commented 6 years ago

Dear developers,

Thank you for developing this great software. I found very different results for a plasmid assembly between canu and miniasm. As shown in the figure, 1) the plasmid assembled by canu is more than 2 times larger than the one assembled by miniasm, and 2) the red gene in the figure appears in 3 regions of the plasmid assembled by canu but in only 1 region of the plasmid assembled by miniasm.

As input to miniasm, I used the xxx..correctedReads.fasta.gz produced by canu.

Based on your previous comments, I have been using the following parameters to assemble bacteria carrying a plasmid: overlapper=mhap utgReAlign=true corOutCoverage=100 contigFilter="2 1000 1.0 1.0 2"

Could you let me know how to interpret and resolve this discrepancy?

Thanks a lot in advance.

Best wishes,

Koji

======= Koji Yahara Senior Investigator Antimicrobial Resistance Research Center National Institute of Infectious Diseases 4-2-1 Aobacho, Higashimurayama, Tokyo 189-0002 Japan Tel: +81-42-202-6080

skoren commented 6 years ago

Canu will go around a circular plasmid, so the contig can end up at about 2x the true length. I'm not sure if miniasm does this. Have you checked the contig for self-similarity, trimmed the redundant ends, and checked for read support across it?
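As a rough illustration of the self-similarity check suggested above, the sketch below looks for the point where the start of a contig recurs further along the same contig and trims everything from that point on. The function names and the `seed_len` parameter are hypothetical, and the exact-match seed is a simplification: a real contig contains errors and, as in this thread, may contain internal repeats, so a proper self-alignment with an aligner is the safer check.

```python
def find_wraparound(contig, seed_len=20):
    """Return the offset at which the contig's first seed_len bases recur.

    Hypothetical helper: uses an exact string match, so it only works when
    the redundant end is identical to the start. Returns None if the seed
    does not recur or the contig is shorter than seed_len.
    """
    seed = contig[:seed_len]
    if len(seed) < seed_len:
        return None
    # Start searching at index 1 so the seed does not match itself at 0.
    pos = contig.find(seed, 1)
    return pos if pos != -1 else None


def trim_wraparound(contig, seed_len=20):
    """Drop the redundant tail if the contig wraps around onto its own start."""
    offset = find_wraparound(contig, seed_len)
    return contig if offset is None else contig[:offset]
```

If the offset returned is much smaller than the contig length, the hit is more likely an internal repeat than a true wrap-around, and the result should not be trusted without checking read support.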

bioprojects commented 6 years ago

Thank you very much for your comment. miniasm has a circularization function. We have found that the contig has a large repeat region, which would have caused the problem. I understand that in such a case we have to check the redundant ends manually, and that canu currently does not have a function to check them.

skoren commented 6 years ago

The redundant ends are by design. The GFA file in the latest releases gives the CIGAR string for the overlap, which makes it possible to trim them.
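A minimal sketch of how such a CIGAR string could be used for trimming, under several assumptions that are not confirmed in this thread: the file is GFA1, a circular contig shows up as an `L` (link) line connecting a segment to itself, and the operations M, =, X and D count toward the overlap length on the contig being trimmed. The function names and the segment name `tig1` in the usage note are hypothetical, and link orientations are ignored.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_span(cigar, consuming="M=XD"):
    """Sum the lengths of the CIGAR operations listed in `consuming`."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in consuming)


def self_overlaps(gfa_text):
    """Map segment name -> overlap length for L lines linking a segment to itself.

    Assumes GFA1 link lines: L, from, from_orient, to, to_orient, overlap.
    Links with an unspecified overlap ("*") are skipped.
    """
    result = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if (len(fields) >= 6 and fields[0] == "L"
                and fields[1] == fields[3] and fields[5] != "*"):
            result[fields[1]] = cigar_span(fields[5])
    return result


def trim_end(seq, overlap):
    """Remove `overlap` bases from the end of seq; leave seq alone if that is not sensible."""
    return seq[:len(seq) - overlap] if 0 < overlap < len(seq) else seq
```

For example, a link line `L tig1 + tig1 + 4M` (tab-separated) would yield an overlap of 4 for `tig1`, and `trim_end` would then drop the last 4 bases of that contig's sequence.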