mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

Missing & misassembled plasmids #706

Open peradastra opened 1 month ago

peradastra commented 1 month ago

Howdy,

My name is Per, and I would first like to thank you for developing Flye! I have been using Flye to assemble an E. coli isolate from ONT data. I think that Flye is having 2 issues properly assembling plasmids from this isolate. I would greatly appreciate some help troubleshooting these issues!

Issue No. 1: Not assembling small (<~2 kb) plasmids.

The final assembly info (Table 1) shows that Flye is assembling 2 plasmids and the main bacterial chromosome.

#seq_name  length   cov.  circ.  repeat  mult.  alt_group  graph_path
contig_1   5072618  137   Y      N       1      *          1
contig_3   122045   205   Y      N       2      *          3
contig_2   8128     871   Y      Y       7      *          2
Table 1. Final assembly info for isolate.
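
For anyone who wants to screen this table programmatically, here is a minimal sketch. It assumes the table is the tab-separated assembly_info.txt written by Flye and uses the longest contig as the coverage baseline; the function name `flag_contigs` is made up for illustration. It only reports contigs that Flye itself marked as repeats, along with their coverage relative to the chromosome.

```python
import csv
import io

# Contents of Table 1, assumed to be tab-separated as in assembly_info.txt.
ASSEMBLY_INFO = (
    "#seq_name\tlength\tcov.\tcirc.\trepeat\tmult.\talt_group\tgraph_path\n"
    "contig_1\t5072618\t137\tY\tN\t1\t*\t1\n"
    "contig_3\t122045\t205\tY\tN\t2\t*\t3\n"
    "contig_2\t8128\t871\tY\tY\t7\t*\t2\n"
)


def flag_contigs(text):
    """Return (name, length, coverage ratio) for contigs marked repeat == 'Y'.

    The coverage ratio is relative to the longest contig, which is assumed
    (not verified) to be the chromosome.
    """
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    baseline = max(rows, key=lambda r: int(r["length"]))
    base_cov = float(baseline["cov."])
    flagged = []
    for row in rows:
        if row["repeat"] == "Y":
            ratio = round(float(row["cov."]) / base_cov, 1)
            flagged.append((row["#seq_name"], int(row["length"]), ratio))
    return flagged


print(flag_contigs(ASSEMBLY_INFO))
```

On the values in Table 1 this singles out contig_2, at roughly 6.4 times the chromosomal coverage.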

However, based on NanoPlots (Fig 1) and gels produced from plasmid preps, we expect there to be 3 smaller plasmids.

[Image: newplot]

Fig 1. NanoPlot of reads used as input for isolate assembly

Issue No. 2: Misassembly of a plasmid

While Flye assembled an 8 kb contig, there isn't a corresponding peak in the NanoPlot (Fig 1). Further examination of the alignment profile of reads (Fig 2) and the low number of supplementary alignments suggests to me that this is a repeat that has been duplicated in tandem and then circularized.

[Image: igv_snapshot]

Fig 2. IGV screenshot of reads used for assembly mapped back to contig_2. I sorted alignments by tag (SA) with MAPQ set to 0.
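
The supplementary-alignment count can also be taken straight from the alignments rather than read off IGV. The sketch below is only an illustration: it assumes the alignments are available as plain SAM text lines, the function name `sa_read_fraction` is hypothetical, and the records in `TOY_SAM` are invented for the example, not taken from this dataset.

```python
def sa_read_fraction(sam_lines):
    """Return (reads carrying an SA tag, mapped reads), counted by read name."""
    mapped, with_sa = set(), set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        name = fields[0]
        mapped.add(name)
        # Optional tags start at the 12th column; SA:Z: lists the other
        # segments of a split (chimeric) alignment.
        if any(tag.startswith("SA:Z:") for tag in fields[11:]):
            with_sa.add(name)
    return len(with_sa), len(mapped)


# Invented toy records for illustration only.
TOY_SAM = [
    "@SQ\tSN:contig_2\tLN:8128",
    "\t".join(["read1", "0", "contig_2", "1", "60", "100M", "*", "0", "0", "*", "*"]),
    "\t".join(["read2", "0", "contig_2", "4000", "60", "50M50S", "*", "0", "0", "*", "*",
               "SA:Z:contig_2,1,+,50S50M,60,0;"]),
    "\t".join(["read2", "2048", "contig_2", "1", "60", "50H50M", "*", "0", "0", "*", "*",
               "SA:Z:contig_2,4000,+,50M50S,60,0;"]),
    "\t".join(["read3", "4", "*", "0", "0", "*", "*", "0", "0", "*", "*"]),
]

print(sa_read_fraction(TOY_SAM))
```

Counting by read name avoids counting a split read once per segment.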

There are a few reads that span this 8 kb region (Fig 3), so I suspect that this sequence duplication exists somewhere else in this E. coli genome, but not as an 8 kb plasmid.

[Image: mapq_30_igv_snapshot]

Fig 3. IGV screenshot of reads used for assembly mapped back to contig_2. I sorted alignments by length with MAPQ set to 30.
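
A rough way to list the reads that span the contig, as opposed to eyeballing them in IGV, is to compute each primary alignment's reference span from its CIGAR string. This is a sketch under assumptions: input is plain SAM text, the names `reference_span` and `spanning_reads` and the 95% threshold are made up for illustration, and the toy records are invented. On a circular contig, reads crossing the linearization point are split and would be missed by this check.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases covered: sum of M, D, N, = and X operations."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def spanning_reads(sam_lines, contig, contig_len, min_mapq=30, min_frac=0.95):
    """Names of primary alignments covering at least min_frac of the contig."""
    names = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        f = line.rstrip("\n").split("\t")
        flag, rname, mapq, cigar = int(f[1]), f[2], int(f[4]), f[5]
        if flag & 0x904:
            continue  # unmapped (0x4), secondary (0x100) or supplementary (0x800)
        if rname != contig or mapq < min_mapq or cigar == "*":
            continue
        if reference_span(cigar) >= min_frac * contig_len:
            names.append(f[0])
    return names


# Invented toy records for illustration only.
TOY_ALIGNMENTS = [
    "\t".join(["readA", "0", "contig_2", "50", "60", "8000M20S", "*", "0", "0", "*", "*"]),
    "\t".join(["readB", "0", "contig_2", "1", "60", "3000M5000S", "*", "0", "0", "*", "*"]),
    "\t".join(["readC", "0", "contig_2", "1", "10", "8100M", "*", "0", "0", "*", "*"]),
    "\t".join(["readD", "2048", "contig_2", "1", "60", "8100M", "*", "0", "0", "*", "*"]),
]

print(spanning_reads(TOY_ALIGNMENTS, "contig_2", 8128))
```

The default `min_mapq=30` mirrors the MAPQ filter used for Fig 3; the other thresholds are arbitrary.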

Logs

Here are logs for 2 separate attempts to assemble this genome. I saw that you previously addressed this issue with another user by suggesting the use of the --meta option, so I tried that, but it did not help.

flye_meta.log flye.log

Best, Per

mikolmogorov commented 3 weeks ago

Hi Per,

Thanks for the detailed report! Indeed, it is a known issue that Flye may sometimes miss short plasmids or duplicate them. It is somewhat hard to come up with a general-purpose solution that will fix this within the existing framework. As suggested in the other thread, a specialized plasmid assembler (https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000631) may help.