mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

How does Flye scaffold contigs? #739

Open weishwu opened 1 week ago

weishwu commented 1 week ago

Sorry for the naive question. I ran Flye on a set of ONT reads twice: (1) without --scaffold; and (2) with --scaffold. Run 1 produced a single ~55kb contig, and run 2 produced a single ~65kb contig. There is no gap at all in the run 2 assembly. I wonder why Flye was able to extend the assembly by ~10kb with --scaffold turned on. Is it because scaffolding is less stringent about coverage?

Below are the disjointigs from run 1:

disjointig_3    66480   14  60  61
disjointig_4    95174   67616   60  61

Disjointigs from run 2:

disjointig_15   64427   15  60  61
disjointig_3    66473   65530   60  61
disjointig_4    62096   133125  60  61
disjointig_6    76056   196270  60  61
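
The listings above appear to follow the samtools faidx (.fai) layout: name, length in bases, byte offset, bases per line, bytes per line. A minimal sketch, assuming that layout, for comparing the number and total length of disjointigs between the two runs (`parse_fai` is a hypothetical helper name):

```python
def parse_fai(text):
    """Parse .fai-style lines into a {name: length} dict.

    Assumes whitespace-separated columns with the sequence name first
    and the sequence length second.
    """
    lengths = {}
    for line in text.strip().splitlines():
        fields = line.split()
        lengths[fields[0]] = int(fields[1])
    return lengths


# Listings copied from the two runs above.
run1 = """
disjointig_3    66480   14  60  61
disjointig_4    95174   67616   60  61
"""
run2 = """
disjointig_15   64427   15  60  61
disjointig_3    66473   65530   60  61
disjointig_4    62096   133125  60  61
disjointig_6    76056   196270  60  61
"""

run1_lengths = parse_fai(run1)
run2_lengths = parse_fai(run2)

print(len(run1_lengths), sum(run1_lengths.values()))  # 2 161654
print(len(run2_lengths), sum(run2_lengths.values()))  # 4 269052
```

The two runs differ already at the disjointig stage, both in count and in total length.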

mikolmogorov commented 1 week ago

Disjointigs should not be affected by the --scaffold option, but you may just have some nondeterminism between runs. Flye scaffolds based on the assembly graph structure, and may order contigs if there is an unambiguous path between them in the graph.
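
To illustrate the idea of an unambiguous path only, here is a toy sketch. It is not Flye's implementation: the graph representation (a dict mapping each node to its successors), the node names, and the rule "every step has exactly one successor" are all simplifying assumptions made for this example.

```python
def unambiguous_path(graph, start, goal):
    """Walk from start towards goal, requiring exactly one successor per step.

    Returns the list of visited nodes if goal is reached, or None if the walk
    hits a branch, a dead end, or a cycle (toy definition of "ambiguous").
    """
    path = [start]
    seen = {start}
    node = start
    while node != goal:
        successors = graph.get(node, [])
        if len(successors) != 1:
            return None  # dead end or branching point
        node = successors[0]
        if node in seen:
            return None  # cycle: goal is never reached
        seen.add(node)
        path.append(node)
    return path


# Hypothetical graph: contig_1 leads through one repeat to contig_2,
# while contig_2 branches into two alternatives.
graph = {
    "contig_1": ["repeat_a"],
    "repeat_a": ["contig_2"],
    "contig_2": ["contig_3", "contig_4"],
}

linked = unambiguous_path(graph, "contig_1", "contig_2")
branched = unambiguous_path(graph, "contig_2", "contig_3")
print(linked)    # ['contig_1', 'repeat_a', 'contig_2']
print(branched)  # None
```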

weishwu commented 1 week ago

@mikolmogorov Thanks for the answers. I thought that when two contigs were put together into one scaffold, there should always be a gap in between. However, in the assembly I got from run (2) there was no gap.
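
One way to check for gaps is to scan each sequence for runs of N, which is the usual way gaps are written in FASTA. A minimal sketch, assuming gaps are encoded as N runs; `read_fasta` and `find_gaps` are hypothetical helper names, and the records below are made-up toy data rather than the real assembly:

```python
import re


def read_fasta(lines):
    """Parse FASTA lines into a {name: sequence} dict."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # name is the first word of the header
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def find_gaps(sequence, min_len=1):
    """Return (start, end) for each run of N at least min_len long.

    Coordinates are 0-based with an exclusive end.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_len
    ]


# Toy input: one sequence with a 5 bp gap, one without any gap.
example = [">contig_1", "ACGTACGT", "NNNNNACG", ">contig_2", "ACGTACGT"]
records = read_fasta(example)
gaps = {name: find_gaps(seq) for name, seq in records.items()}
print(gaps)  # {'contig_1': [(8, 13)], 'contig_2': []}
```

For a real file, the same functions can be applied by passing an open file handle to `read_fasta`.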