mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

2 contigs created during assembly. Any suggestion to help assemble 1 whole contig by tweaking the options of Flye? #482

Closed nthosar9846 closed 2 years ago

nthosar9846 commented 2 years ago

Hi @fenderglass ,

I have assembled one clinical isolate of Mycobacterium tuberculosis using Flye, which gave me 2 contigs: one of length 3.83 Mb and the other 0.57 Mb. One of the contigs is duplicated, which could be the reason for the break. Is there a way to get a single contig from Flye by changing some parameters? I have attached my assembly stats and the assembly log.

assembly_info.txt flye.log

mikolmogorov commented 2 years ago

In general, Flye does not require parameter tuning. A few specific cases are outlined in the FAQ. You can take a look at the assembly graph to see how contigs are connected. If you have a large duplication in your genome, it may remain unresolved.

0xaf1f commented 2 years ago

I'm working with @nthosar9846 on this particular assembly. The graph_path column in the assembly_info file mentions what seems to be a contig 1 that was not output by the assembler. We only got sequences for contig_2 and contig_3. Do you know why this is, or how we could recover that sequence?

As for our 500kb duplication, are we just left with trying gap filling tools to try to resolve that if tweaking --min-overlap doesn't help?

#seq_name   length    cov.   circ.   repeat   mult.   alt_group   graph_path
contig_2    3837121   120    N       N        1       *           2,-1,-3
contig_3    574862    210    N       Y        2       *           3

mikolmogorov commented 2 years ago

@0xaf1f could you post the assembly graph visualization (with Bandage)?

Graph_path contains a path in the assembly graph. Contigs are constructed from at least one graph edge, but possibly multiple. In your case, contig 2 includes edges 2, 1 and 3 (and 1 and 3 are likely repetitive edges). Repetitive edges do not form separate contigs.
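To make the graph_path reading above concrete, here is a minimal Python sketch that splits each path into edge IDs and strands and reports edges that appear in more than one contig path. The function names are hypothetical, the input is the two rows posted earlier in this thread, and treating any non-integer token (such as a gap or terminal marker) as something to skip is an assumption rather than documented Flye behavior.

```python
from collections import Counter

# The two rows posted in this thread (whitespace-separated, as in assembly_info.txt).
SAMPLE_INFO = """\
#seq_name   length  cov.    circ.   repeat  mult.   alt_group   graph_path
contig_2    3837121 120 N   N   1   *   2,-1,-3
contig_3    574862  210 N   Y   2   *   3
"""

def parse_graph_paths(text):
    """Map each contig name to a list of (edge_id, strand) tuples.

    Assumption: graph_path is the last column and is a comma-separated
    list of signed integers; a minus sign means the reverse strand.
    """
    paths = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        name, graph_path = fields[0], fields[-1]
        edges = []
        for token in graph_path.split(","):
            try:
                value = int(token)
            except ValueError:
                continue  # assumption: non-integer tokens are markers, not edges
            edges.append((abs(value), "-" if value < 0 else "+"))
        paths[name] = edges
    return paths

def shared_edges(paths):
    """Return the sorted edge IDs that occur in more than one contig path."""
    counts = Counter()
    for edges in paths.values():
        for edge_id in {edge_id for edge_id, _ in edges}:
            counts[edge_id] += 1
    return sorted(edge_id for edge_id, n in counts.items() if n > 1)

paths = parse_graph_paths(SAMPLE_INFO)
for contig, edges in paths.items():
    print(contig, edges)
print("edges used by more than one contig:", shared_edges(paths))
```

On the rows above, edge 3 shows up in both contig_2 and contig_3, and edge 1 appears only inside the path of contig_2, which is consistent with it not being emitted as a separate contig.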

Flye only resolves repeats that are covered by reads in full, so a 500kb duplication will be unresolved.

nthosar9846 commented 2 years ago

Here is the assembly graph generated by Bandage. The small box between the large contig and the small contig is edge 1, but how do we get its sequence?

[Image: assembly graph]

mikolmogorov commented 2 years ago

@nthosar9846 yes, that looks like a large duplication. You can extract edge sequences from the GFA file.
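As one possible way to do that, here is a minimal Python sketch that pulls segment sequences out of GFA text and formats one as FASTA. It relies only on the GFA convention that segment lines start with `S`, followed by a tab-separated name and sequence. The function names, the example data, and the segment name `edge_1` are assumptions for illustration; check the names in your own GFA file.

```python
# Tiny made-up GFA for illustration only; real sequences are far longer.
EXAMPLE_GFA = (
    "H\tVN:Z:1.0\n"
    "S\tedge_1\tACGTACGT\n"
    "S\tedge_2\tGGCC\n"
    "L\tedge_2\t+\tedge_1\t-\t0M\n"
)

def read_gfa_segments(gfa_text):
    """Return {segment_name: sequence} for every S line that carries a sequence."""
    segments = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or fields[0] != "S":
            continue  # not a segment line
        name, sequence = fields[1], fields[2]
        if sequence == "*":
            continue  # GFA allows '*' when the sequence is not stored
        segments[name] = sequence
    return segments

def format_fasta(name, sequence, width=60):
    """Format one record as FASTA, wrapping the sequence at `width` characters."""
    lines = [">" + name]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"

segments = read_gfa_segments(EXAMPLE_GFA)
wanted = "edge_1"  # hypothetical name of the edge to recover
if wanted in segments:
    print(format_fasta(wanted, segments[wanted]), end="")
```

For a real run, read the graph file into a string first (for example with `open(path).read()`) and write the FASTA text to a file instead of printing it.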

iek commented 1 year ago

Hello, thank you so much for this amazing tool. I had a similar question in that I got the following contigs from Flye:

[Image: Screen Shot 2023-05-03 at 2 12 43 AM]

It seems that these two contigs could easily be merged into one, so I was wondering if you had suggestions on how to do so? Note that my assembly has many of these contigs that seem like they can be merged further.

Thank you so much for your time and help.

mikolmogorov commented 1 year ago

@iek you can try using purge_haplotigs (https://bitbucket.org/mroachawri/purge_haplotigs/src/master/). Alternatively, you can try adding the --no-alt-contigs option to Flye.