rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

Minimap2/miniasm produce unusual number of contigs #14

Closed SShivani22 closed 3 years ago

SShivani22 commented 3 years ago

Dear Ryan,

Thanks for developing this amazing tool. This is the second time I am using it, and I am facing some issues this time. Please take a look: I used Nanopore to sequence E. coli. Flye and minimap2/miniasm were used to generate assemblies on the read subsets. However, Flye ends up producing 4 or 5 contigs, while minimap2/miniasm produces around 35 contigs (unitigs). This is weird, because last time, on another E. coli set I sequenced, Flye and minimap2/miniasm produced similar contig numbers. The two E. coli strains were sequenced at different times with different sequencing depths. Question:

[image]

SShivani22 commented 3 years ago

Umm... I tried Raven and I am getting around 35 contigs. [image]

rrwick commented 3 years ago

There are a number of reasons you could get a lot of contigs, but I see that in your case, the high contig counts are accompanied by a large assembly size (8-9 Mbp). This makes me strongly suspect that you have some sort of low-level contamination in your read set. I.e. there's your main genome, which is ~5 Mbp, and then another genome which is lower depth. It seems like Flye is filtering out the lower depth genome. Miniasm and Raven are including it, though probably in a fragmented state.
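A quick way to check for the pattern described above (many contigs together with an inflated total size) is to tally both numbers for each assembly. The following is only a minimal sketch: the helper name `assembly_stats` is hypothetical, and it assumes the assembly is available as plain FASTA text.

```python
def assembly_stats(fasta_text):
    """Return (contig_count, total_bases) for FASTA-formatted text."""
    contig_count = 0
    total_bases = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            contig_count += 1  # each header line starts a new contig
        else:
            total_bases += len(line)  # sequence lines add to the total size
    return contig_count, total_bases


# Tiny made-up example, not real data.
example = ">contig_1\nACGTACGT\nACGT\n>contig_2\nGGCC\n"
count, size = assembly_stats(example)
print(f"{count} contigs, {size} bp")
```

For a real assembly you would read the file contents first (e.g. `open(path).read()`) and compare the total against the expected genome size, roughly 5 Mbp in this case.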

I see a few ways forward here:

  1. You could just use Flye assemblies. This is definitely the simplest approach! But it may not be as robust - e.g. if Flye keeps making the same mistake, that mistake could end up in Trycycler's consensus (see this question in Trycycler's FAQ).
  2. You could do some filtering/cleaning of your other assemblies before inputting them into Trycycler. Miniasm/Minipolish assemblies should have depth information you can use. Not sure about other assemblers though.
  3. Play with the --min_contig_depth option of trycycler cluster until it seems like you're only getting the high depth contigs. E.g. perhaps --min_contig_depth 0.25 would do the trick. Have a look at the contig depths (should be in the Trycycler cluster output) to see what a nice value is.
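Option 2 could be sketched roughly as below. This is an illustration under stated assumptions, not a tested pipeline: it assumes the assembly is a GFA file whose segment (`S`) lines carry a depth tag named `dp` (e.g. `dp:f:87.5`), so check your own file for the actual tag name. The function names and the 0.25 threshold are illustrative. The threshold is applied relative to the depth of the longest contig, and contigs with no depth tag are dropped.

```python
def parse_gfa_segments(gfa_text, depth_tag="dp"):
    """Return a list of (name, sequence, depth) tuples from GFA segment lines.

    depth is None when a segment has no tag named depth_tag (an assumed name).
    """
    segments = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or fields[0] != "S":
            continue  # only segment lines carry sequences
        name, sequence = fields[1], fields[2]
        depth = None
        for tag in fields[3:]:
            parts = tag.split(":", 2)  # optional fields look like TAG:TYPE:VALUE
            if len(parts) == 3 and parts[0] == depth_tag:
                depth = float(parts[2])
        segments.append((name, sequence, depth))
    return segments


def filter_by_relative_depth(segments, min_fraction=0.25):
    """Keep segments with depth >= min_fraction * depth of the longest segment."""
    with_depth = [s for s in segments if s[2] is not None]
    if not with_depth:
        return []
    longest = max(with_depth, key=lambda s: len(s[1]))
    threshold = longest[2] * min_fraction
    return [s for s in with_depth if s[2] >= threshold]


def to_fasta(segments):
    """Format (name, sequence, depth) tuples as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in segments)


# Made-up toy graph: a main genome, a high-depth plasmid, a low-depth contaminant.
toy_gfa = (
    "S\tchromosome\tACGTACGTACGT\tdp:f:100.0\n"
    "S\tplasmid\tACGTAC\tdp:f:250.0\n"
    "S\tcontaminant\tACGT\tdp:f:8.0\n"
)
kept = filter_by_relative_depth(parse_gfa_segments(toy_gfa), min_fraction=0.25)
print(to_fasta(kept), end="")
```

Filtering relative to the longest contig rather than with an absolute cutoff keeps the same threshold usable across read subsets of different depth, and it leaves high-copy plasmids untouched.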

I'll close this issue now, but let me know if any of these do/don't work for you!

Ryan