rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

Strange clustering/a lot of clusters #43

Closed: termithorbor closed this issue 1 year ago

termithorbor commented 2 years ago

Hi,

How would you proceed in a case like this, where quite a lot of clusters are found?

[image: screenshot of the cluster tree]

rrwick commented 2 years ago

Oof - that's a real mess! My first thought, based on the contig lengths, is that your input assemblies are not complete. Trycycler requires complete assemblies, i.e. you can't use draft assemblies as input. What species is this?

I also find the branch lengths in your screenshot to be odd. It looks like a cladogram where branch lengths are ignored, but branch lengths are important for interpreting the cluster tree. What tool are you using to view the tree? I usually use FigTree.

Ryan
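One quick way to tell whether a tree that looks like a cladogram is a viewer setting or a property of the file is to check whether the Newick text itself carries branch lengths. The snippet below is only a rough sketch: the function name and the regular expression are illustrative, and it assumes plain, unquoted labels that contain no colons.

```python
import re

# Matches ":<number>", the way branch lengths are written in Newick,
# e.g. ":0.0123" or ":1e-05". Assumption: labels are unquoted and contain no colons.
BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def branch_lengths(newick):
    """Return every branch length found in a Newick string, in order of appearance."""
    return [float(value) for value in BRANCH_LENGTH.findall(newick)]


if __name__ == "__main__":
    # Hypothetical toy tree; replace with the text of your own Newick file.
    tree = "((contig_1:0.01,contig_2:0.02):0.3,contig_3:0.5);"
    lengths = branch_lengths(tree)
    if lengths:
        print(f"{len(lengths)} branch lengths found, longest = {max(lengths)}")
    else:
        print("No branch lengths in the file; the tree can only be drawn as a cladogram.")
```

If the file does contain branch lengths but the picture still looks like a cladogram, the viewer is probably ignoring them.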

derekstein commented 1 year ago

I am having the same issue. My genome should be 1.1 Mbp. If that is the expected genome size, should my contig lengths also be 1.1 Mbp, or is it normal to have shorter contigs? I'm also not sure what you mean by draft assemblies. I assumed that following the Trycycler wiki would lead to complete assemblies, not draft assemblies?

rrwick commented 1 year ago

You should expect contig lengths that mostly match the lengths of your replicons. For example, if your genome has a 1.1 Mbp chromosome and a 60 kbp plasmid, contigs would ideally be either 1.1 Mbp or 60 kbp. It's normal if some contigs don't fit (e.g. due to fragmentation, poor circularisation or contamination). But if most of your contigs don't fit, that suggests a bigger problem (e.g. your reads are too short for a complete assembly).

Ryan
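The length check described above can be sketched as a small script that reads contig lengths from a FASTA file and flags contigs that do not fall near any expected replicon size. This is only an illustration, not part of Trycycler: the function names, the file path and the 10% tolerance are assumptions, and the expected sizes are the 1.1 Mbp and 60 kbp figures from the example above.

```python
def read_contig_lengths(fasta_path):
    """Return {contig_name: length_in_bp} for a plain-text FASTA file."""
    lengths = {}
    name = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the contig name.
                name = line[1:].split()[0]
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def match_replicon(length, expected_sizes, tolerance=0.1):
    """Return the first expected size within `tolerance` (a fraction) of `length`, else None."""
    for size in expected_sizes:
        if abs(length - size) <= tolerance * size:
            return size
    return None


def summarise(lengths, expected_sizes, tolerance=0.1):
    """Split contig names into those that fit an expected replicon size and those that don't."""
    fits, misfits = [], []
    for name, length in lengths.items():
        if match_replicon(length, expected_sizes, tolerance) is None:
            misfits.append(name)
        else:
            fits.append(name)
    return fits, misfits


if __name__ == "__main__":
    import sys

    # Assumed replicon sizes (bp): a 1.1 Mbp chromosome and a 60 kbp plasmid.
    expected = [1_100_000, 60_000]
    # Usage: python check_lengths.py assembly.fasta  (the script name is a placeholder)
    fits, misfits = summarise(read_contig_lengths(sys.argv[1]), expected)
    print(f"{len(fits)} contigs fit an expected replicon, {len(misfits)} do not")
```

A few misfits are not a concern, but if most contigs across your input assemblies are misfits, the assemblies are probably not complete.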