rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

Too Many Clusters #50

Closed by derekstein 1 year ago

derekstein commented 1 year ago

How many clusters should I be generating? Is ~160 clusters too many? No matter how much QC I do, my tree looks a lot like the final example, and I can't find a single cluster that matches the length of my genome (about 1.1 Mbp). None of my clusters are close to that; they tend to be 100-300 kbp.

rrwick commented 1 year ago

Ideally, you have just one cluster per replicon. Often there are a few more clusters that need to be discarded. But yes, I'm afraid that 160 clusters is way too many.

Without seeing your data, my best guess is that your read set is simply insufficient to assemble the genome well, e.g. reads are too short or depth is too shallow. A lot of heterogeneity in the read set could also cause problems. If you really need a nice completed genome, I suspect you'll need to re-sequence 😢

Ryan
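
As a rough way to check the two read-set properties mentioned above (read length and depth), here is a minimal Python sketch. It is not part of Trycycler. The function names and the `reads.fastq.gz` filename are hypothetical, and it assumes a standard four-line-per-record FASTQ file. It reports the read count, total bases, mean depth (total bases divided by genome size) and read N50.

```python
import gzip


def fastq_read_lengths(path):
    """Yield the length of each read in a FASTQ file (plain or gzipped).

    Assumes the common four-line-per-record layout.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                yield len(line.strip())


def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def summarise(lengths, genome_size):
    """Summarise a read set against an expected genome size in bp."""
    lengths = list(lengths)
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "total_bases": total,
        "mean_depth": total / genome_size,
        "n50": n50(lengths),
    }
```

For the genome in this thread, usage might look like `summarise(fastq_read_lengths("reads.fastq.gz"), 1_100_000)`. A low mean depth or a short N50 would be consistent with the explanation given above, although this check cannot detect heterogeneity in the read set.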