rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

How to set --genome_size option for two bacteria sequenced (PacBio) together? #20

Closed: clf-bio closed this issue 3 years ago

clf-bio commented 3 years ago

I had two bacteria sequenced together (PacBio). Their genome sizes are 4.6m and 4.9m, respectively. When generating the assemblies, should I set --genome_size to 9.5m (the sum of their genome sizes), set it separately for each genome (4.6m and 4.9m), or just leave it to miniasm to estimate the size? And are there any other special options I should use in the rest of the Trycycler process?

rrwick commented 3 years ago

Assuming your two genomes are reasonably balanced in depth, then either strategy (providing 9.5m or letting miniasm do it automatically) should work fine.

If your genomes are not balanced (e.g. one has 30x depth and the other has 200x depth), then it could be trickier because you might end up with too few reads for the lower-abundance genome.
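To make the imbalance concrete, here is a rough sketch of the arithmetic, assuming reads are drawn uniformly at random from the full read set (an assumption for illustration, not a description of Trycycler's actual subsampling logic). The function name and the 50x subset target are hypothetical.

```python
def expected_subset_depths(genome_sizes, genome_depths, subset_bases):
    """Expected per-genome depth in a subset drawn uniformly from all reads.

    Assumption: every base in the read set is equally likely to be kept,
    so each genome's depth shrinks by the same fraction.
    """
    total_bases = sum(size * depth for size, depth in zip(genome_sizes, genome_depths))
    kept_fraction = min(1.0, subset_bases / total_bases)
    return [depth * kept_fraction for depth in genome_depths]


# Sizes from the question, depths from the unbalanced example above.
sizes = [4_600_000, 4_900_000]
depths = [30, 200]

# Hypothetical subset holding 50x of the combined 9.5 Mbp genome size.
subset_bases = 50 * sum(sizes)

low, high = expected_subset_depths(sizes, depths, subset_bases)
print(f"lower-abundance genome: {low:.1f}x, higher-abundance genome: {high:.1f}x")
```

Under these assumptions the 30x genome drops to roughly 13x in the subset while the 200x genome stays near 85x, which is why the lower-abundance genome may assemble poorly.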

If you're seeing both genomes show up as expected in your clustering, that's great! If not, you might want to experiment with the --genome_size option. Trycycler only uses genome size for determining read depth, so a deliberately inflated value such as --genome_size 20m would make Trycycler think the read set is shallower than it really is. It would then include more reads in each subset, which might help with assembling the lower-abundance genome.
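The effect of the genome size value on apparent depth can be sketched as follows, assuming depth is simply total read bases divided by genome size. The helper names and the 1.9 Gbp read set are hypothetical, and how Trycycler turns depth into a subset size is not reproduced here.

```python
def parse_size(text):
    """Convert a size string such as '9.5m' into a number of bases (hypothetical helper)."""
    multipliers = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return float(text[:-1]) * multipliers[text[-1]]
    return float(text)


def apparent_depth(total_read_bases, genome_size):
    """Depth as seen by a tool that divides total read bases by the stated genome size."""
    return total_read_bases / parse_size(genome_size)


# Hypothetical read set containing 1.9 Gbp in total.
total_read_bases = 1_900_000_000

for size in ("9.5m", "20m"):
    print(f"--genome_size {size}: apparent depth {apparent_depth(total_read_bases, size):.0f}x")
```

Stating 20m instead of 9.5m roughly halves the apparent depth for the same reads, which is the mechanism that leads to more reads being included in each subset.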

Good luck and let me know if you run into more trouble with this one!

Ryan