My apologies - that wasn't very well explained in Trycycler's output! I have found Mash distances get somewhat unreliable with higher values, so I capped them at 0.25. I.e. any Mash distance over 0.25 essentially means 'not closely related'. Capping the distances helped make the trees a bit more manageable.
So the short answer is yes, this is normal/expected behaviour.
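To illustrate, here is a minimal sketch of what the capping amounts to. This is not Trycycler's actual code: the function name `cap_distances`, the dictionary layout and the contig names are made up for the example, and only the 0.25 cap comes from the explanation above.

```python
# Hypothetical illustration of capping pairwise Mash distances at 0.25.
MAX_DISTANCE = 0.25  # the cap value described above


def cap_distances(distances, cap=MAX_DISTANCE):
    """Return a copy of a {(contig_a, contig_b): distance} dict with values clipped to cap."""
    return {pair: min(dist, cap) for pair, dist in distances.items()}


# Made-up example distances: one closely related pair and two unrelated pairs.
raw = {
    ("A_contig_1", "B_contig_1"): 0.012,
    ("A_contig_1", "B_contig_2"): 0.41,
    ("A_contig_1", "C_contig_7"): 1.0,
}
capped = cap_distances(raw)

for pair, dist in capped.items():
    print(pair, f"{dist:.3f}")
```

Every unrelated pair ends up at the same value, which is why an assembly with many unrelated contigs produces long runs of 0.250 in the matrix.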
I'm a bit more concerned about the part where you said 'pages and pages of scrollback buffer'. In most cases, a nice input assembly for Trycycler will only have a few contigs. This is because Trycycler is really intended to work on completed genomes, and most bacterial genomes only have a few replicons (I think the most I've seen is ~10). If you have some input assemblies with lots of contigs, I would worry that they are fragmented and not really suitable for use as Trycycler input. So you might get cleaner results by doing a bit of manual curation on your input assemblies (e.g. tossing out assemblies that look fragmented) before running Trycycler cluster.
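One rough way to do that curation is to count the contigs in each input assembly and look more closely at any with an unusually high count. The sketch below is only an illustration: the helper names and the default threshold of 10 are assumptions (loosely based on the ~10 replicons mentioned above), not a Trycycler feature or a recommended cutoff.

```python
import io


def count_contigs(fasta_lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in fasta_lines if line.startswith(">"))


def looks_fragmented(fasta_lines, max_contigs=10):
    """Flag an assembly whose contig count exceeds a hypothetical threshold."""
    return count_contigs(fasta_lines) > max_contigs


# Made-up two-contig assembly; in practice you would pass an open FASTA file.
example = io.StringIO(">chromosome\nACGT\n>plasmid_1\nGGCC\n")
print(count_contigs(example))
```

The threshold would need adjusting for the organism, and a high contig count is only a hint to inspect the assembly by eye, not proof that it is unusable.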
And I will definitely put some thought into making the Trycycler cluster output less confusing - thanks!
Hi Ryan, thanks for the explanation! Somehow I didn't have the intuition to grep the code for that value :)
You are correct, this particular assembly is troublesome. Moreover, it's not even bacterial, it's a small fungus.
(I do realize that I should exercise the same level of caution with Trycycler as with Unicycler when applying to non-bacterial species, primarily because of circularization.)
Thanks for an interesting tool, Ryan!
This is my first attempt running `trycycler` on a sample/genome which (for some yet-unknown reason) ends up too fragmented even with decent (~50x) PacBio coverage. (A few other nearly-identical samples get assembled very well even at ~30x.)
I've used `mash` a few times before for small-group and pairwise whole-genome comparisons, so I was surprised to see this particular output.
At the stage of building a distance matrix with mash, I am seeing a peculiar pattern of repeated 0.250 values (wrapped for somewhat better readability):
This goes on and on, pages and pages of scrollback buffer :) , with occasional different values.
The actual question is: is this a normal/expected behavior, or a bug in my local environment?
Resulting dendrograms look fine, with variable branch lengths and realistic-looking clustering.