peterjc opened 1 year ago
sourmash plot could certainly use some love! It was one of the first things we implemented, ~6 years ago, and (FBFW) it has driven a lot of our citations... but we have never upgraded it. This was due to some combination of:
This is all to say that it has never risen to the level of "gotta fix", but it has definitely risen to the level of "hmmmm, yeah, we should really be doing something about that."
A few related thoughts and issues:
sourmashconsumr (https://github.com/sourmash-bio/sourmash/issues/2492) is an R package that has some nice viz.
Per https://github.com/sourmash-bio/sourmash/issues/2406, I appear to have mixed up my similarity and distance matrices.
Per https://github.com/sourmash-bio/sourmash/issues/2452, there are some good opportunities to make editing label names easier (since I intuit that this is a lot of what people want to do).
Per https://github.com/sourmash-bio/sourmash/issues/2583, there are lots of opportunities to annotate dendrograms with more information.
Per https://github.com/sourmash-bio/sourmash/issues/1353 and https://github.com/sourmash-bio/sourmash/pull/2438 in particular, it would now be straightforward to experiment with other clustering and viz techniques, all from within the relative safety of the sourmash command line.
This would permit adding dependencies that we don't want in core sourmash (for size, platform/install, and/or support reasons) in order to support better viz output.
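On the similarity/distance mix-up: a minimal sketch of one standard way to keep the two apart when clustering, assuming similarities lie in [0, 1] and using distance = 1 - similarity. The helper name `similarity_to_linkage` and the choice of average linkage are illustrative assumptions, not what sourmash does internally. One gotcha worth knowing: SciPy's `linkage` reads a square 2-D array as a set of observation vectors, so the distance matrix has to be converted to condensed form with `squareform` first.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def similarity_to_linkage(sim, method="average"):
    """Cluster a square similarity matrix (values in [0, 1]) on 1 - similarity."""
    sim = np.asarray(sim, dtype=float)
    if sim.ndim != 2 or sim.shape[0] != sim.shape[1]:
        raise ValueError("expected a square similarity matrix")
    # Convert similarity to distance: identical samples end up at distance 0.
    dist = 1.0 - sim
    # Symmetrize and zero the diagonal so the condensed form is well defined.
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    # linkage() expects a condensed distance vector; a square 2-D array
    # would be interpreted as observation vectors instead.
    condensed = squareform(dist, checks=False)
    return linkage(condensed, method=method)


sim = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
Z = similarity_to_linkage(sim)
print(leaves_list(Z))  # leaf order to use for the heatmap rows/columns
```

Average linkage is used here only because it is a common default for similarity data; swapping in `method="single"` or `"complete"` is a one-argument change.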
This is all to say... we just need someone who cares, or at least pointers to some good plots from other packages that we can steal ;). I know this is an active area; I just don't have a starting point!
That all makes sense. One-size-fits-all visualisation defaults are not easy.
Additional thoughts:
More from Slack:
Christopher Gulvik: The Fig 1c minimum spanning tree style in GrapeTree by [@jcarrico] and [@happykhan] rocks. I've grown to appreciate it more and more, over hierarchical clustering or phylogenetic trees, for showing outbreak or cluster data (SNPs, ANI, or cgMLST) to a broader audience. The software that currently makes that style here reaches end of life this year.
The betterplot plugin would be a good place to add custom plotting code for very large plots.
Running
sourmash plot --pdf --labels example.npy
with ~200 signatures gives plots where the labels are too large and therefore overlap. Looking at https://github.com/sourmash-bio/sourmash/blob/latest/src/sourmash/fig.py, it does not appear to change the default matplotlib font sizes, but resources like https://stackoverflow.com/questions/3899980/how-to-change-the-font-size-on-a-matplotlib-plot suggest that we could reduce the font size and/or increase the image size for larger datasets.
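A minimal sketch of that suggestion, assuming a plain matplotlib heatmap rather than the dendrogram-plus-matrix layout in fig.py: shrink the tick-label font and grow the figure as the number of labels increases. The helper names and the constants (full-size labels up to 40, a 2 pt floor, 0.12 inches per label) are illustrative guesses, not sourmash defaults.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for writing PDFs
import matplotlib.pyplot as plt
import numpy as np


def label_fontsize(n_labels, base=10.0, min_size=2.0, comfortable=40):
    """Keep the base size for small sets, then shrink inversely with count."""
    if n_labels <= comfortable:
        return base
    return max(min_size, base * comfortable / n_labels)


def plot_matrix(sim, labels, out="matrix.pdf", inches_per_label=0.12, min_inches=8.0):
    """Draw a labelled similarity heatmap whose size scales with the label count."""
    n = len(labels)
    side = max(min_inches, n * inches_per_label)
    fs = label_fontsize(n)
    fig, ax = plt.subplots(figsize=(side, side))
    ax.imshow(sim, cmap="YlGnBu", vmin=0.0, vmax=1.0)
    ax.set_xticks(range(n))
    ax.set_xticklabels(labels, rotation=90, fontsize=fs)
    ax.set_yticks(range(n))
    ax.set_yticklabels(labels, fontsize=fs)
    fig.tight_layout()
    fig.savefig(out)
    plt.close(fig)
    return side, fs
```

With 200 labels this gives 2 pt labels on a 24 x 24 inch figure; scaling both together keeps each label roughly within its own row instead of relying on the font shrinking alone.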
Is this a bug, or would your recommendation be to follow https://sourmash.readthedocs.io/en/latest/plotting-compare.html#Customizing-plots and customise the plot by writing a modified version of the sourmash/fig.py code?
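If the answer is "write a modified fig.py", one possible starting point is to load the saved matrix and labels, cluster on 1 - similarity, and let SciPy's `dendrogram` scale the leaf labels through its `leaf_font_size` parameter. It assumes `sourmash compare -o example.npy` also wrote one label per line to `example.npy.labels.txt` (worth checking against your sourmash version); `load_compare_output` and `draw_dendrogram` are hypothetical names, and average linkage is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform


def load_compare_output(npy_path):
    """Load a similarity matrix plus labels (assumed at '<npy_path>.labels.txt')."""
    sim = np.load(npy_path)
    with open(npy_path + ".labels.txt") as fh:
        labels = [line.rstrip("\n") for line in fh]
    return sim, labels


def draw_dendrogram(sim, labels, out="dendrogram.pdf"):
    """Draw a left-facing dendrogram with label size scaled to the label count."""
    n = len(labels)
    dist = 1.0 - np.asarray(sim, dtype=float)  # similarity -> distance
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    fig, ax = plt.subplots(figsize=(8, max(4.0, 0.12 * n)))
    ddata = dendrogram(Z, labels=labels, orientation="left", ax=ax,
                       leaf_font_size=max(2.0, min(10.0, 400.0 / n)))
    fig.tight_layout()
    fig.savefig(out)
    plt.close(fig)
    return ddata["ivl"]  # labels in the order they appear on the plot
```

Usage would then be `sim, labels = load_compare_output("example.npy")` followed by `draw_dendrogram(sim, labels)`. Returning the leaf order makes it easy to reorder a matching heatmap the same way.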