adkinsrs opened this issue 2 years ago
I just ran into a dataset with so many samples that SciPy raised a "maximum recursion depth exceeded" error when clustering them. Being able to optionally cluster by an aggregated category would also alleviate this, since the tree would then need only one leaf per category instead of one per sample.
Hi @adkinsrs.
The data is not reordered in the Clustergram component directly, but in the Dendrogram class from the plotly.figure_factory module. So we are not able to fix the main problem of this issue in the dash-bio project. We can create an issue about the reordering problem in the original Dendrogram component from figure_factory.
Best wishes, Nick.
Currently the Dash Bio Clustergram can only cluster based on all row or column values. There are cases where I would like to sort my data by a chosen metadata category, and then cluster based on the mean value of each group in that category. Right now I am forced to choose between preserving the sort order without clustering, or clustering by the raw data values and losing the aesthetic grouping that came from pre-sorting the data. Below are two pictures of Dash Bio Clustergrams (with my own post-processing touches) that show the situation I am trying to convey.
[Image: Clustering by individual samples instead of category]
[Image: Sorted by a category but no clustering]
The functionality I am requesting is similar to the dendrogram option for Scanpy's heatmap function (see https://scanpy.readthedocs.io/en/stable/generated/scanpy.pl.heatmap.html).
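The requested behavior (cluster the category means, then lay the samples out in contiguous category blocks in the clustered order) can be sketched outside Dash Bio with pandas and SciPy. This is a minimal sketch, not Dash Bio or Scanpy code: `category_sample_order` is a hypothetical helper, and average linkage with Euclidean distance is an assumed default. Because the tree has only one leaf per category, it stays small even when there are many samples.

```python
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage


def category_sample_order(data, categories, method="average"):
    """Order rows by clustering per-category means (hypothetical helper).

    data: DataFrame with one row per sample.
    categories: Series mapping each sample (same index as `data`) to a category.
    Returns (category_order, sample_order).
    """
    labels = categories.reindex(data.index).to_numpy()
    # One row per category: the mean profile of its member samples
    means = data.groupby(labels).mean()
    if len(means) > 1:
        # Cluster only the categories, so the tree has one leaf per category
        Z = linkage(means.to_numpy(), method=method, metric="euclidean")
        cat_order = [means.index[i] for i in leaves_list(Z)]
    else:
        cat_order = list(means.index)
    # Expand categories back into samples; samples keep their original
    # relative order inside each category block
    sample_order = [s for c in cat_order for s in data.index[labels == c]]
    return cat_order, sample_order


# Toy example: A and C have similar means, B is far away
data = pd.DataFrame(
    {"g1": [0.0, 0.1, 10.0, 10.2, 0.2], "g2": [0.0, 0.0, 10.0, 10.0, 0.1]},
    index=["s1", "s2", "s3", "s4", "s5"],
)
categories = pd.Series(["A", "A", "B", "B", "C"], index=data.index)
cat_order, sample_order = category_sample_order(data, categories)
```

The reordered frame `data.loc[sample_order]` could then be passed to the Clustergram with row clustering turned off, so the category grouping is kept while the other dimension is still clustered as usual.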
I thought a potential solution would be to:
1. Group the data by the metadata category and take the mean of each group.
2. Run dashbio.Clustergram on this to get the dendrogram traces back.
3. Run dashbio.Clustergram using the sorted, non-grouped original data.
But I would be running the "clustergram" tool twice, and since the category groups have uneven member counts, the traces from step 2 would not line up 1-to-1 with the sorted data, and their x/y coordinates would need to be adjusted.
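The x/y adjustment from the last step could be sidestepped by computing the dendrogram segments directly from a SciPy linkage matrix of the category means, rather than reusing traces from a second Clustergram. This is a minimal sketch under stated assumptions: `grouped_dendrogram_segments` is a hypothetical name, heatmap rows are assumed to sit at integer positions 0..N-1, and category blocks are assumed to be laid out in dendrogram leaf order. Each leaf is centered on its block of samples, and each merge is drawn as the usual U-shaped link between its two children.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def grouped_dendrogram_segments(means, group_sizes, method="average"):
    """Dendrogram segments with each leaf centered on its block of samples.

    Hypothetical helper. means: (n_categories, n_features) per-category means
    (needs at least two categories). group_sizes: samples per category, in
    the same order as the rows of `means`. Assumes heatmap rows at integer
    positions 0..N-1 with category blocks laid out in leaf order.
    Returns (leaf_order, segments); each segment is (xs, ys) with 4 points.
    """
    means = np.asarray(means, dtype=float)
    n = means.shape[0]
    Z = linkage(means, method=method, metric="euclidean")
    order = leaves_list(Z)

    pos = {}     # node id -> position along the heatmap axis
    height = {}  # node id -> merge height (0 for leaves)
    start = 0
    for leaf in order:
        size = group_sizes[leaf]
        pos[int(leaf)] = start + (size - 1) / 2.0  # center of this block
        height[int(leaf)] = 0.0
        start += size

    segments = []
    # Row i of Z merges nodes a and b into new node n + i at height dist
    for i, (a, b, dist, _count) in enumerate(Z):
        a, b = int(a), int(b)
        pos[n + i] = (pos[a] + pos[b]) / 2.0
        height[n + i] = float(dist)
        # U-shaped link: up from a, across at the merge height, down to b
        segments.append(
            ([pos[a], pos[a], pos[b], pos[b]],
             [height[a], float(dist), float(dist), height[b]])
        )
    return order, segments


# Category means and member counts for a toy example (A, B, C)
means = [[0.05, 0.0], [10.1, 10.0], [0.2, 0.1]]
sizes = [2, 2, 1]
order, segments = grouped_dendrogram_segments(means, sizes)
```

Each (xs, ys) pair could then be drawn as a line trace, e.g. a `plotly.graph_objects.Scatter` with `mode="lines"`; for a row dendrogram drawn beside the heatmap, the roles of x and y would be swapped.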
Any thoughts on this enhancement?