Closed by tsp-kucbd 2 years ago
It seems that the repeated rows are getting dropped from the dendrogram. I don't know whether that is expected behavior from scipy and something seaborn needs to account for, or an upstream bug.
Interesting ...
A temporary, quick and very dirty fix that works for my cases is to change the rows slightly so that duplicates are no longer dropped (the size of the change depends on the min values in the dataframe):
df = (df + np.random.randint(1,10,size=df.shape)/100) if df.duplicated().any() else df
OK I have an answer to this. The duplicate rows have a distance of 0 from each other, so the connection between them is drawn at a height of 0 (or equivalently no spacing from the right edge of the plot). With thin lines they are hidden, but you can see the horizontal connection if you thicken the dendrogram lines (using tree_kws):
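To check the explanation above without plotting, here is a minimal sketch that calls scipy directly. The toy array is hypothetical (it is not the dataframe from the report), and it assumes clustermap's default method="average" and metric="euclidean". It shows that the duplicate rows merge at height 0 and that every row still gets a leaf.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical toy data: rows 0 and 1 are exact duplicates.
data = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [4.0, 0.0, 1.0],
    [0.0, 5.0, 2.0],
])

# Same linkage settings as clustermap's defaults (assumed here).
Z = linkage(data, method="average", metric="euclidean")

# Column 2 of the linkage matrix is the merge height; the first merge
# joins the two duplicate rows.
first_merge_height = Z[0, 2]

# no_plot=True returns the dendrogram layout without drawing anything.
tree = dendrogram(Z, no_plot=True)
n_leaves = len(tree["leaves"])

print("height of first merge:", first_merge_height)
print("leaves:", n_leaves, "rows:", data.shape[0])
```

If the leaf count matches the row count, as it should here, nothing was dropped and the missing branches are only a drawing effect. In seaborn, passing something like tree_kws={"linewidths": 2} to clustermap should then make the zero-height connections visible.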
With some dataframes we encounter strange behaviours of the clustermap, where a dendrogram on one axis does not align with the heat map cells as it has too few branches.
An example:
In the resulting figure one can see that the dendrogram leaves (blue) are not aligned with the heatmap cells (red grid), and that the dendrogram has only 13 leaves whereas the dataframe has 18 rows. Any ideas why?
Versions: Seaborn 0.11.1, Scipy 1.6.3, Pandas 1.2.4