Closed by gokceneraslan 4 years ago
Thanks for your feedback.
This happens because LouvainHierarchy returns a dendrogram in which many merges share the same height. A straight cut cannot separate merges that sit at the same height, so when you cut such a dendrogram you may get more clusters than you asked for. I've updated the documentation in the develop branch.
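To make the tie effect concrete, here is a minimal sketch on a hand-built toy dendrogram. It does not call scikit-network. It assumes the SciPy-style linkage layout (left, right, height, size), which is consistent with the heights being read from column 2 in the report. The name count_clusters is a hypothetical helper, not part of any library.

```python
# Toy dendrogram over 6 leaves, rows are (left, right, height, size).
# Assumption: SciPy-style convention, where merge i creates node n_leaves + i.
toy_dendrogram = [
    (0, 1, 0.0, 2),  # creates node 6
    (2, 3, 0.0, 2),  # creates node 7
    (4, 5, 0.0, 2),  # creates node 8
    (6, 7, 1.0, 4),  # creates node 9
    (9, 8, 1.0, 6),  # creates node 10 (root)
]


def count_clusters(dendrogram, threshold):
    """Clusters left after applying every merge with height <= threshold.

    Hypothetical helper: each applied merge removes exactly one cluster.
    """
    n_leaves = len(dendrogram) + 1
    applied = sum(1 for _, _, height, _ in dendrogram if height <= threshold)
    return n_leaves - applied


# Only three distinct outcomes exist, whatever threshold is chosen.
for threshold in (-1.0, 0.0, 1.0):
    print(threshold, count_clusters(toy_dendrogram, threshold))
# -1.0 6
# 0.0 3
# 1.0 1
```

With only two distinct heights, a horizontal cut can produce 6, 3, or 1 clusters here. Asking for 2, 4, or 5 cannot be satisfied exactly, so some other achievable count has to be returned.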
scikit-network version: 0.20.0
Python version: 3.8.3
Operating System: macOS 10.15
import numpy as np
from sknetwork.data import load_netset
from sknetwork.clustering import Louvain
from sknetwork.hierarchy import LouvainHierarchy, cut_straight
graph = load_netset("openflights")
adjacency = graph.adjacency
louvain = Louvain()
labels = louvain.fit_transform(adjacency)
len(set(labels))
Output: 35
louvain_hierarchy = LouvainHierarchy()
dendrogram = louvain_hierarchy.fit_transform(adjacency)
labels = cut_straight(dendrogram, n_clusters=10)
len(set(labels))
Output: 35
(depth 1)
labels = cut_straight(dendrogram, n_clusters=35)
len(set(labels))
Output: 35
(depth 1)
labels = cut_straight(dendrogram, n_clusters=36)
len(set(labels))
Output: 152
(depth 2)
np.unique(dendrogram[:, 2], return_counts=True)
Output: (array([0., 1., 2., 3.]), array([2616, 329, 117, 34]))
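The height counts above are enough to work out which cluster counts a straight cut can reach on this dendrogram. The sketch below only redoes that arithmetic with NumPy. It assumes each merge at or below the cut height removes one cluster, and that the cut returns the smallest reachable count that is at least n_clusters. That second assumption matches the outputs reported in this thread but is not taken from the cut_straight source. smallest_achievable is a hypothetical helper.

```python
import numpy as np

# Heights and their multiplicities, copied from the np.unique output above.
heights = np.array([0., 1., 2., 3.])
counts = np.array([2616, 329, 117, 34])

# A dendrogram over n leaves has n - 1 merges.
n_leaves = int(counts.sum()) + 1

# Cutting at height h applies every merge with height <= h,
# and each applied merge reduces the number of clusters by one.
achievable = n_leaves - np.cumsum(counts)
print(n_leaves, achievable.tolist())
# 3097 [481, 152, 35, 1]


def smallest_achievable(n_clusters):
    """Smallest reachable cluster count >= n_clusters (hypothetical helper)."""
    candidates = np.append(achievable, n_leaves)
    return int(candidates[candidates >= n_clusters].min())


for requested in (10, 35, 36):
    print(requested, smallest_achievable(requested))
# 10 35
# 35 35
# 36 152
```

Under these assumptions the only reachable counts are 1, 35, 152, 481 and 3097, which agrees with the 35 and 152 seen in the reproduction above.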
Thanks!
Description
When I fit a LouvainHierarchy and then try to get a clustering with e.g. 50 clusters (cut_straight(dendrogram, n_clusters=50)), I am getting 81 clusters instead. Paris() works perfectly fine on the same dataset. See reproducible example below.
What I Did