Open cjnolet opened 3 years ago
I would really love this feature. As a first step, I am wondering whether we could let the Python API return the dendrogram tree data structure (including the linkage values). I dug into the code a bit, and I imagine https://github.com/rapidsai/raft/blob/75656cee48b544caf609555f838eac39e68e3438/cpp/include/raft/sparse/hierarchy/detail/agglomerative.cuh#L120 holds the distance value that could be used as the cutoff?
I also see that children (https://github.com/rapidsai/raft/blob/75656cee48b544caf609555f838eac39e68e3438/cpp/include/raft/sparse/hierarchy/detail/agglomerative.cuh#L142) gets returned to the Python API. However, it would be great to know what the elements of the (2, num_rows) children array represent: https://github.com/rapidsai/cuml/blob/dd7cbf45c1d089ece7db0f1610d3cab775f3de02/python/cuml/cluster/agglomerative.pyx#L200 It seems to me that they are not the row indices of the dataset, but I imagine this would be a binary tree structure.
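For reference, here is a minimal sketch of how a children array could be decoded, assuming it follows the scikit-learn `children_` convention: one pair per merge, where values below `n_samples` are original rows and a value `n_samples + i` refers to the cluster created by merge `i`. Whether cuML uses exactly this layout is an assumption here, and an array stored as (2, num_rows) would need transposing first. The helper name `leaves_per_merge` is hypothetical.

```python
def leaves_per_merge(children, n_samples):
    """For each merge i, return the original row indices under node n_samples + i.

    Assumes the scikit-learn convention: ids < n_samples are leaves (dataset rows),
    and id n_samples + i is the internal node created by the i-th merge.
    """
    # Every leaf starts out as a cluster containing only itself.
    members = {i: [i] for i in range(n_samples)}
    out = []
    for i, (a, b) in enumerate(children):
        merged = sorted(members[a] + members[b])
        members[n_samples + i] = merged  # register the new internal node
        out.append(merged)
    return out


# Toy dendrogram over 4 rows: rows 0 and 1 merge, then rows 2 and 3,
# then the two resulting clusters (node ids 4 and 5) merge.
children = [(0, 1), (2, 3), (4, 5)]
result = leaves_per_merge(children, n_samples=4)
print(result)
```

Under this convention, ids 4 and 5 in the last pair are not dataset rows but the clusters formed by the first two merges, which would explain why the values do not look like row indices.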
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
Sorry to ask, but has anyone solved this problem yet? I am having the same problem.
This would be great to have
Since the dendrogram is a binary tree, the current implementation of AgglomerativeClustering cuts the dendrogram at a particular level based on a user-provided parameter n_clusters. This can be useful when the user knows the number of clusters but makes it challenging in cases where the user might instead know a distance threshold and not the resulting number of clusters.
Supporting the distance_threshold parameter shouldn't be too hard. Rather than slicing the dendrogram at a particular level, clusters that fall below a particular distance threshold from each other are merged together to yield a final set of flattened clusters.
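The threshold-based cut described above can be sketched on the CPU as follows. This is an illustrative sketch, not the cuML or RAFT implementation: it assumes the dendrogram is available as scikit-learn-style `children` pairs plus one linkage distance per merge, that distances are non-decreasing along the merge order (as with single linkage), and that a merge is applied only when its distance is strictly below the threshold (the scikit-learn semantics for `distance_threshold`). The function name `cut_by_distance` is hypothetical.

```python
def cut_by_distance(children, distances, n_samples, distance_threshold):
    """Flatten a dendrogram by applying only merges below distance_threshold.

    children[i] = (a, b) are the node ids joined by merge i; ids >= n_samples
    refer to earlier merges. Returns one cluster label per original row.
    """
    parent = list(range(n_samples))  # union-find forest over the leaves

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # rep[node] is one leaf belonging to that dendrogram node.
    rep = list(range(n_samples))
    for (a, b), d in zip(children, distances):
        ra, rb = rep[a], rep[b]
        if d < distance_threshold:
            parent[find(rb)] = find(ra)  # apply the merge
        rep.append(ra)  # keep node ids aligned even when the merge is skipped

    # Relabel roots as consecutive integers in order of first appearance.
    ids, labels = {}, []
    for i in range(n_samples):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels


children = [(0, 1), (2, 3), (4, 5)]
distances = [0.5, 1.0, 3.0]
labels = cut_by_distance(children, distances, n_samples=4, distance_threshold=2.0)
print(labels)
```

With a threshold of 2.0 the final merge at distance 3.0 is skipped, so the toy data ends up in two clusters; the number of clusters falls out of the threshold rather than being supplied up front.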