HDBSCAN for graph data - Githubissues

scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.

BSD 3-Clause "New" or "Revised" License

2.8k stars 501 forks source link

You could reach into the code a little bit and write some routines that will do this. If you looke at the HDBSCAN generic code you can see the heart of the generic case, and realise it isn't really that much code. You can write your own function and replace the min spanning tree computation with your own for a graph, and likely get what you want. The main concern in how best to replace the core distance notion for the graph, but that's probably a calculation you can make. From there you just need to note that the core hdbscan function is mostly just a lot of case handling to take different algorithmic paths based on the in put data; and then a call the _tree_to_labels with the requisite inputs. So wiring that together with your new function to handle graphs directly would not be hard.

If you are feeling ambitious you could even handle the type detection and call your own routine if the input X is an igraph object, and provide a PR to add graph clustering.

scikit-learn-contrib / hdbscan

HDBSCAN for graph data #533