Open peekxc opened 8 years ago
There can potentially be a lot of non-unique distances, so the MST is certainly not guaranteed to be unique. The different MSTs should, however, yield equivalent single linkage trees (I would have to sit down and work through the details to be sure, but I believe this is the case), and thus result in identical clusterings.
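A toy sketch of that claim, not HDBSCAN's actual code: the graph, the `kruskal` and `clusters_at` helpers, and the thresholds below are all made up for illustration. It builds two different MSTs from the same tied weights by changing the tie-breaking order, then checks that cutting either tree at any distance threshold gives the same connected components, which is what single linkage depends on.

```python
class UnionFind:
    """Minimal disjoint-set structure with path halving."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def kruskal(n, edges):
    """Kruskal's algorithm; ties are broken by input order (sorted() is stable)."""
    uf = UnionFind(n)
    return [(u, v, w) for u, v, w in sorted(edges, key=lambda e: e[2])
            if uf.union(u, v)]


def clusters_at(n, tree, threshold):
    """Components obtained by keeping only tree edges with weight <= threshold."""
    uf = UnionFind(n)
    for u, v, w in tree:
        if w <= threshold:
            uf.union(u, v)
    groups = {}
    for i in range(n):
        groups.setdefault(uf.find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


# Hypothetical 4-point graph with tied weights (stand-in for a
# mutual reachability graph with non-unique distances).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 3, 2.0), (1, 3, 2.0)]

mst_a = kruskal(4, edges)
mst_b = kruskal(4, list(reversed(edges)))  # same weights, different tie order

print(set(mst_a) == set(mst_b))  # the edge sets differ
for t in (0.5, 1.0, 2.0):
    print(t, clusters_at(4, mst_a, t) == clusters_at(4, mst_b, t))
```

This only compares flat cuts of the two trees. The order in which tied merges appear in a dendrogram can still differ between implementations, even though the clusters at each distance level agree.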
I've been reading over the paper to try to gain an understanding of the algorithm, but I've run into a bit of a problem. After exporting the mutual reachability distance matrix from HDBSCAN, I ran other MST implementations on it, and the trees they produce are not completely equivalent to the one produced by scikit here. After some investigation, I found that there are in fact non-unique distances in the reachability graph, which, it seems, prevents MST algorithms from arriving at a unique solution. How does HDBSCAN handle this?
@lmcinnes