Open alexgcsa opened 3 years ago
Hi Alex, you can take a look at this page where they go over how hdbscan works: https://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html
Based on the paper and the page above, the algorithm is deterministic.
I believe the results might only look different because the cluster labels can be assigned in a different order when you apply pairwise distance; the grouping of points itself should be the same.
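One way to check whether two runs differ only in label order is to compare the groupings rather than the raw label values. The snippet below is a minimal sketch, not part of the hdbscan library: `same_partition` and the three example label lists are hypothetical, and the only assumption taken from HDBSCAN is that `-1` marks noise points.

```python
def same_partition(labels_a, labels_b):
    """Return True if two labelings group the points identically,
    ignoring which integer each cluster happens to be called."""
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # Assumption: -1 means "noise", so it must line up in both runs.
        if (a == -1) != (b == -1):
            return False
        # Each label in one run must map to exactly one label in the other.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Hypothetical label arrays from three runs on the same six points.
run_1 = [0, 0, 1, 1, 2, -1]
run_2 = [2, 2, 0, 0, 1, -1]  # same groups as run_1, labels renamed
run_3 = [0, 0, 0, 1, 2, -1]  # third point moved to another cluster

print(same_partition(run_1, run_2))
print(same_partition(run_1, run_3))
```

If the first comparison holds for your own runs, the clustering is the same and only the numbering changed. If it does not, the difference is real and the label-order explanation does not apply.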
If you read how the algorithm works (How HDBSCAN Works), there is a step that builds a minimum spanning tree, and I believe this might lead to non-deterministic behaviour, since a unique MST cannot be guaranteed for a graph with non-unique edge weights.
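The tie argument can be illustrated with a small, self-contained sketch. It uses a plain Kruskal implementation written for this example, not the MST code inside hdbscan, and the triangle graph is made up. It shows only that tied edge weights allow more than one valid MST, not that the final clusters would differ.

```python
def kruskal(n, edges):
    """Kruskal's algorithm on n nodes; edges are (weight, u, v) tuples.
    Ties are broken by the order in which the edges are supplied."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # sorted() is stable, so equal-weight edges keep their input order.
    for w, u, v in sorted(edges, key=lambda e: e[0]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((w, u, v))
    return tree


# A triangle in which every edge has the same weight.
edges = [(1.0, 0, 1), (1.0, 1, 2), (1.0, 0, 2)]

mst_a = kruskal(3, edges)
mst_b = kruskal(3, list(reversed(edges)))

print(mst_a)
print(mst_b)
print(sum(w for w, _, _ in mst_a) == sum(w for w, _, _ in mst_b))
```

Both trees have the same total weight but use different edges, so which one is built depends on how ties are broken. Whether that ever happens in practice depends on the implementation's tie-breaking and on whether the data actually contains equal distances.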
Why do all the clusters change when I add just one point? When I run it without adding a point, the clusters stay the same, but when I add just one point, all the clusters change.
I have had deterministic results with the following:
import numpy as np
np.random.seed(42)
For HDBSCAN, set gen_min_span_tree=False and approx_min_span_tree=False.
Hi there,
I was wondering whether HDBSCAN is deterministic. If its behavior is not deterministic, it would be useful to add a random seed to initialize and control the generation of pseudo-random numbers during the process.
Could you clarify?
Cheers,
Alex de Sá