Open MohabGhobashy opened 4 weeks ago
@MohabGhobashy Could you please explain your use case? How much do your results differ across runs? If you have one, please also provide a minimal reproducer.
In general, it's hard to provide exact reproducibility in highly parallel environments.
I am using the cuML implementation of HDBSCAN for clustering and would like to ensure reproducibility across multiple runs. Is there currently any support for setting a random seed (e.g., via a random_state parameter) in the HDBSCAN algorithm to make the results deterministic?
If not, is there any plan to introduce such a feature in future releases?
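To put a number on "how different are the results across runs", one option is to compare the label arrays from two runs directly. The snippet below is only a rough sketch and not part of cuML: `same_partition` and `rand_index` are hypothetical helper names, written with the standard library only, and they assume the usual convention that noise points are labelled `-1`. Cluster ids can be permuted between runs without the clustering actually changing, so the comparison is done up to relabeling.

```python
from collections import Counter
from math import comb


def same_partition(labels_a, labels_b):
    """Return True if both labelings describe the same clusters,
    ignoring how the cluster ids are numbered.

    Assumption: -1 marks noise, and noise must match noise.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label arrays must have the same length")
    forward, backward = {}, {}
    for a, b in zip(labels_a, labels_b):
        if (a == -1) != (b == -1):
            return False
        # Each id in run A must map to exactly one id in run B, and back.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two labelings agree
    (together in both runs, or apart in both runs).

    Simplification: noise (-1) is treated as one ordinary cluster.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label arrays must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0
    total_pairs = comb(n, 2)
    # Pairs grouped together in both runs, in run A, and in run B.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return (total_pairs + 2 * both - in_a - in_b) / total_pairs


# Toy labels standing in for two runs on the same data.
run_1 = [0, 0, 1, 1, -1]
run_2 = [1, 1, 0, 0, -1]   # same clusters, ids swapped
run_3 = [0, 1, 1, 1, -1]   # one point moved to another cluster

print(same_partition(run_1, run_2))
print(same_partition(run_1, run_3))
print(rand_index(run_1, run_3))
```

In a real reproducer the toy lists would be replaced by the labels from two separate fits on the same input, converted to plain Python lists first. A Rand index below 1.0 means at least one pair of points was grouped differently between the runs.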