Is Random Seed Support Available for Reproducibility in cuML HDBSCAN?

rapidsai / cuml

cuML - RAPIDS Machine Learning Library

https://docs.rapids.ai/api/cuml/stable/

Apache License 2.0

Is Random Seed Support Available for Reproducibility in cuML HDBSCAN? #6121

Open MohabGhobashy opened 4 weeks ago

MohabGhobashy commented 4 weeks ago

I am using the cuML implementation of HDBSCAN for clustering and would like to ensure reproducibility across multiple runs. Is there currently any support for setting a random seed (e.g., via a random_state parameter) in the HDBSCAN algorithm to make the results deterministic?

If not, is there any plan to introduce such a feature in future releases?

divyegala commented 2 weeks ago

@MohabGhobashy Could you please explain your use case? How different are your results across runs? Please also provide a minimal reproducer if you have one.
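
For anyone putting together such a reproducer, here is a minimal sketch of one way to measure run-to-run differences. It is not from the issue author. The helper names (`same_partition`, `compare_runs`, `check_cuml_hdbscan`), the synthetic data, and `min_cluster_size=25` are all illustrative assumptions. Cluster IDs can be renumbered between runs, so the comparison ignores label names and only requires that noise points (label `-1`) stay noise.

```python
def _to_list(labels):
    """Convert array-like labels (NumPy, CuPy, list) to a plain Python list."""
    return labels.tolist() if hasattr(labels, "tolist") else list(labels)


def same_partition(a, b):
    """Return True if a and b group the points identically.

    Cluster IDs may be permuted between runs, but the noise label -1
    must correspond to -1 in both labelings.
    """
    a, b = _to_list(a), _to_list(b)
    if len(a) != len(b):
        return False
    forward, backward = {}, {}
    for x, y in zip(a, b):
        if (x == -1) != (y == -1):
            return False
        # Enforce a one-to-one mapping between the two sets of label IDs.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def compare_runs(fit_predict, data, n_runs=5):
    """Cluster `data` n_runs times; compare each later run with the first."""
    reference = fit_predict(data)
    return [same_partition(reference, fit_predict(data)) for _ in range(n_runs - 1)]


def check_cuml_hdbscan(n_runs=5):
    """Hypothetical driver: requires a GPU with cuML and NumPy installed."""
    import numpy as np
    from cuml.cluster import HDBSCAN

    rng = np.random.default_rng(0)  # fixes the input data only, not the algorithm
    X = rng.normal(size=(2000, 8)).astype(np.float32)
    # min_cluster_size=25 is an arbitrary illustrative choice.
    return compare_runs(lambda d: HDBSCAN(min_cluster_size=25).fit_predict(d), X, n_runs)
```

On a machine with cuML installed, calling `check_cuml_hdbscan()` returns one boolean per repeated run. Any `False` entry means that run produced a different partition than the first, which is the kind of evidence the question above asks for.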

In general, it's hard to provide exact reproducibility in highly parallel environments.
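
One common source of this difficulty is that floating-point addition is not associative, so a parallel reduction that combines values in a different order can produce a slightly different result. The plain-Python sketch below illustrates only that general effect. Whether it is what drives run-to-run differences in cuML HDBSCAN is an assumption, not something stated in this thread.

```python
import math

# The same three values, summed in two different groupings.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

# The results agree to within rounding error but are not bit-identical.
print(left_first == right_first)              # False
print(math.isclose(left_first, right_first))  # True
```

A difference this small is usually harmless, but it can flip a tie between two nearly equal distances, and a single flipped comparison can in principle change which cluster a point joins.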