rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Deterministic spectral clustering #1915

Closed cjnolet closed 3 years ago

cjnolet commented 4 years ago

Have discussed this with @afender, and this should be available soon, allowing us to either set the initial embeddings or pass in a seed.

Once this is available in cuGraph/RAFT, we will be able to use it in UMAP for fully consistent embeddings even with the spectral clustering initialization strategy. In the meantime, the best strategy we can recommend to users for guaranteed end-to-end consistency is init="random". Users can still use init="spectral" and achieve some level of consistency, but the values might be off by as much as 1e0.
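As a rough sketch of the interim recommendation, the snippet below builds a cuML UMAP estimator with init="random". The helper names (`consistent_umap_kwargs`, `make_consistent_umap`) are hypothetical, and pinning `random_state` is an assumption on top of what is stated above: a fixed seed is presumably also needed for run-to-run consistency. The cuML import is deferred so the block can be loaded on a machine without a GPU.

```python
def consistent_umap_kwargs(seed=42):
    """Keyword arguments for the interim 'fully consistent' configuration.

    Assumption: besides init="random", a fixed random_state is also pinned.
    """
    return {"init": "random", "random_state": seed}


def make_consistent_umap(seed=42, **extra):
    """Build a cuML UMAP estimator using the interim configuration.

    The import is deferred because cuML needs a CUDA-capable GPU.
    """
    from cuml.manifold import UMAP

    return UMAP(**consistent_umap_kwargs(seed), **extra)
```

On a machine with cuML installed, `make_consistent_umap(seed=0, n_neighbors=15).fit_transform(X)` would then be the call to repeat when checking consistency.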

viclafargue commented 4 years ago

The attached script demonstrates the inconsistencies with init="spectral". It can be used to reproduce the errors while working on the current UMAP spectral clustering initialization, until the cuGraph algorithm is available: script.py.

The script compares the final UMAP embeddings of two equivalent UMAP models (same parameters). Surprisingly, the inconsistencies disappear when the second model is instantiated over the first one. Also, it seems that only transform datasets with more than 32 samples display inconsistencies.
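The comparison described above could look roughly like the sketch below. This is not the attached script.py, only an illustration of the idea: fit two freshly created, identically configured models, transform the same data, and report the largest element-wise difference. `max_abs_diff`, `compare_two_runs`, and `ToyModel` are hypothetical names. `ToyModel` is a seeded stand-in so the harness runs without a GPU, and embeddings are assumed to be nested lists of floats (a cuML result would need converting first).

```python
import random


def max_abs_diff(emb_a, emb_b):
    """Largest element-wise absolute difference between two 2-D embeddings."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings have different numbers of rows")
    worst = 0.0
    for row_a, row_b in zip(emb_a, emb_b):
        if len(row_a) != len(row_b):
            raise ValueError("embeddings have different numbers of columns")
        for x, y in zip(row_a, row_b):
            worst = max(worst, abs(x - y))
    return worst


def compare_two_runs(make_model, train, test):
    """Fit two fresh, equivalent models and compare their transforms of `test`."""
    first = make_model()
    first.fit(train)
    emb_first = first.transform(test)

    # Both models stay alive under separate names here; the report above
    # notes the behavior changed when the second replaced the first.
    second = make_model()
    second.fit(train)
    emb_second = second.transform(test)
    return max_abs_diff(emb_first, emb_second)


class ToyModel:
    """Hypothetical stand-in for UMAP: a seeded random linear projection to 2-D."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._weights = None

    def fit(self, rows):
        n_features = len(rows[0])
        self._weights = [
            [self._rng.uniform(-1.0, 1.0) for _ in range(2)]
            for _ in range(n_features)
        ]
        return self

    def transform(self, rows):
        return [
            [sum(v * w[k] for v, w in zip(row, self._weights)) for k in range(2)]
            for row in rows
        ]


# More than 32 rows, since smaller transform sets reportedly showed no issue.
data_rng = random.Random(123)
rows = [[data_rng.random() for _ in range(5)] for _ in range(40)]
seeded_diff = compare_two_runs(lambda: ToyModel(seed=0), rows, rows)
```

With a real estimator, `make_model` would be a factory such as `lambda: UMAP(init="spectral")`, and a non-zero result would indicate the kind of inconsistency reported here.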

trivialfis commented 3 years ago

Hi, what's the current status of this issue? I think the spectral clustering algorithm is actually deterministic?

viclafargue commented 3 years ago

Yes, I believe the spectral clustering from cuGraph is now deterministic. We should probably close this issue. Can you confirm this @cjnolet?

cjnolet commented 3 years ago

I was able to reproduce non-determinism during the 0.19 release for one of the CUDA versions (I thought it was 11.2?), but I was unable to reproduce it with 11.2 for the 21.06 release. In the event the issue was really happening in the CUDA 10.x series, RAPIDS has dropped official support for 10.x going forward anyway, so I think we can close this for now.