Closed cjnolet closed 3 years ago
The attached script demonstrates inconsistencies with the parameter init="spectral" and can be used to reproduce the errors for further development of the current UMAP spectral clustering initialization, before the cuGraph algorithm becomes available: script.py.
The script compares the final UMAP embeddings of two equivalent UMAP models (same parameters used). Surprisingly, the inconsistencies disappear when the second model is instantiated over the first one. Also, it seems that only transform datasets with more than 32 samples display inconsistencies.
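The attached script.py is not reproduced here. As a rough sketch of the kind of comparison described above (fit two identically configured models, transform the same data, and measure how far the embeddings differ), something like the following could be used. The helper name max_embedding_gap and the StandInModel class are hypothetical and exist only so the sketch runs without a GPU; StandInModel is not UMAP.

```python
import numpy as np


def max_embedding_gap(make_model, X_fit, X_transform):
    """Fit two models built by make_model() and return the largest
    absolute element-wise difference between their transforms."""
    first = make_model().fit(X_fit)
    second = make_model().fit(X_fit)
    emb_a = np.asarray(first.transform(X_transform))
    emb_b = np.asarray(second.transform(X_transform))
    return float(np.max(np.abs(emb_a - emb_b)))


class StandInModel:
    """Hypothetical stand-in (NOT UMAP): a random 2-D linear projection
    whose randomness is controlled by an optional seed."""

    def __init__(self, seed=None):
        self.seed = seed

    def fit(self, X):
        rng = np.random.default_rng(self.seed)
        self.proj_ = rng.normal(size=(X.shape[1], 2))
        return self

    def transform(self, X):
        return X @ self.proj_


data_rng = np.random.default_rng(0)
X_fit = data_rng.normal(size=(100, 8))
X_new = data_rng.normal(size=(40, 8))  # more than 32 samples, as in the report

# Seeded models agree exactly; unseeded models do not.
seeded_gap = max_embedding_gap(lambda: StandInModel(seed=42), X_fit, X_new)
unseeded_gap = max_embedding_gap(lambda: StandInModel(), X_fit, X_new)
print(seeded_gap, unseeded_gap)
```

With cuML installed, the factory could presumably be swapped for something like lambda: UMAP(init="spectral", random_state=42) from cuml.manifold, assuming the inputs are NumPy arrays so that the outputs convert cleanly with np.asarray.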
Hi, what's the current status of this issue? I think the spectral clustering algorithm is actually deterministic?
Yes, I believe the spectral clustering from cuGraph is now deterministic. We should probably close this issue. Can you confirm this @cjnolet?
I was able to reproduce the non-determinism during the 0.19 release for one of the CUDA versions (I thought it was 11.2?), but I was unable to reproduce it on 11.2 with the 21.06 release. If the issue was really happening in the CUDA 10.x series, RAPIDS has dropped official support for 10.x going forward anyway, so I think we can close this for now.
Have discussed this with @afender, and this should be available soon, allowing us to either set the initial embeddings or pass in a seed.
Once this is available in cuGraph/RAFT, we will be able to use it in UMAP for fully consistent embeddings, even with the spectral clustering initialization strategy. In the meantime, the best strategy to recommend to our users is init="random", which guarantees full end-to-end consistency. Users can still use init="spectral" and achieve some level of consistency, but the values might be off by as much as 1e0.
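To make the difference between "fully consistent" and "off by as much as 1e0" concrete, here is a minimal sketch of an element-wise tolerance check on two embeddings. The function name consistency_report and the default tolerance are illustrative assumptions, not part of cuML, and the arrays are made-up values rather than real UMAP output.

```python
import numpy as np


def consistency_report(emb_a, emb_b, atol=1e-4):
    """Return the largest absolute deviation between two embeddings and
    whether it falls within the given absolute tolerance."""
    emb_a = np.asarray(emb_a, dtype=np.float64)
    emb_b = np.asarray(emb_b, dtype=np.float64)
    if emb_a.shape != emb_b.shape:
        raise ValueError("embeddings must have the same shape")
    max_dev = float(np.max(np.abs(emb_a - emb_b)))
    return {"max_abs_dev": max_dev, "consistent": max_dev <= atol}


# Made-up 2-D embeddings: one identical pair, one with a single
# coordinate shifted by 1e0.
base = np.array([[0.0, 1.0], [2.0, 3.0]])
shifted = base + np.array([[0.0, 0.0], [1.0, 0.0]])

print(consistency_report(base, base))
print(consistency_report(base, shifted))
```

Under this kind of check, two runs would pass only if every coordinate matches within atol; a deviation on the order of 1e0 would need a much looser tolerance to pass.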