rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0
4.14k stars 526 forks source link

[FEA] Investigate approximate nearest neighbors in UMAP/TSNE/HDBSCAN #3807

Open cjnolet opened 3 years ago

cjnolet commented 3 years ago

The computation of the all-nearest-neighbors KNN is the bottleneck in UMAP and TSNE and both of these algorithms can get away with using approximate neighbors as the number of points grows very large.

We should investigate using approximate nearest neighbors functions within UMAP and TSNE, for example using nn-descent from RAFT, and see how much we can improve overall performance. In general, the training performance of many approximate algorithms is proportional to the recall, so we look at both the speedup and the embedding quality and potentially look at sampling as the data size grows very large.

cjnolet commented 2 years ago

This should also be investigated in HDBSCAN.