ravi-bhadauria opened this issue 10 months ago
Hi @ravi-bhadauria, thanks for filing this issue, and apologies for the slow response. I've reproduced this with the current stable v24.10 release, too:
sklearn trust: 0.9595029245283019
sklearn trust with more neighbors: 0.5375494968553459
trust without graph: 0.9602333333333334
cc @dantegd @cjnolet, do you know what might be going on here?
Describe the bug
cuML UMAP supports precomputed KNN graphs. We can precompute the KNN graph with more neighbors than needed (k_neighbors > n_neighbors) and then try different values of n_neighbors in cuML UMAP, for example during a grid search. The API documentation advertises this use case as well. However, choosing a value of n_neighbors smaller than k_neighbors results in a catastrophic drop in the trustworthiness metric.
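The grid-search workflow above relies on one property: a KNN graph with k_neighbors columns, sorted by distance, contains the n_neighbors graph as its first n_neighbors columns. The pure-Python sketch below checks that property with an exact brute-force search on a small synthetic point set. The helpers `brute_force_knn` and `truncate_knn` are hypothetical and not part of cuML. Explicitly truncating the precomputed graph to n_neighbors before handing it to cuML UMAP may be a workaround worth testing, but that is an assumption, not a confirmed fix.

```python
import math
import random


def brute_force_knn(points, k):
    """Hypothetical helper: exact O(n^2) KNN returning (indices, distances).

    The query point itself is included at rank 0 (distance 0), and ties are
    broken by point index so that the ordering is deterministic.
    """
    indices, distances = [], []
    for p in points:
        # Sort every point by Euclidean distance to p (math.dist needs 3.8+).
        order = sorted(range(len(points)), key=lambda j: (math.dist(p, points[j]), j))
        top = order[:k]
        indices.append(top)
        distances.append([math.dist(p, points[j]) for j in top])
    return indices, distances


def truncate_knn(indices, distances, n_neighbors):
    """Hypothetical helper: keep the first n_neighbors columns of a sorted graph."""
    return ([row[:n_neighbors] for row in indices],
            [row[:n_neighbors] for row in distances])


random.seed(0)
points = [(random.random(), random.random(), random.random()) for _ in range(200)]

k_neighbors, n_neighbors = 30, 10
big_idx, big_dist = brute_force_knn(points, k_neighbors)
small_idx, small_dist = brute_force_knn(points, n_neighbors)

# Truncating the larger graph reproduces the smaller one exactly.
trunc_idx, trunc_dist = truncate_knn(big_idx, big_dist, n_neighbors)
print(trunc_idx == small_idx and trunc_dist == small_dist)  # prints True
```

If the graph passed to UMAP satisfies this property, the only difference between the two runs should be the extra columns, which is why the trustworthiness gap reported above points to how those extra columns are consumed.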
Steps/Code to reproduce bug
A minimal working example with a synthetic dataset.
Expected behavior
As long as the precomputed graph has more neighbors per point than n_neighbors, trustworthiness should be unchanged up to numerical precision.
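Because the expected behavior is stated in terms of trustworthiness, here is a brute-force, pure-Python sketch of that metric for checking small cases by hand. It follows the formula documented for `sklearn.manifold.trustworthiness` (Euclidean distances, k nearest neighbors, k < n/2); breaking distance ties by point index is an assumption of this sketch, and it is meant for sanity checks only, not as a replacement for the sklearn function used in the numbers above.

```python
import math


def neighbor_ranks(points, i):
    """Rank every other point by distance from points[i] (1 = nearest).

    Ties are broken by point index (an assumption of this sketch).
    """
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return {j: rank for rank, j in enumerate(others, start=1)}


def trustworthiness(X, X_embedded, k):
    """T(k) = 1 - 2 / (n k (2n - 3k - 1)) * sum_i sum_{j in N_i^k} max(0, r(i, j) - k)."""
    n = len(X)
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    penalty = 0
    for i in range(n):
        input_rank = neighbor_ranks(X, i)
        embedded_rank = neighbor_ranks(X_embedded, i)
        # N_i^k: the k nearest neighbors of point i in the embedding.
        embedded_knn = [j for j, r in embedded_rank.items() if r <= k]
        # Penalize embedded neighbors that were not among the k nearest in the input.
        penalty += sum(max(0, input_rank[j] - k) for j in embedded_knn)
    return 1.0 - 2.0 / (n * k * (2.0 * n - 3.0 * k - 1.0)) * penalty


line = [(float(x),) for x in range(8)]
print(trustworthiness(line, line, 2))  # prints 1.0
```

Evaluating the same embedding pipeline once with a graph of exactly n_neighbors columns and once with a larger graph should give matching values under the expectation stated above.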
Environment details:
nvcc --version
conda list