rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0
4.18k stars 527 forks source link

[BUG] NearestNeighbors search with euclidean metric and two_pass_precision does not compute square root distances. #5788

Open rtubelleza opened 7 months ago

rtubelleza commented 7 months ago

Describe the bug The euclidean distances returned by NearestNeighbors do not seem to be square rooted when two_pass_precision=True

Steps/Code to reproduce bug

# Need to reproduce bug below
import cuml
import cupy

# Log package versions
print(cuml.__version__)
print(cupy.__version__)

X, _ = blobs = cuml.datasets.make_blobs(
    n_samples=100, n_features=2, 
    centers=None, 
    cluster_std=1.0, 
    center_box=(-10.0, 10.0), 
    shuffle=True, 
    random_state=42, 
    return_centers=False, 
    order='F', dtype='float32')

for tpp in [True, False]:
    print(f"\nTwo Pass Precision: {tpp}")
    # d**2
    squared = cuml.neighbors.NearestNeighbors(
        n_neighbors=15, 
        algorithm="brute", 
        metric="sqeuclidean", 
        output_type="cupy"
        )
    squared.fit(X)
    d_sq, idx_sq = squared.kneighbors(X, two_pass_precision=tpp)

    # d 
    root = cuml.neighbors.NearestNeighbors(
        n_neighbors=15, 
        algorithm="brute", 
        metric="euclidean", 
        output_type="cupy"
        )
    root.fit(X)
    d_r, idx_r = root.kneighbors(X, two_pass_precision=tpp)

    # Log results
    print(cupy.all(d_sq == d_r))
    print(cupy.all(idx_sq == idx_r))
    # Manually calculate the euclidean distance between the first sample, and its first neighbor
    print(f"d2 = {((X[0] - X[idx_sq[0, 1].item()])**2).sum()}")
    print(f"d = {cp.sqrt(((X[0] - X[idx_sq[0, 1].item()])**2).sum())}")
    print(d_sq[0, 1])
    print(d_r[0, 1])

Outputs:

24.02.00
13.0.0

Two Pass Precision: True
True
True
d2 = 0.2593870162963867
d = 0.5093005299568176
0.25938702
0.25938702

Two Pass Precision: False
False
True
d2 = 0.2593870162963867
d = 0.5093005299568176
0.25938416
0.5092977

Expected behavior Expect differences between distances when metric is "euclidean" and "sqeuclidean". Expect distances in euclidean space to be square rooted, but matches outputs of sqeuclidean. Only occurs when two_pass_precision = True.

dantegd commented 6 months ago

Thanks for the issue @rtubelleza, this looks like a bug in the code, we'll look into it and try to solve it as soon as possible.