rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] simplicial_set_embedding doesn't work #6041

Closed Intron7 closed 2 months ago

Intron7 commented 3 months ago

Describe the bug
simplicial_set_embedding doesn't work. It returns points on a straight line, with both coordinates identical in every row: [ 16.925232 , 16.925232 ],[-127.48358 , -127.48358 ],...

Steps/Code to reproduce bug
I basically reused the code from your own test to reproduce the bug:

import numpy as np
import cupy as cp
from cuml.manifold.umap import (
    simplicial_set_embedding as cu_simplicial_set_embedding,
)
from cuml.manifold.umap import fuzzy_simplicial_set as cu_fuzzy_simplicial_set
from cuml.neighbors import NearestNeighbors
from cuml.manifold.umap import UMAP
from cuml.datasets import make_blobs

from umap.umap_ import (
    simplicial_set_embedding as ref_simplicial_set_embedding,
)
from umap.umap_ import fuzzy_simplicial_set as ref_fuzzy_simplicial_set
import umap.distances as dist

n_components = 2
n_rows = 8000
n_features = 50
n_neighbors = 15
n_clusters = 30
random_state = 42
metric = "euclidean"
initial_alpha = 1.0
a, b = UMAP.find_ab_params(1.0, 0.1)
gamma = 0
negative_sample_rate = 5
n_epochs = 500
init = "random"
metric_kwds = {}
densmap = False
densmap_kwds = {}
output_dens = False
output_metric = "euclidean"
output_metric_kwds = {}

X, _ = make_blobs(
    n_samples=n_rows,
    centers=n_clusters,
    n_features=n_features,
    random_state=random_state,
)
X = X.get()

cu_fss_graph = cu_fuzzy_simplicial_set(
    X, n_neighbors, random_state, metric
)
ref_fss_graph = cu_fss_graph.get()

ref_embedding = ref_simplicial_set_embedding(
    X,
    ref_fss_graph,
    n_components,
    initial_alpha,
    a,
    b,
    gamma,
    negative_sample_rate,
    n_epochs,
    init,
    np.random.RandomState(random_state),
    dist.named_distances_with_gradients[metric],
    metric_kwds,
    densmap,
    densmap_kwds,
    output_dens,
    output_metric=output_metric,
    output_metric_kwds=output_metric_kwds,
)[0]

cu_embedding = cu_simplicial_set_embedding(
    X,
    cu_fss_graph,
    n_components,
    initial_alpha,
    a,
    b,
    gamma,
    negative_sample_rate,
    n_epochs,
    init,
    random_state,
    metric,
    metric_kwds,
    output_metric=output_metric,
    output_metric_kwds=output_metric_kwds,
)

ref_embedding = cp.array(ref_embedding)

Expected behavior
The function should return a proper 2D embedding rather than a straight line, like the CPU umap implementation does.
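The "straight line" symptom can also be checked numerically instead of by eye. The following is only a minimal sketch using NumPy: the helper name `is_collinear` and the tolerance `rel_tol` are hypothetical and not part of cuML or umap. It centers the points and compares the two singular values; if the second one is negligible relative to the first, all points lie on a single line.

```python
import numpy as np


def is_collinear(embedding, rel_tol=1e-6):
    """Return True if all rows of a 2D embedding lie (almost) on one line.

    Hypothetical helper for illustration; not part of cuML or umap.
    """
    emb = np.asarray(embedding, dtype=np.float64)
    centered = emb - emb.mean(axis=0)
    # Singular values measure the spread along each principal direction.
    s = np.linalg.svd(centered, compute_uv=False)
    if s[0] == 0.0:
        return True  # every point is identical
    # A negligible second singular value means there is no spread
    # perpendicular to the main direction.
    return bool(s[1] / s[0] < rel_tol)


# Synthetic stand-in for the reported output: both coordinates identical.
t = np.linspace(-130.0, 20.0, 50)
line = np.column_stack([t, t])

# Synthetic stand-in for a healthy embedding: a 2D Gaussian cloud.
rng = np.random.default_rng(42)
cloud = rng.normal(size=(50, 2))

print(is_collinear(line))
print(is_collinear(cloud))
```

To apply this to the script above, pass in `ref_embedding` before it is converted with `cp.array`, and copy `cu_embedding` to host memory first (for example with `.get()`, assuming it is a CuPy array).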


viclafargue commented 3 months ago

Thanks for noticing this and creating an issue. I was able to debug the function and found a number of issues. Hopefully we can merge the fix quickly so that it is part of the nightly packages soon.

divyegala commented 2 months ago

Closed by https://github.com/rapidsai/cuml/pull/6043