lmcinnes / umap

Uniform Manifold Approximation and Projection
BSD 3-Clause "New" or "Revised" License

KeyError when using metric="precomputed" #996

Open lkp411 opened 1 year ago

lkp411 commented 1 year ago

I am trying to pass a precomputed square distance matrix to the reducer, and I am running into a strange KeyError: 'precomputed'.

When I do the following:

import numpy as np
import umap

reducer = umap.UMAP(n_neighbors=15, local_connectivity=1, n_components=2, metric="precomputed", random_state=0)
distance_matrix = np.random.rand(SIZE, SIZE)  # uniform samples in [0, 1)
output = reducer.fit_transform(distance_matrix)

Everything works fine.

But when I use PyTorch to create the matrix like so:

import torch

reducer = umap.UMAP(n_neighbors=15, local_connectivity=1, n_components=2, metric="precomputed", random_state=0)
distance_matrix = torch.randn(SIZE, SIZE).numpy()  # standard normal samples
output = reducer.fit_transform(distance_matrix)

I get a KeyError: 'precomputed'.

What could be causing this? PyTorch tensors and NumPy arrays have the same memory layout, and converting between them reuses the same allocated memory.

ggdna commented 8 months ago

np.random.rand samples from $\mathcal{U}(0,1)$ (all values non-negative), while torch.randn samples from $\mathcal{N}(0, 1)$, which will give you negative numbers. This kind of error gets thrown if there are negative values in the distance matrix.

See: https://github.com/lmcinnes/umap/issues/854#issuecomment-1959507992
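
A quick way to confirm this diagnosis before calling the reducer is to look for negative entries in the matrix. The sketch below uses only NumPy, so it runs without umap or torch installed. `find_negative_entries` is a hypothetical helper (not part of umap), `SIZE = 50` is an arbitrary stand-in for the value in the original post, and `rng.standard_normal` stands in for `torch.randn`. The last part builds a matrix of actual pairwise Euclidean distances as one example of an input with no negative values.

```python
import numpy as np


def find_negative_entries(distance_matrix):
    """Return the (row, col) indices of negative entries in a square matrix.

    Hypothetical helper for illustration; not part of the umap API.
    """
    d = np.asarray(distance_matrix)
    if d.ndim != 2 or d.shape[0] != d.shape[1]:
        raise ValueError(f"expected a square matrix, got shape {d.shape}")
    return np.argwhere(d < 0)


rng = np.random.default_rng(0)
SIZE = 50  # assumption: the original post does not say what SIZE was

# Uniform samples in [0, 1), comparable to np.random.rand(SIZE, SIZE).
uniform = rng.random((SIZE, SIZE))

# Standard normal samples, comparable to torch.randn(SIZE, SIZE).numpy().
normal = rng.standard_normal((SIZE, SIZE))

print("negative entries (uniform):", len(find_negative_entries(uniform)))
print("negative entries (normal): ", len(find_negative_entries(normal)))

# A matrix of real pairwise Euclidean distances between random points:
# non-negative, symmetric, and zero on the diagonal.
points = rng.standard_normal((SIZE, 8))
diff = points[:, None, :] - points[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))

print("negative entries (distances):", len(find_negative_entries(distances)))
```

If the check reports no negative entries, the matrix can be passed to `reducer.fit_transform(...)` with `metric="precomputed"` as in the original post. Whether negative values are the only trigger for this KeyError is not established in this thread, so treat the check as a first diagnostic rather than a guarantee.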