eterna2 opened this issue 2 years ago
I have tested with numpy==1.20.3 and it works normally.
It looks like an issue somewhere in the interactions of numba, numpy, and (presumably) numpy's fairly new type-signature information. I'm not sure there is an easy fix, since it lies in interactions between upstream libraries, so it is going to take me a while to figure out how to make it work. In the meantime, hopefully older versions of numpy work for now. I'll see if I can figure something out though.
Digging in a little more: currently numba does not support numpy >= 1.21, so things are potentially just going to break. It seems highly likely they will fix that in the future, but I have no idea of timelines. The interplay of getting hdbscan (with its Cython compilation) working with numpy is hairy and frustrating. I'm not sure I have any good immediate workarounds.
So I have a workaround that may get you past this particular issue. It's not pretty, but it should do the job. In `umap/distances.py` there is a function definition:

```python
@numba.vectorize(fastmath=True)
def correct_alternative_cosine(d):
    return 1.0 - pow(2.0, -d)
```
If you change that to

```python
@numba.njit(fastmath=True)
def correct_alternative_cosine(ds):
    result = np.empty_like(ds)
    for i in range(ds.shape[0]):
        result[i] = 1.0 - np.power(2.0, -ds[i])
    return result
```
Then this should avoid the issue -- it seems specifically related to `numba.vectorize`. Potentially you can just make this edit in your installed copy of umap in site-packages and have it work.
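For anyone wanting to sanity-check the replacement, the correction appears to just invert a log2 transform. A minimal stdlib-only sketch, assuming the "alternative" distance is `d = -log2(1 - cosine_distance)` (an inference from the formula above, not something stated in the thread):

```python
import math

# Plain-Python version of the correction, for checking the math only
# (the real one must be numba-compiled to be usable inside pynndescent).
def correct_alternative_cosine(d):
    return 1.0 - math.pow(2.0, -d)

# If d = -log2(1 - cos_dist), the correction recovers the original distance:
cos_dist = 0.25
d = -math.log2(1.0 - cos_dist)
print(abs(correct_alternative_cosine(d) - cos_dist) < 1e-12)  # True
```

This is also a handy check that an edited `correct_alternative_cosine` still carries the minus sign from the original `pow(2.0, -d)`.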
Thanks. I will give it a try!
Hello,
I just hit the same issue.
Environment:
I used a higher version of numpy as a fix for https://github.com/scikit-learn-contrib/hdbscan/issues/457.
Having that one fixed, I stumbled on this issue, so I tried the fix you suggested in:
> [...] In `umap/distances.py` there is a function definition:
>
> ```python
> @numba.vectorize(fastmath=True)
> def correct_alternative_cosine(d):
>     return 1.0 - pow(2.0, -d)
> ```
>
> If you change that to
>
> ```python
> @numba.njit(fastmath=True)
> def correct_alternative_cosine(ds):
>     result = np.empty_like(ds)
>     for i in range(ds.shape[0]):
>         result[i] = 1.0 - np.power(2.0, -ds[i])
>     return result
> ```
>
> [...] you can just make this edit in your installed copy of umap in site-packages and have it work.
This change works for me.
However, a small correction: the distance definition is not in `umap/distances.py` but in `pynndescent/distances.py`.
So, if you are using venv, apply the suggested changes in `.venv/lib/pythonX.X/site-packages/pynndescent/distances.py`.
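Rather than guessing the site-packages path, you can ask the module itself where it lives via its `__file__` attribute. A small sketch, using the stdlib `json` module as a stand-in (the same attribute works for `pynndescent.distances` once pynndescent is installed):

```python
# Any imported module exposes its on-disk location via __file__.
import json  # stand-in; substitute pynndescent.distances in practice

print(json.__file__)  # path to the installed source file to edit
```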
Thanks for letting me know it works, and also for the correction on where to make the change!
Another option (not having to mess around in an install env) is to do the following somewhere in your own code:

```python
import pynndescent

pynn_dist_fns_fda = pynndescent.distances.fast_distance_alternatives
pynn_dist_fns_fda["cosine"]["correction"] = correct_alternative_cosine
pynn_dist_fns_fda["dot"]["correction"] = correct_alternative_cosine
```
Running into this issue currently; trying the above comment, it can't find `correct_alternative_cosine`:
NameError: name 'correct_alternative_cosine' is not defined
I tried changing it to `pynndescent.distances.correct_alternative_cosine`, but that gave the original error as well.
I get the same error for the code below:

```python
umap_embeddings = umap.UMAP(n_neighbors=np.min([5, data_df.shape[0]]),
                            n_components=3,
                            metric='cosine',
                            random_state=17
                            ).fit_transform(embeddings)
```
```
numpy==1.24.4
umap-learn==0.5.3
pandas==1.5.3
hdbscan==0.8.33
```
The above code works for files with fewer than ~3k lines but fails for more than ~5k and thereafter. @lmcinnes Can you please help?
Hi,
I am actually using `umap`, but I know it is using `pynndescent` under the hood. When I run umap with > 10k rows, I get the following errors. This is the minimal reproducible code:
This is the environment:
python 3.8.2
This did not happen in the previous version of my application. I suspect it might be due to the new `numpy` version. However, because I am also using `hdbscan`, it does not work with any numpy version except 1.22.0.