sxia-cb opened 3 years ago
I did some tests and found something weird. Running

```python
umap_model = umap.UMAP(n_neighbors=15, n_components=5, low_memory=False, random_state=42).fit(df)
umap_model.transform(x)
```

raised the error `TypeError: can't pickle weakref objects`. But if I continue with

```python
df1 = df.reset_index(drop=True)
```

(or save `df` to a file and load it back), then

```python
umap_model = umap.UMAP(n_neighbors=15, n_components=5, low_memory=False, random_state=42).fit(df1)
umap_model.transform(x)
```

works, and after that, running the original `df` again also works:

```python
umap_model = umap.UMAP(n_neighbors=15, n_components=5, low_memory=False, random_state=42).fit(df)
umap_model.transform(x)
```
In other words, the error only appears the first time I run `fit(df)` followed by `transform(x)`; once it has succeeded, the error does not come back.
Any idea how to solve this issue?
Hit this error too; it seems to come from PyNNDescent. I'm on the latest commit dd415c.
```python
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
lib/python3.7/site-packages/numba/core/caching.py in save(self, key, data)
486 # If key already exists, we will overwrite the file
--> 487 data_name = overloads[key]
488 except KeyError:
KeyError: ((array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C), array(float32, 2d, C), type(CPUDispatcher(
```
```python
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
```
@adilosa, do you know how to solve this issue?
@adilosa This is an issue with numba interacting with whatever disk is available for caching; I'm not sure there is much I can do because this is deep into numba. Ostensibly you can try setting environment variables for the numba cache directory to point it to more reliable disk. I would definitely reach out to the numba team for help with this however.
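As a minimal sketch of that suggestion (the cache path here is just an example): numba reads the `NUMBA_CACHE_DIR` environment variable, so set it to a reliable, writable location before numba is first imported.

```python
import os
import tempfile

# Point numba's on-disk compilation cache at a reliable, writable directory.
# This must happen before numba (or anything that imports it, like umap) is imported.
cache_dir = os.path.join(tempfile.gettempdir(), "numba_cache")
os.makedirs(cache_dir, exist_ok=True)
os.environ["NUMBA_CACHE_DIR"] = cache_dir

# Only now import libraries that pull in numba, e.g.:
# import umap
```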
Ah, ok, thanks Leland. For reference, I had this issue on the latest umap commit. Normally I use umap==0.4.6, pynndescent==0.4.8, and numba==0.51.2 – and reinstalling these versions works just fine as usual.
@sxia-cb, perhaps the issue is with a recent numba release; maybe roll back a version and see if it goes away. The related issues also mention it might be a problem with cloud storage.
Same stack trace in numba/numba#7279, also referencing lmcinnes/pynndescent#133
I ran into this problem and found a way around it; hopefully it helps others. My code was running fine on local machines but hit this error when run on Colab Pro. The env-var solution referenced on the numba forums did not work with the default Colab libraries and umap installed via `pip install umap-learn`. However, the error went away once I updated the libraries to these versions and then applied the env-var suggestion:
```
!pip install tbb --upgrade
!pip install numba==0.52.0
!pip install pynndescent==0.5.2
!mkdir /tmp/numba_cache
%env NUMBA_CACHE_DIR=/tmp/numba_cache
!pip install umap-learn
```
Note that you have to restart the Colab runtime for these new versions to take effect. Ideally, check the version of each package.
Another way to make it work is to just grab the latest versions of these dependencies, in which case you don't need the env-var change: `!pip install --upgrade tbb numba pynndescent umap-learn`. Of course, you still have to restart Colab to ensure the latest versions are in play.
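To check which versions are actually in play after the restart, a quick sketch using the standard library's `importlib.metadata` (the helper `installed_version` is just for illustration):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg):
    """Return the installed version string of pkg, or None if it is not installed."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# The dependencies mentioned in this thread
for pkg in ("tbb", "numba", "pynndescent", "umap-learn"):
    print(pkg, installed_version(pkg))
```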
I ran UMAP with

```python
umap_model = umap.UMAP(n_neighbors=15, n_components=5, low_memory=False, random_state=42).fit(df)
```

and it works, but running `joblib.dump(umap_model, 'test.sav')` or `umap_model.transform(df1)` raises the error `TypeError: can't pickle weakref objects`.