scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
2.79k stars · 500 forks

ValueError: numpy.ndarray size changed when calling import hdbscan #457

Open doctor3030 opened 3 years ago

doctor3030 commented 3 years ago

When I try to import hdbscan I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in <module>
      1 from sklearn.decomposition import PCA
      2 import umap
----> 3 import hdbscan
      4 from hyperopt import fmin, tpe, atpe, rand, hp, STATUS_OK, Trials, SparkTrials
      5 import pickle

c:\program files\python37\lib\site-packages\hdbscan\__init__.py in <module>
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
      2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
      3 from .validity import validity_index
      4 from .prediction import approximate_predict, membership_vector, all_points_membership_vectors

c:\program files\python37\lib\site-packages\hdbscan\hdbscan_.py in <module>
     19 from scipy.sparse import csgraph
     20
---> 21 from ._hdbscan_linkage import (single_linkage,
     22                                mst_linkage_core,
     23                                mst_linkage_core_vector,

hdbscan\_hdbscan_linkage.pyx in init hdbscan._hdbscan_linkage()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

I use: python 3.7.9, numpy 1.19.3 (I also tried 1.19.5). I would appreciate your help.
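For context: this ValueError is raised by a Cython-compiled extension (here hdbscan._hdbscan_linkage) when it was built against a numpy whose ndarray struct size differs from the numpy present at runtime. A small hypothetical helper (parse_abi_mismatch is our name, not part of numpy or hdbscan) that pulls the two sizes out of the message:

```python
import re

def parse_abi_mismatch(msg):
    """Extract (expected, got) ndarray struct sizes from numpy's
    'size changed' ValueError message; return None if it doesn't match."""
    m = re.search(r"Expected (\d+) from C header, got (\d+) from PyObject", msg)
    return (int(m.group(1)), int(m.group(2))) if m else None

msg = ("numpy.ndarray size changed, may indicate binary incompatibility. "
       "Expected 88 from C header, got 80 from PyObject")
print(parse_abi_mismatch(msg))  # → (88, 80)
```

A mismatch between "expected" and "got" means the extension was compiled against a newer numpy than the one installed, which is why the fixes below all amount to either upgrading numpy or rebuilding hdbscan from source.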
bridgesra commented 3 years ago

I simply upgraded every pip installed package and then it worked. Instructions for doing this are here: https://www.activestate.com/resources/quick-reads/how-to-update-all-python-packages/

ash-netizen commented 3 years ago

Hello All,

I also faced the same issue for a few hours while working with the Top2Vec library, and got it fixed simply by restarting the kernel after installing top2vec[sentence_encoders].

FYI: the kernel I was working on was a Kaggle one, but the error was exactly the same.

breadfan commented 3 years ago

pip install hdbscan --no-build-isolation --no-binary :all:

worked for me!

Man, you literally saved my life. Nothing worked for hours and I could not figure out why. Thanks a lot!

bilaltahseen commented 3 years ago

I was having the same problem when using efficient==1.1.1. The cause was the scipy version: with numpy==1.19.5 and tensorflow==2.4.0, downgrading scipy to 1.4.1 fixed it.

dbl001 commented 3 years ago
$  pip install hdbscan --no-cache-dir --no-binary :all: --no-build-isolation

Didn't work for me. I am running in a virtual environment (e.g. tensorflow_macos_venv) with Apple's machine learning version of tensorflow, which is holding numpy back at 1.18.5.

Python 3.8.5 (default, Sep  4 2020, 02:22:02) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.23.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy

In [2]: print(numpy.__version__)
1.18.5

In [3]: import tensorflow

In [4]: print(tensorflow.__version__)
2.4.0-rc0

In [5]: import hdbscan
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-3f5a460d7435> in <module>
----> 1 import hdbscan

~/tensorflow_macos_venv/lib/python3.8/site-packages/hdbscan/__init__.py in <module>
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
      2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
      3 from .validity import validity_index
      4 from .prediction import (approximate_predict,
      5                          membership_vector,

~/tensorflow_macos_venv/lib/python3.8/site-packages/hdbscan/hdbscan_.py in <module>
     19 from scipy.sparse import csgraph
     20 
---> 21 from ._hdbscan_linkage import (single_linkage,
     22                                mst_linkage_core,
     23                                mst_linkage_core_vector,

hdbscan/_hdbscan_linkage.pyx in init hdbscan._hdbscan_linkage()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
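The pattern in this report is the general one: the runtime numpy (1.18.5, pinned by TensorFlow) is older than the numpy the prebuilt hdbscan wheel was compiled against. A stdlib-only sketch of that comparison (function names are ours, for illustration only):

```python
def version_tuple(v):
    """'1.18.5' -> (1, 18, 5); pre-release suffixes like '0-rc0' are truncated."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at the first non-digit (e.g. 'rc')
        if digits:
            parts.append(int(digits))
    return tuple(parts)

def runtime_older_than_build(runtime, build):
    """True if the numpy available at import time predates the numpy
    the extension module was compiled against."""
    return version_tuple(runtime) < version_tuple(build)

# TensorFlow holds numpy at 1.18.5, but the wheel needs a newer ABI
print(runtime_older_than_build("1.18.5", "1.20.0"))  # → True
print(runtime_older_than_build("1.22.0", "1.20.0"))  # → False
```

When the left side is older, the compiled extension refuses to import; either the runtime numpy must be upgraded or hdbscan rebuilt against the pinned numpy (which is what the --no-binary/--no-build-isolation recipes in this thread do).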
maroxtn commented 3 years ago

None of these suggestions worked for me. PS: I am using a Kaggle environment.

xelandar commented 3 years ago

> @adilosa @lmcinnes if there ever was to be a selected answer for this issue, this would be it. Thanks for this - extremely helpful. You have my respect sir/ma'am.

Will second this! Sharing what worked for me in case it can help someone else:

* The only thing that worked for me with the version pins in my requirements.txt was to install with `--no-build-isolation`

* `--no-binary` alone was not able to solve the issue

See below for my requirements.txt and relevant Dockerfile section:

# requirements.txt
tensorflow==1.15.2
numpy==1.18.1
scikit-learn==0.22.1
# Dockerfile
RUN python -m pip install --upgrade pip setuptools
ADD requirements.txt .
RUN pip install -r ./requirements.txt --no-cache-dir
RUN pip install hdbscan --no-cache-dir --no-binary :all: --no-build-isolation

This one worked for me. Thank you!

poorvabedmutha31 commented 3 years ago

> @adilosa @lmcinnes if there ever was to be a selected answer for this issue, this would be it. Thanks for this - extremely helpful. You have my respect sir/ma'am.
>
> Will second this! Sharing what worked for me in case it can help someone else:
>
> * The only thing that worked for me with the version pins in my requirements.txt was to install with `--no-build-isolation`
> * `--no-binary` alone was not able to solve the issue
>
> See below for my requirements.txt and relevant Dockerfile section:
>
> # requirements.txt
> tensorflow==1.15.2
> numpy==1.18.1
> scikit-learn==0.22.1
> # Dockerfile
> RUN python -m pip install --upgrade pip setuptools
> ADD requirements.txt .
> RUN pip install -r ./requirements.txt --no-cache-dir
> RUN pip install hdbscan --no-cache-dir --no-binary :all: --no-build-isolation
>
> This one worked for me. Thank you!

Tried the exact same things; it did not work. I am getting the same error: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject. Also, for the above tensorflow, the required numpy should be >1.17, while in a thread above numpy==1.16.0 was suggested. How do I solve this problem? Is there anything I am missing? As discussed above, the python version shouldn't be a problem; I am using 3.6.0.

Thanks, Poorva

enessimsekk1 commented 3 years ago

Hey guys, I got this error today and solved it after some attempts. Here are the commands; I restarted the kernel after installing the libraries:

!pip install seaborn --user
!pip install pandas --user

HamidrezaSafari commented 3 years ago

Use a Python virtual environment and install gensim: pip install gensim==3.8.3

bing-0906 commented 3 years ago

> @adilosa @lmcinnes if there ever was to be a selected answer for this issue, this would be it. Thanks for this - extremely helpful. You have my respect sir/ma'am.
>
> Will second this! Sharing what worked for me in case it can help someone else:
>
> * The only thing that worked for me with the version pins in my requirements.txt was to install with `--no-build-isolation`
> * `--no-binary` alone was not able to solve the issue
>
> See below for my requirements.txt and relevant Dockerfile section:
>
> # requirements.txt
> tensorflow==1.15.2
> numpy==1.18.1
> scikit-learn==0.22.1
> # Dockerfile
> RUN python -m pip install --upgrade pip setuptools
> ADD requirements.txt .
> RUN pip install -r ./requirements.txt --no-cache-dir
> RUN pip install hdbscan --no-cache-dir --no-binary :all: --no-build-isolation
>
> This one worked for me. Thank you!

This one also worked for me when re-implementing my project. In my project, numpy 1.19.5 is required by tensorflow 2.5/2.6. Also saved my life!!!!

sgbaird commented 2 years ago

In a GitHub action for mat_discover using ubuntu-latest running just an import hdbscan command, I get the following:

/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/hdbscan/__init__.py:1: in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/hdbscan/hdbscan_.py:21: in <module>
    from ._hdbscan_linkage import (single_linkage,
hdbscan/_hdbscan_linkage.pyx:1: in init hdbscan._hdbscan_linkage
    ???
E   ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
_____ ERROR collecting mat_discover/tests/test_suggest_next_experiment.py ______
mat_discover/tests/test_suggest_next_experiment.py:7: in <module>
    from mat_discover.adaptive_design import Adapt
mat_discover/adaptive_design.py:6: in <module>
    from mat_discover.mat_discover_ import Discover, my_mvn
mat_discover/mat_discover_.py:31: in <module>
    import hdbscan
/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/hdbscan/__init__.py:1: in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/hdbscan/hdbscan_.py:21: in <module>
    from ._hdbscan_linkage import (single_linkage,
hdbscan/_hdbscan_linkage.pyx:1: in init hdbscan._hdbscan_linkage
    ???
E   ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I've been trying various suggestions from this thread without luck (uninstalling and then reinstalling numpy, using special flags for the hdbscan pip install, etc.). I didn't think I had changed anything, especially since I had a working version 6 days ago. I used a diff checker to see what the differences between the installed packages were:

Old (working)

Successfully installed ElM2D-0.4.1 ElMD-0.4.8 MarkupSafe-2.0.1 Pygments-2.10.0 alabaster-0.7.12 attrs-21.3.0 babel-2.9.1 bounded-pool-executor-0.0.3 cfgv-3.3.1 chem-wasserstein-1.0.8 colorama-0.4.4 coverage-6.2 crabnet-1.2.1 cycler-0.11.0 cython-0.29.26 dill-0.3.4 dist-matrix-1.0.2 distlib-0.3.4 docutils-0.17.1 filelock-3.4.2 fonttools-4.28.5 hdbscan-0.8.27 identify-2.4.1 imagesize-1.3.0 importlib-resources-5.4.0 iniconfig-1.1.1 ipython-genutils-0.2.0 jinja2-3.0.3 joblib-1.1.0 jsonschema-4.3.2 jupyter-core-4.9.1 kaleido-0.2.1 kiwisolver-1.3.2 llvmlite-0.37.0 markdown-it-py-1.1.0 mat-discover-2.0.0 matplotlib-3.5.1 mdit-py-plugins-0.2.8 myst-parser-0.15.2 nbformat-5.1.3 nodeenv-1.6.0 numba-0.54.1 numpy-1.20.3 packaging-21.3 pandas-1.3.5 pillow-8.4.0 platformdirs-2.4.1 plotly-5.5.0 pluggy-1.0.0 pqdm-0.1.0 pre-commit-2.16.0 psutil-5.8.0 py-1.11.0 pynndescent-0.5.5 pyparsing-3.0.6 pyrsistent-0.18.0 pytest-6.2.5 pytest-cov-3.0.0 python-dateutil-2.8.2 pytz-2021.3 pyyaml-6.0 scikit-learn-1.0.2 scipy-1.7.3 seaborn-0.11.2 six-1.16.0 snowballstemmer-2.2.0 sphinx-4.2.0 sphinxcontrib-applehelp-1.0.2 sphinxcontrib-devhelp-1.0.2 sphinxcontrib-htmlhelp-2.0.0 sphinxcontrib-jsmath-1.0.1 sphinxcontrib-qthelp-1.0.3 sphinxcontrib-serializinghtml-1.1.5 tenacity-8.0.1 threadpoolctl-3.0.0 toml-0.10.2 tqdm-4.62.3 traitlets-5.1.1 umap-learn-0.5.2 virtualenv-20.11.0 zipp-3.6.0

New

Successfully installed ElM2D-0.4.1 ElMD-0.4.8 MarkupSafe-2.0.1 Pygments-2.11.1 alabaster-0.7.12 attrs-21.4.0 babel-2.9.1 bounded-pool-executor-0.0.3 cfgv-3.3.1 chem-wasserstein-1.0.8 colorama-0.4.4 coverage-6.2 crabnet-1.2.1 cycler-0.11.0 cython-0.29.26 dill-0.3.4 dist-matrix-1.0.2 distlib-0.3.4 docutils-0.17.1 filelock-3.4.2 fonttools-4.28.5 hdbscan-0.8.27 identify-2.4.1 imagesize-1.3.0 importlib-resources-5.4.0 iniconfig-1.1.1 ipython-genutils-0.2.0 jinja2-3.0.3 joblib-1.1.0 jsonschema-4.3.3 jupyter-core-4.9.1 kaleido-0.2.1 kiwisolver-1.3.2 llvmlite-0.37.0 markdown-it-py-1.1.0 mat-discover-2.0.0 matplotlib-3.5.1 mdit-py-plugins-0.2.8 myst-parser-0.15.2 nbformat-5.1.3 nodeenv-1.6.0 numba-0.54.1 numpy-1.20.3 packaging-21.3 pandas-1.3.5 pillow-9.0.0 platformdirs-2.4.1 plotly-5.5.0 pluggy-1.0.0 pqdm-0.1.0 pre-commit-2.16.0 psutil-5.9.0 py-1.11.0 pynndescent-0.5.5 pyparsing-3.0.6 pyrsistent-0.18.0 pytest-6.2.5 pytest-cov-3.0.0 python-dateutil-2.8.2 pytz-2021.3 pyyaml-6.0 scikit-learn-1.0.2 scipy-1.7.3 seaborn-0.11.2 six-1.16.0 snowballstemmer-2.2.0 sphinx-4.2.0 sphinxcontrib-applehelp-1.0.2 sphinxcontrib-devhelp-1.0.2 sphinxcontrib-htmlhelp-2.0.0 sphinxcontrib-jsmath-1.0.1 sphinxcontrib-qthelp-1.0.3 sphinxcontrib-serializinghtml-1.1.5 tenacity-8.0.1 threadpoolctl-3.0.0 toml-0.10.2 tqdm-4.62.3 traitlets-5.1.1 umap-learn-0.5.2 virtualenv-20.13.0 zipp-3.7.0

I then pinned every version that had changed back to its old version, but was still getting the same error. I checked other things like the CPython version (3.8.12), all with no luck. A comparable workflow on my local Windows machine works just fine; I'm not sure why it fails on the GitHub Actions runners. I haven't tried WSL (or GitHub Actions with a Windows runner), but I'm kind of hitting a wall on this one, especially since I don't actually have access to the machine it's running on.
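The diff-checking step described above can also be scripted rather than pasted into a diff tool. A rough sketch (hypothetical helper names, abbreviated sample lines) that parses pip's "Successfully installed ..." output and reports changed pins:

```python
def parse_installed(line):
    """Parse a pip 'Successfully installed a-1.0 b-2.1' line into {name: version}."""
    pkgs = {}
    for token in line.replace("Successfully installed", "").split():
        # rpartition splits on the LAST '-', so hyphenated names survive
        name, _, version = token.rpartition("-")
        pkgs[name] = version
    return pkgs

def changed(old_line, new_line):
    """Return {name: (old_version, new_version)} for every pin that differs."""
    old, new = parse_installed(old_line), parse_installed(new_line)
    return {name: (old.get(name), new.get(name))
            for name in sorted(old.keys() | new.keys())
            if old.get(name) != new.get(name)}

# Abbreviated versions of the two 'Successfully installed' lines above
old = "Successfully installed Pygments-2.10.0 numpy-1.20.3 pillow-8.4.0"
new = "Successfully installed Pygments-2.11.1 numpy-1.20.3 pillow-9.0.0"
print(changed(old, new))
# → {'Pygments': ('2.10.0', '2.11.1'), 'pillow': ('8.4.0', '9.0.0')}
```

Note that an unchanged top-level pin (numpy-1.20.3 in both lists here) does not rule out this error: the prebuilt hdbscan wheel may have been compiled against a different numpy than the one pinned at runtime.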

dszakallas commented 2 years ago

We encountered the same error. Minimal reproducible example:

FROM python:3.8
RUN pip install numpy==1.20.3 hdbscan==0.8.27
RUN python -c 'import hdbscan'

This results in:

 > [3/3] RUN python -c 'import hdbscan':                                                                   
#6 0.864 Traceback (most recent call last):                                                                
#6 0.864   File "<string>", line 1, in <module>                                                            
#6 0.864   File "/usr/local/lib/python3.8/site-packages/hdbscan/__init__.py", line 1, in <module>          
#6 0.864     from .hdbscan_ import HDBSCAN, hdbscan                                                        
#6 0.864   File "/usr/local/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 21, in <module>
#6 0.864     from ._hdbscan_linkage import (single_linkage,
#6 0.864   File "hdbscan/_hdbscan_linkage.pyx", line 1, in init hdbscan._hdbscan_linkage
#6 0.864 ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

The only workaround I found was to upgrade to numpy==1.22.0

sgbaird commented 2 years ago

@dszakallas thanks for the quick response! This probably would work if not for my dependency gridlock: it automatically rolls back to numpy==1.20.3 for me, probably because of the numba dependency. Maybe I'll just wait it out and ignore that particular GitHub Actions workflow for now. Thank you again, it was definitely worth a shot.

MaartenGr commented 2 years ago

The BERTopic package is running into the same issue where pip installs are not working anymore. Indeed, upgrading to a version higher than 1.20.3 seems to work for me. However, setting pyproject.toml with oldest-supported-numpy does not fix the issue. For me, it only happens when I want to install hdbscan together with umap-learn.

NumPy 1.22.0 was released only a few days ago, and this issue seems to have appeared since then. This does seem to mean, however, that some releases of numpy may affect hdbscan even if you use an older numpy version yourself.

This thread was opened on the 31st of January, whereas NumPy 1.20 was released on the 30th of January. Now we see something similar: NumPy 1.22.0 was released a few days ago, and the ValueError issue is popping up again.

I have no clue what exactly is happening here, but it seems that HDBSCAN is affected whenever numpy makes a new release.

swang423 commented 2 years ago

@sgbaird If numba is the only thing that's bothering you, try downgrade numba to 0.53 first, then upgrade numpy to 1.22.0. https://stackoverflow.com/questions/70148065/numba-needs-numpy-1-20-or-less-for-shapley-import

sgbaird commented 2 years ago

@swang423 thank you! This did the trick to get my GitHub actions, pip-based pytest unit tests back up and running. pip install numba==0.53.* numpy==1.22.0

eterna2 commented 2 years ago

@MaartenGr

I have the same issue. And numpy==1.22.0 causes a bug in umap when you use cosine distance. So now if hdbscan is working, umap is not; if umap is working, I cannot get hdbscan to work.

https://github.com/lmcinnes/pynndescent/issues/163

:(

RajamannarAanjaram commented 2 years ago

I face the same issue.

I tried installing with --no-cache-dir --no-binary :all: --no-build-isolation, and via pyproject.toml as well, but still get the same error.

python -V == 3.8.10, numpy==1.22.0, umap-learn==0.5.1, hdbscan==0.8.27

But for some weird reason, when I install these packages with conda install I do not get this error; it only fails on pip install. The only difference is the numpy version (1.20.3).

juanroesel commented 2 years ago

@sgbaird @swang423 @MaartenGr Thanks for sharing all your inputs! This seems to have done for me also pip install numba==0.53.* numpy==1.22.0 when trying to import BERTopic inside a Jupyter Notebook instance. Topic models are training just fine now.

bertopic                  0.9.4                    pypi_0
hdbscan                   0.8.27                   pypi_0
numba                     0.53.0                   pypi_0
numpy                     1.22.0                   pypi_0 
pip                       21.2.4           py39hecd8cb5_0 
python                    3.9.7                h88f2d9e_1
pyyaml                    5.4.1                    pypi_0
toml                      0.10.2                   pypi_0
umap-learn                0.5.2                    pypi_0
ShorelLee commented 2 years ago

I also have this problem. When I put import hdbscan into a script and try to run it, I get the following error: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject. I did some experiments and found that this seems to be a problem with the hdbscan package itself, and has nothing to do with the numpy version. If you used pip install hdbscan to install the package in your virtual environment, uninstall it and then reinstall with conda install -c conda-forge hdbscan. Hope this solves your problem!

MaartenGr commented 2 years ago

The issue turned out to be a fair bit less complex than I had thought 😅 The PyPI release does not yet have oldest-supported-numpy in its pyproject.toml. The master branch does have that fix, so simply installing hdbscan from the master branch fixes the issue for me.

@lmcinnes Sorry to tag you like this, but it seems the issue should be solved whenever a new version is released to PyPI. Fortunately, this also means that after that release we will likely not see this issue popping up anymore.
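For reference, the fix being described is a build-system table along these lines (a sketch, not the exact file from the master branch). The oldest-supported-numpy meta-package pins the build to the oldest numpy ABI supported on each Python version, so a wheel built with it imports cleanly under newer numpy releases:

```toml
[build-system]
requires = [
    "setuptools",
    "wheel",
    "cython",
    "oldest-supported-numpy",
]
```

Without this, an isolated pip build grabs the latest numpy at compile time, producing exactly the ABI mismatch seen throughout this thread whenever the runtime numpy is older.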

BhujayKumarBhatta commented 2 years ago

I faced the same issue while working in an Anaconda environment. I came out of the conda environment and created a plain venv with Python 3.9.7, installed hdbscan using pip, and generated a requirements file. I then created a fresh conda env and installed hdbscan from that requirements file. I am able to use it now.

(pelog39) u1@ubuntu:~$ cat hdbscan_requirement.txt
Cython==0.29.26
hdbscan==0.8.27
joblib==1.1.0
numpy==1.22.0
scikit-learn==1.0.2
scipy==1.7.3
six==1.16.0
threadpoolctl==3.0.0
(pelog39) u1@ubuntu:~$ pip install -r hdbscan_requirement.txt
(pelog39) u1@ubuntu:~$ python -c 'import hdbscan'
(pelog39) u1@ubuntu:~$

For hdbscan to work with pytorch:

conda install -c conda-forge hdbscan
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

tocom242242 commented 2 years ago

In my env, numpy==1.21.5 works

chaituValKanO commented 2 years ago

> @swang423 thank you! This did the trick to get my GitHub actions, pip-based pytest unit tests back up and running. pip install numba==0.53.* numpy==1.22.0

This worked for me. Below is my env.yml (incomplete); I had an issue with numba as well, as someone mentioned above. Everything got fixed with the versions below.

thedatadecoder commented 3 months ago

Downgrading to a suitable hdbscan version helped me here; use trial and error to find the appropriate version. The following versions worked for me:

%pip install hdbscan==0.8.33
%pip install numpy==1.20.3