scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

ValueError: numpy.ndarray size changed when calling import hdbscan #457

Open doctor3030 opened 3 years ago

doctor3030 commented 3 years ago

When I try to import hdbscan I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
      1 from sklearn.decomposition import PCA
      2 import umap
----> 3 import hdbscan
      4 from hyperopt import fmin, tpe, atpe, rand, hp, STATUS_OK, Trials, SparkTrials
      5 import pickle

c:\program files\python37\lib\site-packages\hdbscan\__init__.py in <module>
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
      2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
      3 from .validity import validity_index
      4 from .prediction import approximate_predict, membership_vector, all_points_membership_vectors
      5

c:\program files\python37\lib\site-packages\hdbscan\hdbscan_.py in <module>
     19 from scipy.sparse import csgraph
     20
---> 21 from ._hdbscan_linkage import (single_linkage,
     22                                mst_linkage_core,
     23                                mst_linkage_core_vector,

hdbscan\_hdbscan_linkage.pyx in init hdbscan._hdbscan_linkage()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

I use: python 3.7.9, numpy 1.19.3 (I also tried 1.19.5).

I would appreciate your help.
omarsumadi commented 3 years ago

Having this same exact issue as of yesterday, on Python 3.8 with any numpy version from the past year.

Augusttell commented 3 years ago

Also having this issue. Tried numpy versions 1.20 and 1.16.1.

paulthemagno commented 3 years ago

The same with Python 3.7.9 in my case. Now it's working with Python 3.7.6 for me.

omarsumadi commented 3 years ago

I fixed it by installing the package with pip, adding the flags --no-cache-dir --no-binary :all:. Apparently this allows the wheels to be re-compiled against your local version of numpy.
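For hdbscan specifically, the command is:

pip install hdbscan --no-cache-dir --no-binary :all: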

I honestly have no idea why this is happening - it affects other packages I use as well. Perhaps someone re-compiled the Cython scripts and didn't make a changelog entry. I'm literally shooting completely blind here, though.

Augusttell commented 3 years ago

Recompiling also worked for me. I'm using a public cloud that messes with compilation.

omarsumadi commented 3 years ago

> Recompiling also worked for me. I'm using a public cloud that messes with compilation.

But does anyone know WHY this is actually happening? Especially since it shows up in other projects outside of this repo as well?

paulthemagno commented 3 years ago

@omarsumadi can you explain to me how to do that? I put --no-cache-dir --no-binary :all: at the end of all my pip install lines but it didn't work on Python 3.7.9.

omarsumadi commented 3 years ago

@paulthemagno Take a look at this stack overflow post: https://stackoverflow.com/questions/40845304/runtimewarning-numpy-dtype-size-changed-may-indicate-binary-incompatibility

Realistically, the only thing you would change would be: pip install hdbscan --no-cache-dir --no-binary :all:

If that doesn't work, I'm not sure. Try not pinning a numpy version and letting pip reconcile which numpy should be installed, if you are using multiple packages that rely on numpy. Perhaps your issue is a bit deeper.

The way to actually solve all this though is to figure out why this happened in the first place.

ymwdalex commented 3 years ago

I use another package, https://github.com/ing-bank/sparse_dot_topn, with cython and numpy. And since today/yesterday, I get exactly the same error: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject.

My environment is aws/codebuild/amazonlinux2-x86_64-standard:3.0. I downgraded the numpy version and it didn't work.

pip install package --no-cache-dir --no-binary :all: fixed the problem. FYI.

omarsumadi commented 3 years ago

@ymwdalex That's actually the same package I came to this thread for. I don't have hdbscan installed, but came to help because I was trying to solve the sparse_dot_topn package issue.

Do you know why this is happening? I really don't want to have another go at fixing this bug with no idea where to start.

We could start by asking them. Or maybe scipy (a dependency of both) decided to re-compile its wheels against a different version of numpy and everything broke?

ymwdalex commented 3 years ago

@omarsumadi thanks for the comments. I am the author of sparse_dot_topn. I didn't change the source code recently and have no idea why this is happening...

omarsumadi commented 3 years ago

@ymwdalex Ok - that is kind of funny lol! By the way, hi! I love your work and everything that you have done; the library is truly one of a kind and I have not found anything that comes close to its capabilities, which is sort of why I have a vested interest in seeing this through.

I'll spill to you what I could figure out:

Again, this kind of thing is way outside of my comfort zone (I know nothing about the Cython/numpy cross-over), but perhaps we could find the version of numpy that was used to compile the wheels and pin that as the version for your library?

Sorry if some of this doesn't make much sense.

doctor3030 commented 3 years ago

> The same with Python 3.7.9 in my case. Now it's working with Python 3.7.6 for me.

I eventually installed Python 3.7.6 and everything worked. However, I have another machine with 3.7.9 where everything works fine. So it's not related to the Python version, I think.

omarsumadi commented 3 years ago

@doctor3030 I'm not sure if you should close this, not until there's some better solution to other people's problems. I don't want to tell you how to do things, and I most definitely respect your contributions, but I'd imagine this is definitely NOT solved, especially since it's pulling in cross-package discussion.

I think there's a lot of cross-project interest in figuring out what exactly happened as well. Unfortunately, I'm not well versed enough in Cython and numpy internals to offer the correct solution, other than to rebuild the wheels.

Thanks, Omar

doctor3030 commented 3 years ago

> @doctor3030 I'm not sure if you should close this, not until there's some better solution to other people's problems. I don't want to tell you how to do things, and I most definitely respect your contributions, but I'd imagine this is definitely NOT solved, especially since it's pulling in cross-package discussion.
>
> I think there's a lot of cross-project interest in figuring out what exactly happened as well.
>
> Thanks, Omar

Ok, let's keep it open.

omarsumadi commented 3 years ago

Here's what I can say: apparently numpy 1.20.0 is the culprit (probably what scipy is now compiled against, due to some change that is now impacting all of us), according to this pull request: https://github.com/Trusted-AI/adversarial-robustness-toolbox/pull/87.

What is most likely happening among us is that we are using packages that limit the numpy installation to a version below 1.20.0 (such as Tensorflow).

Perhaps someone could verify the pull request I linked?

cavvia commented 3 years ago

I have this issue when trying to use Top2Vec on Python 3.7.9, which pulls in Tensorflow and locks me to Numpy 1.19. Rebuilding HDBScan from source in turn fails on this Accelerate error, so I think I have to rebuild NumPy from source with OpenBLAS (although NumPy is otherwise working fine), which in turn is proving difficult.

So this is still very much an issue for me, no doubt for some others too.

paulthemagno commented 3 years ago

@cavvia the same with a similar library, BERTopic, for me! I also tried pip install package --no-cache-dir --no-binary :all:, but it doesn't change anything. In my case the problem occurs on Python 3.7.9, while with Python 3.7.6 it works well.

AltfunsMA commented 3 years ago

I can report the same issue as @cavvia after trying to use top2vec on 3.8.0 and on 3.7.5... and I'm encountering issues with UMAP when trying to work around it...

x1s commented 3 years ago

Hello guys, we've been facing the same issue here since last weekend, with no changes to the code or any library versions.

I isolated it to check what could be happening:

Dockerfile

FROM python:3.7-slim-buster
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3.7-dev=3.7.3-2+deb10u2 build-essential=12.6 jq=1.5+dfsg-2+b1 curl=7.64.0-4+deb10u1 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && pip install --upgrade pip
COPY . .
RUN python -m pip install --user -r requirements.txt
CMD ["python", "-m", "test.py"]

requirements.txt

hdbscan==0.8.26
numpy==1.18.5

test.py

import hdbscan

print("hello")

outputs

$ docker run 9523faa77267 python test.py
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    import hdbscan
  File "/home/someuser/.local/lib/python3.7/site-packages/hdbscan/__init__.py", line 1, in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
  File "/home/someuser/.local/lib/python3.7/site-packages/hdbscan/hdbscan_.py", line 21, in <module>
    from ._hdbscan_linkage import (single_linkage,
  File "hdbscan/_hdbscan_linkage.pyx", line 1, in init hdbscan._hdbscan_linkage
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

It works with numpy==1.20, though.

The point is, as mentioned here before, we use tensorflow in our project and we're locked by it to numpy<1.19.

I'm new to the python/pypi world, but I assumed that built wheels couldn't be updated (recompiled with updated libraries/dependencies), and that if an update was needed, a new release would be drafted with a minor version change.

Is there anything else we can help with? I couldn't work out exactly which lib was recompiled (hdbscan or scipy?), but I noticed a difference in the checksum/size of the hdbscan wheel across different builds; not sure it's related.

# last week (when everything worked)
Created wheel for hdbscan: filename=hdbscan-0.8.26-cp37-cp37m-linux_x86_64.whl size=687506 sha256=bd8b0c65d14ffa1d804f4a3df445fc4300452968a2372d581f0bb64963a8010d
# yesterday (when the error started happening)
Created wheel for hdbscan: filename=hdbscan-0.8.26-cp37-cp37m-linux_x86_64.whl size=686485 sha256=05668339290a597a871ee90da2b50a7ca415f18b82dba59ad6c08bb9b5b9192f

ymwdalex commented 3 years ago

@omarsumadi Thanks a lot for your investigation. I also opened an issue in the sparse_dot_topn package referring to this issue.

numpy 1.20.0 works for me.

In my environment which has the problem, I installed numpy==1.19 first, then installed sparse_dot_topn, which uses the latest cython and scipy (https://github.com/ing-bank/sparse_dot_topn/blob/master/setup.py#L70). Probably the latest cython or scipy has some update that is incompatible with numpy versions before 1.20.

rajatkumarraghuvanshi1 commented 3 years ago

Make sure that you use correct and compatible versions of the libs.

annoy==1.17.0
cython==0.29.21
fuzzywuzzy==0.18.0
hdbscan==0.8.26
joblib==1.0.0
kiwisolver==1.3.1
llvmlite==0.35.0
matplotlib==3.3.2
numba==0.52.0
numpy==1.20.0
pandas==1.1.2
pillow==8.1.0
pyarrow==1.0.1
python-levenshtein==0.12.1
pytz==2021.1
scikit-learn==0.24.1
scipy==1.6.0
six==1.15.0
threadpoolctl==2.1.0
tqdm==4.50.0
umap-learn==0.5.0

omarsumadi commented 3 years ago

> @omarsumadi Thanks a lot for your investigation. I also opened an issue in the sparse_dot_topn package referring to this issue.
>
> numpy 1.20.0 works for me.
>
> In my environment which has the problem, I installed numpy==1.19 first, then installed sparse_dot_topn, which uses the latest cython and scipy (https://github.com/ing-bank/sparse_dot_topn/blob/master/setup.py#L70). Probably the latest cython or scipy has some update that is incompatible with numpy versions before 1.20.

@ymwdalex An alternative is to either downgrade scipy as well and keep the current numpy version, or install with --no-binary :all:. The problem is, I'd bet a lot of people are using some other pip package that doesn't support numpy 1.20.0 (big hint: Tensorflow), especially since the new version number represents a step up, so many people may have <1.20.0 pinned in their setups.

lmcinnes commented 3 years ago

I admit that I am as much at a loss as everyone else here. In fact I have little understanding of the binary wheel infrastructure on PyPI. I have not provided any new packages or wheels for hdbscan recently (i.e. within the last many months), so if there was a change it was handled by some automated process. Compiling from source (and, in fact, re-cythonizing everything) is likely the best fix, but that does not leave a great install option. Any assistance from anyone with more experience in packaging than me would be greatly appreciated.

Augusttell commented 3 years ago

This was resolved for me using the following requirements:

cython==0.29.21
numpy==1.20.0
scipy==1.5.4
scikit-learn==0.24.1
joblib==1.0.0
six==1.15.0

salman1993 commented 3 years ago

@lmcinnes - it might be due to some packages in the requirements.txt not being pinned, such as numpy>=1.16.0. It could be worth looking into pinning them in both directions (>= x, <= y), such as here.
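For example, a both-directions pin might look like this (the bounds here are hypothetical, just to illustrate the shape):

numpy>=1.16.0,<1.20.0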

lmcinnes commented 3 years ago

@salman1993 Thanks. I agree that something like that might be good; however, the difficulty is that it does work with numpy 1.20 - it is the interactions with other packages that then install numpy 1.19, or similar, that break. That means I'm not really sure what bounds to use. For now I may just restrict to numpy <= 1.19, as hopefully that may fix things for the moment, but I feel like that is really just a temporary fix and will be unnecessarily restrictive on numpy versions in the not too distant future.

omarsumadi commented 3 years ago

So what fixed it for me is installing with pip using --no-cache-dir --no-binary :all:. Is there any merit to doing that? Or is forcing --no-binary with pip not something looked upon highly?

Restricting the version doesn't help (at least I don't think so), because it is the old version (non-1.20.0) that is causing the issues. It's most likely that scipy is compiled against 1.20.0 while everyone else isn't using 1.20.0, and the backwards compatibility in wheels that everyone's been accustomed to broke.

Someone from scipy (not scikit's problem; scipy is what everyone here has in common) needs to come and say what happened :) so we can all figure out how to proceed. But that's my guess as to what happened.

lmcinnes commented 3 years ago

I just spent the last while trying to reproduce this and to work out what is going astray. I don't have any firm answers, but it seems that, starting from a fresh Python environment, as long as you pick one of numpy 1.19 or numpy 1.20 and then stick with that version for any other packages that get installed (i.e. if you have any dependencies that need numpy 1.19, start with that version and stay with it), everything works fine. It was when an install of a package changed the numpy version I had installed that I could get this error.

Another way I imagine you may be able to get the error: if your pip cache has a version downloaded (and possibly built into a wheel) from when you had a different numpy version, then things could go astray like this. The fix for that seems to be the --no-cache-dir --no-binary :all: options to pip.

I'm not sure I have any good answers other than ensuring you build hdbscan against the version of numpy you intend to keep (and not letting another package with different dependencies trample on it), and using --no-cache-dir --no-binary :all: to ensure you are building fresh and not using an old cached wheel or similar. I know that isn't perfect, but it is what I can say for now. Hopefully over the next week or two this will shake itself out among all the various packages and dependencies.

adilosa commented 3 years ago

The day this issue started, both pip==21.0.1 and numpy==1.20.0 were released.

There's an issue over at pypa/pip#9542 that suggests pip might be resolving things weirdly now that a new version of numpy exists: pip may be detecting a different numpy for dependency resolution than the version you have pinned, causing it to select binaries for other packages that were compiled against the new numpy instead of your pinned version.

FWIW, I have a lock file with a pinned numpy==1.19.2 and hdbscan==0.8.26, built in a clean docker image. It worked fine, and it no longer builds since this issue started - even with no caches and locked versions of pip, numpy, etc., all of which did build last week. Pinning to pip==21.0 also doesn't fix the issue.

It seems like the combination of pip (at least >=21.0, maybe others) and the existence of numpy==1.20.0 is the cause, which may be why --no-binary is a possible fix (as, IIUC, that causes everything to be recompiled from scratch instead of using mismatched binaries). Possibly pinning to numpy==1.20.0 might also work for now, where that's an option.

lmcinnes commented 3 years ago

Thanks @adilosa, it does look like pypa/pip#9542 may be very relevant here. It also explains why things vary a lot from user to user with respect to what solves the problem -- depending on what other packages are installed, whether they have pyproject.toml files, and how that gets handled by the version of pip they have, very different results can occur.

I certainly appreciate that hdbscan users are very frustrated by the difficulties right now, but it seems like this is likely an upstream issue and it will be hard to resolve until pip and numpy play nicely with each other again. I wish I could do more to fix this, but I am not sure that I can. Hopefully at least some of the solutions and workarounds documented here can help people get to a working installation in the meantime.

raviteja-cfx commented 3 years ago

> I just spent the last while trying to reproduce this and to work out what is going astray. I don't have any firm answers, but it seems that, starting from a fresh Python environment, as long as you pick one of numpy 1.19 or numpy 1.20 and then stick with that version for any other packages that get installed (i.e. if you have any dependencies that need numpy 1.19, start with that version and stay with it), everything works fine. It was when an install of a package changed the numpy version I had installed that I could get this error.
>
> Another way I imagine you may be able to get the error: if your pip cache has a version downloaded (and possibly built into a wheel) from when you had a different numpy version, then things could go astray like this. The fix for that seems to be the --no-cache-dir --no-binary :all: options to pip.
>
> I'm not sure I have any good answers other than ensuring you build hdbscan against the version of numpy you intend to keep (and not letting another package with different dependencies trample on it), and using --no-cache-dir --no-binary :all: to ensure you are building fresh and not using an old cached wheel or similar. I know that isn't perfect, but it is what I can say for now. Hopefully over the next week or two this will shake itself out among all the various packages and dependencies.

I tried this and it's not working for me; it only works with numpy==1.20.

thclark commented 3 years ago

I'm concerned that pypa/pip#9542 is closely related (and may be the cause of this symptom for some of the users in this thread), but may not be the entire story.

In that situation, it seems like two versions of numpy get downloaded, then modules that depend on it get built against the wrong one.

However, here the failure also appears for users with fully pinned versions, no cache, and clean docker builds (as reported above).

So it's possible that for some users in this thread, the root cause is different to others.

Unpicking what happened

This is not caused by, but has emerged because of, pip's introduction of a stricter dependency resolver in pip 21. If you've pinned an incompatible set of versions, pip 21 should now barf at you instead of proceeding.

This suggests to me that (in the absence of broken build setups like pypa/pip#9542), where hdbscan specifies a compatible version range with other libraries (numpy, basically) in setup.py, either:

  • the specified range is too loose (it admits numpy versions that are not actually compatible with each other), or
  • numpy has broken compatibility within that range.

To obfuscate the latter slightly, this may be functionality that numpy re-exports from one of its component/dependency libraries.

Either way, that suggests it's possible to solve this here in hdbscan - by specifying either an additional dependency range (on one or more of numpy's dependencies) or a tighter numpy range.
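A tighter range would be a one-line change in setup.py; something like this (the bounds are purely illustrative, not a recommendation):

# setup.py (hypothetical)
install_requires=[
    'numpy>=1.20,<1.21',
    ...
]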

Otherwise, the issue will persist until all versions of numpy in active use have dependencies compatible with hdbscan, which may happen (particularly if this is to do with older/deprecated functionality in numpy) or may not.

Upgrading to the following combination sorted the issue for me, which is encouraging in that it suggests the incompatibility is related to older versions of numpy (or numpy's dependencies) and therefore might just become a less frequent occurrence over time.

NOT WORKING

With pip 21.0.1, python 3.8.7, specifications in requirements.txt, no cache dir, not building binaries... (although the wheel gets built for hdbscan)

# requirements.txt
scipy==1.5.4
numpy==1.19.5
hdbscan==0.8.26

Checking the specified numpy got installed...

>>> import numpy
>>> numpy.version.version
'1.19.5'

NB: I also checked my pip install logs to make sure there aren't duplicated versions; I'm not putting the logs here because this post is long enough already.

WORKING

With pip 21.0.1, python 3.8.7, specifications in requirements.txt, no cache dir, not building binaries for numpy

(although the wheel gets built for hdbscan)

# requirements.txt
scipy==1.6.0
numpy==1.20.0
hdbscan==0.8.27

# Checking the specified numpy got installed
>>> import numpy
>>> numpy.version.version
'1.20.0'

bhavul commented 3 years ago

I've been struggling with this as well for the last couple of days. Like some other folks here, I needed both umap-learn and hdbscan (to use BERTopic), and with regular pip and the latest numpy there are compatibility issues leading to this issue, or this umap issue.

This is finally what worked for me, inspired by multiple posts above (thanks @thclark, @lmcinnes and others). I'm just expanding it into a Dockerfile which folks can probably use directly.

Dockerfile

FROM python:3.8.7

# change shell to bash
SHELL ["/bin/bash", "-c"]

# Install needed libraries
RUN pip install --upgrade pip
RUN pip install --upgrade numpy umap-learn
RUN pip install --upgrade hdbscan --no-cache-dir --no-binary :all:

# Install whatever else you wanted to (example : jupyter here)
RUN pip install --upgrade jupyter scikit-learn jupyter-client jupyter-console jupyter-core jupyterlab_server jupyterlab

WORKDIR "/src/"
CMD ["jupyter-lab", "--port=8888", "--no-browser", "--ip=0.0.0.0", "--allow-root"]

Build the image via:

docker build -f Dockerfile -t clustering-jupyter:v1 .

Run via:

docker run -it -p 8888:8888 -v $PWD:/src/ --name clustering-jupyter clustering-jupyter:v1

Open localhost:8888 on your browser.

This will install numpy==1.20.0, umap-learn==0.5.0, and hdbscan==0.8.27, each of which is the latest version.

lmcinnes commented 3 years ago

@theclark Thanks so much for picking through this -- I really don't have a sufficient level of expertise to tease this out as well as you have. In terms of fixing things here (which I would be keen to manage if possible), is there a good way to work out how to improve the requirements? Historically this all worked just fine with prior numpy versions, so it should be compatible with them. Any suggestions for what I should be looking at?

TheClark commented 3 years ago

Wrong @TheClark I think you want @theclark. Sorry and good luck!

TheClark commented 3 years ago

Err I guess the lowercase theclark is not taggable anymore. Anyways, did not want to leave you hanging.

omarsumadi commented 3 years ago

I'm not quite getting how pip has become the topic here; the issue seems to persist regardless of which pip version is being used. I was wondering if someone could explain how the latest update to pip broke things.

adilosa commented 3 years ago

This is not a new issue with the latest pip. It seems the issue is that when building hdbscan, pip correctly selects hdbscan's choice of numpy==*, but in isolation - without any constraint from what is pinned in the user's requirements.txt. Whenever those two versions, numpy==* and whatever is pinned, are binary incompatible - as they suddenly became when 1.20.0 was released - this can happen on the user's next build.

There is no way for the end user to specify which version of numpy pip should use in the isolated build. This is what the discussion at pypa/pip#9542 is about. As well, hdbscan is probably correct in specifying "*" here, because the idea is to build against the user's choice of numpy. Specifying a version here would force all hdbscan users to use a specific numpy, or a compatible one.

Best I can tell, this is not a new behavior of pip. In fact, reverting to older versions - even versions before the new dependency resolver - has no impact on this. Probably, pip has always been doing this, and the only reason it's breaking now is that numpy==1.20.0 is a breaking ABI change. This also means that there's nothing to "roll back", because the mere existence of a new numpy is enough to cause it.

This is also why pinning numpy==1.20.0 will fix it: the installed numpy will be the same one hdbscan builds against. Although, this holds only by coincidence, for as long as numpy==* continues to be binary compatible. If numpy==1.21.0 were released tomorrow with breaking changes, this would happen again on the next build.

[Workarounds]

FWIW, I don't see what hdbscan could set here to fix this for everyone. hdbscan is setting its build dependency correctly, and users are likely correctly pinning their desired version of numpy. The root issue is that even with both parties doing the right thing, the way pip resolves this case creates this issue, and there's no config to tell pip to do otherwise.
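For example, one escape hatch (confirmed by later comments in this thread) is to install your pinned numpy first, then build hdbscan against it with build isolation disabled; the numpy version here is illustrative:

pip install numpy==1.19.2
pip install hdbscan --no-cache-dir --no-binary :all: --no-build-isolation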

omarsumadi commented 3 years ago

@adilosa @lmcinnes if there ever was to be a selected answer for this issue, this would be it. Thanks for this - extremely helpful. You have my respect sir/ma'am.

danielkovtun commented 3 years ago

> @adilosa @lmcinnes if there ever was to be a selected answer for this issue, this would be it. Thanks for this - extremely helpful. You have my respect sir/ma'am.

Will second this! Sharing what worked for me in case it can help someone else:

  • The only thing that worked for me with the version pins in my requirements.txt was to install with --no-build-isolation
  • --no-binary alone was not able to solve the issue

See below for my requirements.txt and relevant Dockerfile section:

# requirements.txt
tensorflow==1.15.2
numpy==1.18.1
scikit-learn==0.22.1
# Dockerfile
RUN python -m pip install --upgrade pip setuptools
ADD requirements.txt .
RUN pip install -r ./requirements.txt --no-cache-dir
RUN pip install hdbscan --no-cache-dir --no-binary :all: --no-build-isolation

Affernus commented 3 years ago

Hi all, this solution is working for me: !pip install numpy==1.20.0

leinad87 commented 3 years ago

numpy==1.20.1 has been released. Has anyone tried whether this fixes the issue?

mariob316 commented 3 years ago

I had the same issue with ConfigSpace and now hdbscan. For ConfigSpace I ended up replacing numpy with oldest-supported-numpy in the pyproject.toml and rebuilding it myself. See this issue

sklearn recently did the same in their main branch: https://github.com/scikit-learn/scikit-learn/blob/main/pyproject.toml. I will try it with hdbscan and report back.
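For context, the change amounts to swapping the numpy build requirement in pyproject.toml for the oldest-supported-numpy meta-package; a sketch (the other entries in the list are illustrative):

[build-system]
requires = [
    "setuptools",
    "wheel",
    "Cython",
    "oldest-supported-numpy",
]

oldest-supported-numpy pins the oldest numpy that supports the current Python version, and extensions built against an older numpy stay ABI-compatible with newer numpy at runtime.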

mariob316 commented 3 years ago

Using oldest-supported-numpy worked for me. Here's a PR for it: https://github.com/scikit-learn-contrib/hdbscan/pull/458

lcerman commented 3 years ago

@adilosa @danielkovtun same for me: --no-binary alone did not help, but --no-build-isolation made it work! I have numpy==1.19.2 (I need this version because of TensorFlow..), pip 20.3.3, python 3.8.0, and many more packages...

wedesoft commented 3 years ago

Not sure if this helps, but I had this problem in our project and managed to fix it by deleting the generated cpp and pyd files and rebuilding them (there were pyx files using numpy).
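For an in-tree Cython build, that cleanup might look something like this (a sketch only; the package directory name is a placeholder, and the Windows-style .pyd artifacts are assumed from the description above):

del mypackage\*.cpp mypackage\*.pyd
python setup.py build_ext --inplace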

ghost commented 3 years ago

> When I try to import hdbscan I get the following error:
>
> ...
> ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
>
> I use: python 3.7.9, numpy 1.19.3 (I also tried 1.19.5).
>
> I would appreciate your help.

I installed hdbscan via this link and it seemed to fix the issue. I'm currently using Python 3.9.2

CoteDave commented 3 years ago

pip install hdbscan --no-build-isolation --no-binary :all:

worked for me!

bamboosdu commented 3 years ago

> > @adilosa @lmcinnes if there ever was to be a selected answer for this issue, this would be it. Thanks for this - extremely helpful. You have my respect sir/ma'am.
>
> Will second this! Sharing what worked for me in case it can help someone else:
>
>   • The only thing that worked for me with the version pins in my requirements.txt was to install with --no-build-isolation
>   • --no-binary alone was not able to solve the issue
>
> See below for my requirements.txt and relevant Dockerfile section:
>
> # requirements.txt
> tensorflow==1.15.2
> numpy==1.18.1
> scikit-learn==0.22.1
> # Dockerfile
> RUN python -m pip install --upgrade pip setuptools
> ADD requirements.txt .
> RUN pip install -r ./requirements.txt --no-cache-dir
> RUN pip install hdbscan --no-cache-dir --no-binary :all: --no-build-isolation

It did work! Saved my life.