doctor3030 opened this issue 3 years ago (status: Open)
Having this same exact issue as of yesterday on Python 3.8, with any NumPy version from the past year.
Also having this issue. Tried NumPy versions 1.20 and 1.16.1.
The same with Python 3.7.9 in my case. Now it's working with Python 3.7.6 for me.
I fixed it by installing the package with pip install, adding the flags --no-cache-dir --no-binary :all:
Apparently this forces the wheels to re-compile against your local version of NumPy.
I honestly have no idea why this is happening, in addition to other packages I use - perhaps someone re-compiled the Cython scripts and didn't make a changelog. I'm literally shooting completely blind here though.
Recompiling also worked for me. Using a public cloud that messes with compilation.
But does anyone know WHY this is actually happening? Especially since it's happening in other projects outside of this repo as well?
@omarsumadi can you explain to me how to do that? I put --no-cache-dir --no-binary :all: at the end of all my pip install lines, but it didn't work on Python 3.7.9.
@paulthemagno Take a look at this Stack Overflow post: https://stackoverflow.com/questions/40845304/runtimewarning-numpy-dtype-size-changed-may-indicate-binary-incompatibility
Realistically, the only thing you would change would be: pip install hdbscan --no-cache-dir --no-binary :all:
If that doesn't work, I'm not sure. Try not setting a version of Numpy to install and letting Pip reconcile which Numpy should be installed if you are using multiple packages that rely on Numpy. Perhaps your issue is a bit deeper.
The way to actually solve all this though is to figure out why this happened in the first place.
I use another package, https://github.com/ing-bank/sparse_dot_topn, with Cython and NumPy. And from today/yesterday, I got exactly the same error: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
My environment is aws/codebuild/amazonlinux2-x86_64-standard:3.0. I downgraded the numpy version and it didn't work.
pip install package --no-cache-dir --no-binary :all: fixed the problem. FYI.
@ymwdalex That's actually the same package I came to this thread for. I don't have hdbscan installed, but came to help because I was trying to solve the sparse_dot_topn package issue.
Do you know why this is happening? I really don't want to have another go at fixing this bug while having no idea where to start.
We could start by asking them. Or maybe scipy (a dependency of both) decided to re-compile its wheels against a different version of NumPy, and everything broke?
@omarsumadi thanks for the comments. I am the author of sparse_dot_topn. I didn't change the source code recently and have no idea why this is happening...
@ymwdalex Ok - that is kind of funny lol! By the way, hi! I love your work and everything that you have done; the library is truly one of a kind and I have not found anything that comes close to its capabilities, which is sort of why I have a vested interest in seeing this through.
I'll spill to you what I could figure out:
Again, this kind of thing is way outside of my comfort zone (I know nothing about Cython and Numpy cross-over), but perhaps we could find the version of Numpy that was used to compile the wheels and pin that as the version for your library?
Sorry if some of this doesn't make much sense.
The same with Python 3.7.9 in my case. Now it's working with Python 3.7.6 for me.
I eventually installed Python 3.7.6 and everything worked. However, I have another machine with 3.7.9 where everything works fine. So it's not related to the Python version, I think.
@doctor3030 I'm not sure if you should close this, not until there's some better solution to other people's problems. I don't want to tell you how to do things and I most definitely respect your contributions, but I'd imagine this is definitely NOT solved, especially since it's pulling cross-package discussion.
I think there's a lot of cross interest figuring out what exactly happened as well. Unfortunately, I'm not well versed enough in Cython and Numpy internals to offer the correct solution other than to rebuild the wheels.
Thanks, Omar
Ok, let's keep it open.
Here's what I can say: apparently NumPy 1.20.0 is involved (it's probably what SciPy is now compiled against, due to some change that is now impacting all of us), according to this pull request: https://github.com/Trusted-AI/adversarial-robustness-toolbox/pull/87.
What is most likely happening is that we are using packages that limit the NumPy installation version to something below 1.20.0 (such as Tensorflow).
Perhaps someone could verify the pull I linked?
I have this issue when trying to use Top2Vec on Python 3.7.9, which pulls in Tensorflow and locks me to Numpy 1.19. Rebuilding HDBScan from source in turn fails on this Accelerate error, so I think I have to rebuild NumPy from source with OpenBLAS (although NumPy is otherwise working fine), which in turn is proving difficult.
So this is still very much an issue for me, no doubt for some others too.
@cavvia the same with a similar library, BERTopic, for me! I also tried pip install package --no-cache-dir --no-binary :all:, but it doesn't change anything. In my case the problem occurs on Python 3.7.9, while with Python 3.7.6 it works well.
I can report the same issue as @cavvia after trying to use top2vec on 3.8.0 and on 3.7.5... encountering issues with UMAP when trying to work around it...
Hello guys, we're facing the same issue here since last weekend, with no changes to the code or any library versions.
Isolating it to check what could have happened:
Dockerfile
FROM python:3.7-slim-buster
RUN apt-get update \
&& apt-get install -y --no-install-recommends python3.7-dev=3.7.3-2+deb10u2 build-essential=12.6 jq=1.5+dfsg-2+b1 curl=7.64.0-4+deb10u1 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& pip install --upgrade pip
COPY . .
RUN python -m pip install --user -r requirements.txt
CMD ["python", "test.py"]
requirements.txt
hdbscan==0.8.26
numpy==1.18.5
test.py
import hdbscan
print("hello")
outputs
$ docker run 9523faa77267 python test.py
Traceback (most recent call last):
File "test.py", line 1, in <module>
import hdbscan
File "/home/someuser/.local/lib/python3.7/site-packages/hdbscan/__init__.py", line 1, in <module>
from .hdbscan_ import HDBSCAN, hdbscan
File "/home/someuser/.local/lib/python3.7/site-packages/hdbscan/hdbscan_.py", line 21, in <module>
from ._hdbscan_linkage import (single_linkage,
File "hdbscan/_hdbscan_linkage.pyx", line 1, in init hdbscan._hdbscan_linkage
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
It works with numpy==1.20 though.
The point is, as mentioned here before, we use tensorflow in our project and we're locked by it to numpy<1.19.
I'm new to the python/pypi world, but I assumed that built wheels couldn't be updated (recompiled with updated libraries/dependencies), and that if an update was needed, a new release would be drafted with a minor version change.
Is there anything else we can help with? I couldn't determine exactly which lib was recompiled (hdbscan or scipy?), but I noticed a difference in the checksum/size for the hdbscan wheel across different builds; not sure it's related.
# last week (when everything worked)
Created wheel for hdbscan: filename=hdbscan-0.8.26-cp37-cp37m-linux_x86_64.whl size=687506 sha256=bd8b0c65d14ffa1d804f4a3df445fc4300452968a2372d581f0bb64963a8010d
# yesterday (when the error started happening)
Created wheel for hdbscan: filename=hdbscan-0.8.26-cp37-cp37m-linux_x86_64.whl size=686485 sha256=05668339290a597a871ee90da2b50a7ca415f18b82dba59ad6c08bb9b5b9192f
@omarsumadi Thanks a lot for your investigation. I also opened an issue in the sparse_dot_topn package to reference this issue.
numpy 1.20.0 works for me.
In my environment which has the problem, I installed numpy==1.19 first, then installed sparse_dot_topn, which uses the latest cython and scipy (https://github.com/ing-bank/sparse_dot_topn/blob/master/setup.py#L70). Probably the latest cython or scipy has some update that is incompatible with numpy versions before 1.20.
Make sure that you use correct and compatible versions of the libs:
annoy==1.17.0 cython==0.29.21 fuzzywuzzy==0.18.0 hdbscan==0.8.26 joblib==1.0.0 kiwisolver==1.3.1 llvmlite==0.35.0 matplotlib==3.3.2 numba==0.52.0 numpy==1.20.0 pandas==1.1.2 pillow==8.1.0 pyarrow==1.0.1 python-levenshtein==0.12.1 pytz==2021.1 scikit-learn==0.24.1 scipy==1.6.0 six==1.15.0 threadpoolctl==2.1.0 tqdm==4.50.0 umap-learn==0.5.0
@ymwdalex An alternative is to either downgrade SciPy as well and keep the current NumPy version, or install with --no-binary :all:. The problem is, I'd bet a lot of people are going to use some other pip package that doesn't support NumPy 1.20.0 (big hint: Tensorflow), especially since the new version number represents a step up, so many people may have < 1.20.0 in their setups.
I admit that I am as much at a loss as everyone else here. In fact I have little understanding of the binary wheel infrastructure on PyPI. I have not provided any new packages or wheels for hdbscan recently (i.e. within the last many months), so if there is a change it was handled by some automated process. Compiling from source (and, in fact, re-cythonizing everything) is likely the best option, but that does not leave a great install option. Any assistance from anyone with more experience in packaging than me would be greatly appreciated.
This was resolved for me using the following requirements: cython==0.29.21 numpy==1.20.0 scipy==1.5.4 scikit-learn==0.24.1 joblib==1.0.0 six==1.15.0
@lmcinnes - it might be due to some packages in the requirements.txt not being pinned, such as numpy>=1.16.0. It could be worth looking into pinning them in both directions (>= x, <= y), such as here.
@salman1993 Thanks. I agree that something like that might be good, however the difficulty is that it does work with numpy 1.20; it is in interactions with other packages that then install numpy 1.19, or similar. That means I'm not really sure what bounds to use. For now I may just restrict to numpy <= 1.19, as hopefully that may fix things for the moment, but I feel like that is really just a temporary fix, and will be unnecessarily restrictive on numpy versions in the not too distant future.
So what fixed it for me is installing with pip using --no-cache-dir --no-binary :all:. Is there any merit to doing that? Or is installing with pip forcing --no-binary not something looked upon highly?
Restricting the version doesn't help (at least I don't think) because it is the old version (non-1.20.0) that is causing the issues. It's most likely the fact that Scipy is compiled in 1.20.0 and everyone else isn't using 1.20.0 and the backwards compatibility in wheels everyone's been accustomed to broke.
Someone from SciPy (which is what everyone here has in common - this isn't scikit-learn's problem) needs to come and say what happened :) so we can all figure out how to proceed, but that's my guess as to what happened.
I just spent the last while trying to reproduce this, and to work out what is going astray. I don't have any firm answers, but it seems like, starting from a fresh python environment, as long as you pick one of numpy 1.19 or numpy 1.20 and then stick with that version for any other packages that get installed (i.e. if you have any dependencies that need numpy 1.19, start with that version, and stay with it), everything works fine. It was when installing a package changed the numpy version I had installed that I could get this error.
Other ways I imagine you may be able to get the error: if your pip cache has a version downloaded (and possibly built into a wheel) from when you had a different numpy version, then things could go astray like this. The fix for that seems to be the --no-cache-dir --no-binary :all: options to pip.
I'm not sure I have any good answers other than managing to ensure you are building hdbscan against the version of numpy you intend to keep (and not having another package with different dependencies trample on it), and to use --no-cache-dir --no-binary :all: to ensure you are building fresh and not using an old cached wheel or similar. I know that isn't perfect, but it is what I can say for now. Hopefully over the next week or two this will shake itself out among all the various packages and dependencies.
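For anyone working in Docker, that advice might look like the sketch below. This is only an illustration, not an official recipe: the numpy and hdbscan pins are examples, and you should match whichever numpy version your other dependencies require.

```dockerfile
FROM python:3.7-slim-buster

# Build tools are needed because hdbscan will be compiled from source.
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install the numpy you intend to keep FIRST, so everything built later
# compiles against this exact version.
RUN pip install --no-cache-dir numpy==1.19.5

# Build hdbscan from source against that numpy, bypassing the wheel cache
# and any pre-built binaries compiled against a different numpy.
RUN pip install --no-cache-dir --no-binary :all: hdbscan==0.8.26
```

The key point is the ordering: numpy is pinned and installed before anything that compiles against it.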
The day this issue started, both pip==21.0 and numpy==1.20.0 were released.
there's an issue over at pypa/pip#9542 that suggests that pip might be resolving things weirdly now that a new version of numpy exists: pip may be detecting a different numpy for dependency resolution than the version you have pinned, causing it to select binaries for other packages that were compiled against the new numpy instead of your pinned version.
fwiw I have a lock file with a pinned numpy==1.19.2 and hdbscan==0.8.26 built in a clean docker image. it worked fine, and now no longer builds since this issue started. even with no caches and locked versions of pip, numpy, etc – which all did build last week. pinning to pip==21.0 also doesn't fix the issue.
it seems like the combination of pip – at least >=21.0, maybe others – and the existence of numpy==1.20.0 is the cause. which may be why --no-binary is a possible fix (as iiuc that causes everything to be recompiled from scratch, instead of using mismatched binaries). possibly pinning to numpy==1.20.0 might also work for now where that's an option.
Thanks @adilosa , it does look like pypa/pip#9542 may be very relevant here. It also explains why things vary a lot from user to user with respect to what solves the problem -- depending on what other packages are installed, and whether they have pyproject.toml files, and how that get handled by the version of pip they have, very different results can occur.
I certainly appreciate that hdbscan users are very frustrated by the difficulties right now, but it seems like this is likely an upstream issue and it will be hard to resolve until pip and numpy play nicely with each other again. I wish I could do more to fix this, but I am not sure that I can. Hopefully at least some of the solutions and workarounds documented here can help people manage to get to a working installation in the meantime.
I tried this and it's not working for me; it only works with numpy==1.20.
I'm concerned that pypa/pip#9542 is closely related (and may be the cause of the symptom for some of the users in this thread), but may not be the entire story.
In that situation, it seems like two versions of numpy get downloaded, then modules that depend on it get built against the wrong one. However, that doesn't quite match what is being reported here. So it's possible that for some users in this thread, the root cause is different to others.
This is not caused by, but has emerged because of, pip's introduction of a stricter dependency manager in pip 21. If you've pinned an incompatible set of versions, pip21 should now barf at you instead of proceeding.
For unpinned versions (of numpy, hdbscan and scipy), pip21 uses a smarter resolution, so you could get a different combination of packages installed than with pip20, for the same requirements (use poetry, people! Yeah, I know, it's on my long TODO list too).
For pinned versions (again of numpy, scipy and hdbscan), pip21 will install those specific versions, or barf if they're incompatible. The subdependencies of these libraries could still change compared to pip20 of course.
This suggests to me that (in the absence of broken build setups like pypa/pip#9542), where hdbscan specifies a compatible version range with other libraries (numpy, basically) in setup.py, either that version range is not actually sufficient, or the incompatibility lies in one of numpy's own dependencies, which setup.py doesn't constrain at all. To obfuscate the latter slightly, this may be functionality that numpy re-exports from one of its component/dependency libraries.
Either way, that suggests it's possible to solve this here in hdbscan - by specifying either an additional dependency range (of one or more of numpy's dependencies) or by tightening the numpy range.
Otherwise, the issue will persist until all versions of numpy in active use have dependencies compatible with hdbscan, which may happen (particularly if this is to do with older/deprecated functionality in numpy) or may not.
Upgrading to the following combination sorted the issue for me, which is encouraging in that it suggests that the incompatibility is related to older versions of numpy/numpy's dependencies and therefore might just become a less frequent occurrence over time.
With pip 21.0.1, python 3.8.7, specifications in requirements.txt, no cache dir, not building binaries... (although the wheel gets built for hdbscan)
# requirements.txt
scipy==1.5.4
numpy==1.19.5
hdbscan==0.8.26
Checking the specified numpy got installed...
>>> import numpy
>>> numpy.version.version
'1.19.5'
NB I also checked my pip install logs to make sure there aren't duplicated versions, not putting the logs here because this post is long enough already.
With pip 21.0.1, python 3.8.7, specifications in requirements.txt, no cache dir, not building binaries for numpy (although the wheel gets built for hdbscan)
# requirements.txt
scipy==1.6.0
numpy==1.20.0
hdbscan==0.8.27
# Checking the specified numpy got installed
>>> import numpy
>>> numpy.version.version
'1.20.0'
I've been struggling with this as well for the last couple of days. Like some other folks here, I required both umap-learn and hdbscan (to use BERTopic); and with regular pip and the latest numpy there are compatibility issues leading to this issue, or this umap issue.
This is finally what worked for me. Inspired by multiple posts above (thanks @thclark, @lmcinnes and others). Just expanding to give a Dockerfile which folks can probably use directly.
Dockerfile
FROM python:3.8.7
# change shell to bash
SHELL ["/bin/bash", "-c"]
# Install needed libraries
RUN pip install --upgrade pip
RUN pip install --upgrade numpy umap-learn
RUN pip install --upgrade hdbscan --no-cache-dir --no-binary :all:
# Install whatever else you wanted to (example : jupyter here)
RUN pip install --upgrade jupyter scikit-learn jupyter-client jupyter-console jupyter-core jupyterlab_server jupyterlab
WORKDIR "/src/"
CMD ["jupyter-lab", "--port=8888", "--no-browser", "--ip=0.0.0.0", "--allow-root"]
Build the image via:
docker build -f Dockerfile -t clustering-jupyter:v1 .
Run via:
docker run -it -p 8888:8888 -v $PWD:/src/ --name clustering-jupyter clustering-jupyter:v1
Open localhost:8888 on your browser.
This will install numpy==1.20.0, umap-learn==0.5.0, hdbscan==0.8.27; each of which is the latest version.
@theclark Thanks so much for picking through this -- I really don't have the sufficient level of expertise to tease this out as well as you have. In terms of fixing things here (which I would be keen to manage if possible), is there a good way to work out how to improve the requirements? Historically this all worked just fine with prior numpy versions, so it should be compatible with them. Any suggestions for what I should be looking at?
Wrong @TheClark I think you want @theclark. Sorry and good luck!
Err I guess the lowercase theclark is not taggable anymore. Anyways, did not want to leave you hanging.
Not quite getting how pip became the topic at issue here. It seems the issue persists regardless of which pip version is being used. I was wondering if someone could explain how the latest update to pip broke things.
This is not a new issue with the latest pip. It seems the issue is that when building hdbscan, pip correctly selects hdbscan's choice of numpy==*, but in isolation – without any constraint from what is pinned in the user's requirements.txt. Whenever those two versions, numpy==* and whatever is pinned, are binary incompatible – as they became suddenly when 1.20.0 got released – this can happen on the user's next build.
There is no way for the end user to specify which version of numpy pip should use in the isolated build. This is what the discussion at pypa/pip#9542 is about. As well, HDBSCAN is probably correct in specifying "*" here, because the idea is to build against the user's choice of numpy. Specifying a version here would force all HDBSCAN users to use a specific numpy, or compatible.
Best I can tell, this is not a new behavior of pip. In fact, reverting to older versions – even versions before the new dependency resolver – has no impact on this. Probably, pip has always been doing this, and the only reason it's breaking now is because numpy==1.20.0 is a breaking ABI change. This also means that there's nothing to "roll back", because the mere existence of a new numpy is enough to cause it.
This is also why pinning numpy==1.20.0 will fix it. The installed numpy will be the same one hdbscan builds against. Although, this is only by coincidence, for as long as numpy==* continues to be binary compatible. If numpy==1.21.0 got released tomorrow with breaking changes, this would happen again on the next build.
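To illustrate the mechanism described above, here is a small sketch. This is NOT NumPy's or Cython's actual source code: Cython-compiled modules embed the size of the numpy.ndarray C struct from the headers they were built against, and compare it at import time with the size reported by the numpy that is actually installed. A mismatch produces the error everyone in this thread is seeing.

```python
# Illustrative stand-in for the import-time ABI size check (the real check
# lives in Cython-generated C code and is more involved than this).
def check_ndarray_abi(expected_from_header: int, got_from_pyobject: int) -> None:
    if got_from_pyobject != expected_from_header:
        raise ValueError(
            "numpy.ndarray size changed, may indicate binary incompatibility. "
            f"Expected {expected_from_header} from C header, "
            f"got {got_from_pyobject} from PyObject"
        )

# A wheel built in pip's isolated environment against numpy 1.20 expects an
# 88-byte struct; importing it with numpy 1.19 (80 bytes) installed fails:
try:
    check_ndarray_abi(88, 80)
except ValueError as err:
    print(err)
```

This is why the fix is to make the build-time numpy match the runtime numpy, whichever of the two you choose to standardize on.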
[Workarounds]
fwiw, I don't see what hdbscan could set here to fix this for everyone. hdbscan is setting its build dependency correctly, and users are likely correctly pinning their desired version of numpy. the root issue is that even with both parties doing the right thing, the way pip resolves this case creates this issue, and there's no config to tell pip to do otherwise.
@adilosa @lmcinnes if there ever was to be a selected answer for this issue, this would be it. Thanks for this - extremely helpful. You have my respect sir/ma'am.
Will second this! Sharing what worked for me in case it can help someone else:
- The only thing that worked for me with the version pins in my requirements.txt was to install with --no-build-isolation
- --no-binary alone was not able to solve the issue
See below for my requirements.txt and the relevant Dockerfile section:
# requirements.txt
tensorflow==1.15.2
numpy==1.18.1
scikit-learn==0.22.1
# Dockerfile
RUN python -m pip install --upgrade pip setuptools
ADD requirements.txt .
RUN pip install -r ./requirements.txt --no-cache-dir
RUN pip install hdbscan --no-cache-dir --no-binary :all: --no-build-isolation
Hi all, this solution is working for me:
!pip install numpy==1.20.0
numpy==1.20.1 has been released. Has anyone tried whether this fixes the issue?
I had the same issue with ConfigSpace and now hdbscan. For ConfigSpace I ended up replacing numpy with oldest-supported-numpy in the pyproject.toml and rebuilding it myself. See this issue
sklearn recently did the same in their main branch https://github.com/scikit-learn/scikit-learn/blob/main/pyproject.toml I will try with hdbscan and report back.
Using oldest-supported-numpy worked for me. Here's a PR for it: https://github.com/scikit-learn-contrib/hdbscan/pull/458
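For reference, the change amounts to roughly the following in pyproject.toml. This is a sketch modeled on scikit-learn's approach; the exact build requirements and any version pins in the actual PR may differ.

```toml
[build-system]
requires = [
    "setuptools",
    "wheel",
    "cython",
    # Builds against the oldest numpy ABI supported for each Python version
    # and platform, so the resulting wheel stays compatible with that numpy
    # and everything newer.
    "oldest-supported-numpy",
]
```

The idea is that a wheel compiled against the oldest supported numpy works with any newer numpy, which sidesteps the isolated-build mismatch entirely.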
@adilosa @danielkovtun same for me: --no-binary alone did not help me, but --no-build-isolation made it work! I have numpy==1.19.2 (I need this version because of TensorFlow..), pip 20.3.3, python 3.8.0, and many more other packages...
Not sure if this helps, but I had this problem in our project and managed to fix it by deleting the generated .cpp and .pyd files and rebuilding them (there were .pyx files using numpy).
When I try to import hdbscan I get following error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
      1 from sklearn.decomposition import PCA
      2 import umap
----> 3 import hdbscan
      4 from hyperopt import fmin, tpe, atpe, rand, hp, STATUS_OK, Trials, SparkTrials
      5 import pickle

c:\program files\python37\lib\site-packages\hdbscan\__init__.py in <module>
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
      2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
      3 from .validity import validity_index
      4 from .prediction import approximate_predict, membership_vector, all_points_membership_vectors

c:\program files\python37\lib\site-packages\hdbscan\hdbscan_.py in <module>
     19 from scipy.sparse import csgraph
     20
---> 21 from ._hdbscan_linkage import (single_linkage,
     22                                mst_linkage_core,
     23                                mst_linkage_core_vector,

hdbscan\_hdbscan_linkage.pyx in init hdbscan._hdbscan_linkage()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
I use: python 3.7.9, numpy 1.19.3 (I also tried 1.19.5).
I would appreciate your help.
I installed hdbscan via this link and it seemed to fix the issue. I'm currently using Python 3.9.2
pip install hdbscan --no-build-isolation --no-binary :all: worked for me!
The --no-build-isolation approach did work! Saved my life.