Open norbertgieruc opened 3 years ago
Thanks for reporting. That's weird – my understanding was that upgrading numpy should help.
Related to #3095 and https://github.com/numpy/numpy/pull/16938.
@mpenkov could you have a look please? This NumPy C API incompatibility fuckup seems critical and will hit a large percentage of Gensim users.
@norbertgieruc What version of numpy did you upgrade to?
Also, what O/S?
Current: Windows 10 Numpy 1.20.2
If you are asking which Numpy version was working previously, I don't remember. I created that code about a year ago, just wanted to rerun it now, and it failed.
@piskvorky Something really strange is happening with numpy.
In this ticket, the user is still unable to import gensim even after upgrading numpy to 1.20.2, when using Python 3.8.5 on Windows. For that configuration, Appveyor built the wheels using numpy 1.20.1 (see https://ci.appveyor.com/project/piskvorky/gensim-wheels-2x1bk/build/job/7bhvjbjhcr9mivx7#L1372). If numpy version 1.20.2 is incompatible with wheels built against 1.20.1, then we're in trouble.
Indeed. I'll open a ticket at numpy, but can you clarify one thing for me please? How come the wheels used numpy 1.20.1, when you switched to using oldest-supported-numpy? I don't understand how that can be.
Because if we're building against the latest version, that's against the official numpy recommendation, so I guess problems are to be expected.
BTW, Numpy have a nice signpost page for new issues: https://github.com/numpy/numpy/issues/new/choose Let's see how they did it and do it for Gensim too :) Many don't read / respect our current issue template.
> How come the wheels used numpy 1.20.1, when you switched to using oldest-supported-numpy? I don't understand how that can be.
It could be a bug in oldest-supported-numpy. Its choice of versions for Windows builds is quite strange:
No, wait, the problem is that Windows builds happen using Appveyor, not Travis CI, and that uses a different build mechanism. This mechanism isn't picking up oldest-supported-numpy. I'll build new wheels and do a bugfix release. That should fix the wheels for Windows users (but, as I mentioned above, I'm not sure that will fix the ImportError that OP is having).
Have non-Windows users reported similar problems?
Oh crap, I just submitted the Numpy ticket :)
I think the Numpy ticket is still valid. We built the wheel using 1.20.1, and the user is unable to use it with numpy 1.20.2 installed. Is that expected? We built with an older version than what they have, so things should work, right?
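The expectation voiced here (build against an older NumPy, run with the same or a newer one) can be written down as a tiny check. This is only a sketch of the usual rule of thumb for NumPy 1.x binaries; `parse_version` and `runtime_should_work` are hypothetical helper names, and the check does not explain why the 1.20.1 / 1.20.2 combination in this ticket failed.

```python
def parse_version(text):
    """Turn a plain 'X.Y.Z' release string into a tuple of ints.

    Assumption: no rc/dev/post suffixes, which is enough for the
    versions mentioned in this thread.
    """
    return tuple(int(part) for part in text.split("."))


def runtime_should_work(build_numpy, runtime_numpy):
    """Rule of thumb: the NumPy present at import time should be at least
    as new as the NumPy the extension was compiled against."""
    return parse_version(runtime_numpy) >= parse_version(build_numpy)


if __name__ == "__main__":
    # Wheel built with 1.20.1, user has 1.20.2 installed.
    print(runtime_should_work("1.20.1", "1.20.2"))
    # Extension built with 1.20.3, user has 1.19.5 installed.
    print(runtime_should_work("1.20.3", "1.19.5"))
```

By this rule the first combination is expected to work and the second is not, which is why the failure reported with 1.20.2 is surprising.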
I think so. @norbertgieruc how did you install Gensim? Was it a standard pip install from PyPI?
Because I see some Anaconda3 in your stack trace above. I'm not sure what these guys do exactly, but we do not support non-standard 3rd party repositories (packaged and controlled by someone else).
@piskvorky Yes, it was standard pip install via anaconda prompt.
We're also experiencing this error in one of our projects that uses gensim, but I don't think your lib is the issue here. Our source of the problem is coming from a scikit-learn package: hdbscan (https://github.com/scikit-learn-contrib/hdbscan)
Problem occurs in both Windows (python 3.8.8) and macOS (python 3.8.5).
I'm not able to disclose the full stack, but the snippet that matters:
File "C:\Users\Brent\AppData\Local\Programs\Python\Python38\lib\site-packages\hdbscan\__init__.py", line 1, in <module>
from .hdbscan_ import HDBSCAN, hdbscan
File "C:\Users\Brent\AppData\Local\Programs\Python\Python38\lib\site-packages\hdbscan\hdbscan_.py", line 21, in <module>
from ._hdbscan_linkage import (single_linkage,
File "hdbscan/_hdbscan_linkage.pyx", line 1, in init hdbscan._hdbscan_linkage
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
In our particular case, we're also using tensorflow 2.4.1, which fails when using numpy 1.20.x, so we seem to be in a difficult spot atm. Dependencies are fun, right?
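Since the same ValueError can come from more than one compiled package, a quick way to see which imports are affected is to try each one and look at the error text. A minimal sketch; `classify_import` is a hypothetical helper, and matching on the phrase "binary incompatibility" simply reuses the wording of the error quoted above.

```python
import importlib


def classify_import(module_name):
    """Return 'ok', 'import-error', or 'abi-mismatch' for one module."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Covers a missing module as well as other import failures.
        return "import-error"
    except ValueError as exc:
        # The error in this thread is a ValueError whose message mentions
        # "binary incompatibility"; any other ValueError is re-raised.
        if "binary incompatibility" in str(exc):
            return "abi-mismatch"
        raise
    return "ok"
```

Calling `classify_import("gensim")` and `classify_import("hdbscan")` in the failing environment would show whether both packages are hit or only one of them.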
@llunn it's possible the issue with HDBScan is a separate one. Do they use oldest-supported-numpy? Probably best to open a ticket at HDBScan, or chime in at https://github.com/numpy/numpy/issues/18709, so the numpy devs have a bigger picture. Whatever happened over there at numpy, it clearly wasn't communicated well enough downstream.
@piskvorky honestly I'm not sure what HDBScan does or how to check on that oldest-supported-numpy. Will be amplifying your issue at numpy; intent here was to let you know it isn't specific to this project.
> In our particular case, we're also using tensorflow 2.4.1, which fails when using numpy 1.20.x, so we seem to be in a difficult spot atm. Dependencies are fun, right?
@llunn Please mention the numpy versions for each of your environments (Windows/Mac, Python version, etc) that exhibit the problem. I'm trying to pin down the cause, seeing a matrix with what works and what doesn't would be helpful.
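For anyone filling in such a matrix, the platform, Python version, and package version can be read programmatically rather than from memory. A small sketch using only the standard library (Python 3.8+); `installed_version` and `environment_row` are hypothetical names, and the row layout is just one possible format.

```python
import platform
from importlib import metadata


def installed_version(dist_name):
    """Version string of an installed distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_row(dist_name="numpy"):
    """Build one 'Platform | Python Version | Package Version |' row."""
    version = installed_version(dist_name) or "not installed"
    return f"{platform.system()} | {platform.python_version()} | {version} |"


if __name__ == "__main__":
    print(environment_row("numpy"))
```

Note that `platform.system()` reports macOS as `Darwin`.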
@mpenkov Full disclosure here, our project is unlikely to be a good case study to use. We have had to drop our gensim version to < 4.0.0 due to breaking changes in another dependency (top2Vec) that imports gensim.
Completely recognize that our problem is completely outside of your area of concern and it is almost certainly unrelated to gensim. That being said, I'm not sure any of the information below can be considered relevant, so I defer to your expertise in this project and provide it for whatever value it might have.
We tried today:
Platform | Python Version | NumPy Version |
---|---|---|
Windows | 3.8.8 | 1.20.2 |
Windows | 3.8.8 | 1.19.5 |
MacOS | 3.8.5 | 1.20.2 |
MacOS | 3.8.5 | 1.20.1 |
MacOS | 3.8.5 | 1.20.0 |
MacOS | 3.8.5 | 1.19.5 |
Based on the response over at NumPy, I don't find it surprising that none of these work since the hdbscan import is surely using the same numpy version in their pyc.
What follows is unrelated to the gensim project, but for awareness of how I joined this conversation:
[...] 1.20, which has an import for tensorflow. [...] 1.20. The actual requirement for tensorflow is ~1.19.2, maybe. I don't have the exact version in console history atm.
Edit note: I feel it is worth pointing out the incompatibility that is introduced for projects that require numpy >= 1.20.0 together with tensorflow. This very likely impacts gensim.
Seems fixed now with Gensim 4.0.1.
@llunn I'd recommend you urge your dependencies (top2vec etc) to upgrade using the Gensim 4 Migration Guide. Because Gensim 4 is miles ahead of 3.8 in terms of performance and memory, and also fixed a number of important bugs.
@norbertgieruc Can you please try gensim 4.0.1 and let us know whether the problem still persists?
gensim 4.0.1 fails on python 3.9.5 with the same error when installed with pip install --no-cache-dir -r requirements.txt. If gensim is removed from requirements.txt and installed later with pip install gensim, it works as expected.
Hm. Maybe something to do with the fact we don't distribute Python 3.9 wheels yet?
Although I don't see how that could affect whether you use requirements.txt or not. Some dysfunctional order-of-resolving-conflicting-dependencies in pip?
Did some testing. Changing the order of packages in requirements.txt doesn't help. Also, we don't install numpy explicitly - it's brought in by some other packages. What works: using python 3.8, installing gensim after numpy, installing numpy==1.20.3. Doesn't work: installing gensim==4.0.1 along with numpy==1.19.5 (requirement of other packages) on python 3.9.
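The "installing gensim after numpy" variant amounts to two separate pip invocations instead of one. A hedged sketch that only builds the command lines; `two_step_install_commands` is a hypothetical helper, and the explanation that a source build then compiles against the already installed NumPy is an assumption based on the logs, not something confirmed in this thread.

```python
import sys


def two_step_install_commands(numpy_spec, package_spec):
    """Build pip command lines that install NumPy first, then the package."""
    pip = [sys.executable, "-m", "pip", "install", "--no-cache-dir"]
    return [pip + [numpy_spec], pip + [package_spec]]


if __name__ == "__main__":
    for command in two_step_install_commands("numpy==1.19.5", "gensim==4.0.1"):
        print(" ".join(command))
        # To actually run a step: subprocess.run(command, check=True)
```

Keeping the commands as lists makes them safe to pass to `subprocess.run` without shell quoting.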
Reproducing with docker:
Works python3.8:
$ docker run --rm -it python:3.8.10 bash
root@35265b08a8fc:/# echo -e 'gensim==4.0.1\nnumpy==1.19.5' > requirements.txt && pip install --no-cache-dir -r requirements.txt
Collecting gensim==4.0.1
Downloading gensim-4.0.1-cp38-cp38-manylinux1_x86_64.whl (23.9 MB)
|████████████████████████████████| 23.9 MB 12.3 MB/s
Collecting numpy==1.19.5
Downloading numpy-1.19.5-cp38-cp38-manylinux2010_x86_64.whl (14.9 MB)
|████████████████████████████████| 14.9 MB 12.2 MB/s
Collecting scipy>=0.18.1
Downloading scipy-1.6.3-cp38-cp38-manylinux1_x86_64.whl (27.2 MB)
|████████████████████████████████| 27.2 MB 12.2 MB/s
Collecting smart-open>=1.8.1
Downloading smart_open-5.0.0-py3-none-any.whl (56 kB)
|████████████████████████████████| 56 kB 14.8 MB/s
Installing collected packages: numpy, smart-open, scipy, gensim
Successfully installed gensim-4.0.1 numpy-1.19.5 scipy-1.6.3 smart-open-5.0.0
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
root@35265b08a8fc:/# python -c 'import gensim'
/usr/local/lib/python3.8/site-packages/gensim/similarities/__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
warnings.warn(msg)
Works python3.9, install after numpy:
$ docker run --rm -it python:3.9.5 bash
root@1db8b3eb6692:/# echo 'numpy==1.19.5' > requirements.txt && pip install --no-cache-dir -r requirements.txt
Collecting numpy==1.19.5
Downloading numpy-1.19.5-cp39-cp39-manylinux2010_x86_64.whl (14.9 MB)
|████████████████████████████████| 14.9 MB 10.7 MB/s
Installing collected packages: numpy
Successfully installed numpy-1.19.5
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
root@1db8b3eb6692:/# pip install -U gensim
Collecting gensim
Downloading gensim-4.0.1.tar.gz (23.1 MB)
|████████████████████████████████| 23.1 MB 8.9 MB/s
Requirement already satisfied: numpy>=1.11.3 in /usr/local/lib/python3.9/site-packages (from gensim) (1.19.5)
Collecting scipy>=0.18.1
Downloading scipy-1.6.3-cp39-cp39-manylinux1_x86_64.whl (27.3 MB)
|████████████████████████████████| 27.3 MB 12.3 MB/s
Collecting smart_open>=1.8.1
Downloading smart_open-5.0.0-py3-none-any.whl (56 kB)
|████████████████████████████████| 56 kB 5.7 MB/s
Building wheels for collected packages: gensim
Building wheel for gensim (setup.py) ... done
Created wheel for gensim: filename=gensim-4.0.1-cp39-cp39-linux_x86_64.whl size=25965364 sha256=d34b646c9dd493fb98966bb636db08dd9ad859273deb768389dcbaea8724ff02
Stored in directory: /root/.cache/pip/wheels/20/74/75/72ec1172891bdecb4ee73fbc2c71d5a150f165b1d0c2ea04e1
Successfully built gensim
Installing collected packages: smart-open, scipy, gensim
Successfully installed gensim-4.0.1 scipy-1.6.3 smart-open-5.0.0
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
root@1db8b3eb6692:/# python -c 'import gensim'
/usr/local/lib/python3.9/site-packages/gensim/similarities/__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
warnings.warn(msg)
Works installing w/ numpy==1.20:
$ docker run --rm -it python:3.9.5 bash
root@9d2868eb2f16:/# echo -e 'gensim==4.0.1\nnumpy==1.20.3' > requirements.txt && pip install --no-cache-dir -r requirements.txt
Collecting gensim==4.0.1
Downloading gensim-4.0.1.tar.gz (23.1 MB)
|████████████████████████████████| 23.1 MB 10.6 MB/s
Collecting numpy==1.20.3
Downloading numpy-1.20.3-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.4 MB)
|████████████████████████████████| 15.4 MB 10.6 MB/s
Collecting scipy>=0.18.1
Downloading scipy-1.6.3-cp39-cp39-manylinux1_x86_64.whl (27.3 MB)
|████████████████████████████████| 27.3 MB 12.2 MB/s
Collecting smart_open>=1.8.1
Downloading smart_open-5.0.0-py3-none-any.whl (56 kB)
|████████████████████████████████| 56 kB 14.6 MB/s
Building wheels for collected packages: gensim
Building wheel for gensim (setup.py) ... done
Created wheel for gensim: filename=gensim-4.0.1-cp39-cp39-linux_x86_64.whl size=25965974 sha256=ceb359021712ac9b24fdea51789e6665cfe250d6f17597dadcf3a6c5aa898ba3
Stored in directory: /tmp/pip-ephem-wheel-cache-k3scqwxn/wheels/20/74/75/72ec1172891bdecb4ee73fbc2c71d5a150f165b1d0c2ea04e1
Successfully built gensim
Installing collected packages: numpy, smart-open, scipy, gensim
Successfully installed gensim-4.0.1 numpy-1.20.3 scipy-1.6.3 smart-open-5.0.0
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
root@9d2868eb2f16:/# python -c 'import gensim'
/usr/local/lib/python3.9/site-packages/gensim/similarities/__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
warnings.warn(msg)
Doesn't work installing with numpy==1.19 on py3.9:
$ docker run --rm -it python:3.9.5 bash
root@abd0652f5445:/# echo -e 'gensim==4.0.1\nnumpy==1.19.5' > requirements.txt && pip install --no-cache-dir -r requirements.txt
Collecting gensim==4.0.1
Downloading gensim-4.0.1.tar.gz (23.1 MB)
|████████████████████████████████| 23.1 MB 10.5 MB/s
Collecting numpy==1.19.5
Downloading numpy-1.19.5-cp39-cp39-manylinux2010_x86_64.whl (14.9 MB)
|████████████████████████████████| 14.9 MB 10.7 MB/s
Collecting scipy>=0.18.1
Downloading scipy-1.6.3-cp39-cp39-manylinux1_x86_64.whl (27.3 MB)
|████████████████████████████████| 27.3 MB 12.2 MB/s
Collecting smart_open>=1.8.1
Downloading smart_open-5.0.0-py3-none-any.whl (56 kB)
|████████████████████████████████| 56 kB 15.0 MB/s
Building wheels for collected packages: gensim
Building wheel for gensim (setup.py) ... done
Created wheel for gensim: filename=gensim-4.0.1-cp39-cp39-linux_x86_64.whl size=25965953 sha256=5308b0921cf7ac25008117c99e8c7fc9e5fa7d61a60d251d7d93bf61ff4d1d18
Stored in directory: /tmp/pip-ephem-wheel-cache-yucgbfq9/wheels/20/74/75/72ec1172891bdecb4ee73fbc2c71d5a150f165b1d0c2ea04e1
Successfully built gensim
Installing collected packages: numpy, smart-open, scipy, gensim
Successfully installed gensim-4.0.1 numpy-1.19.5 scipy-1.6.3 smart-open-5.0.0
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
root@abd0652f5445:/# python -c 'import gensim'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.9/site-packages/gensim/__init__.py", line 11, in <module>
from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils # noqa:F401
File "/usr/local/lib/python3.9/site-packages/gensim/corpora/__init__.py", line 6, in <module>
from .indexedcorpus import IndexedCorpus # noqa:F401 must appear before the other classes
File "/usr/local/lib/python3.9/site-packages/gensim/corpora/indexedcorpus.py", line 14, in <module>
from gensim import interfaces, utils
File "/usr/local/lib/python3.9/site-packages/gensim/interfaces.py", line 19, in <module>
from gensim import utils, matutils
File "/usr/local/lib/python3.9/site-packages/gensim/matutils.py", line 1024, in <module>
from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
File "gensim/_matutils.pyx", line 1, in init gensim._matutils
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
Installing without requirements.txt doesn't work either:
$ docker run --rm -it python:3.9.5 bash
root@15b7a523678d:/# pip install gensim==4.0.1 numpy==1.19.5
Collecting gensim==4.0.1
Downloading gensim-4.0.1.tar.gz (23.1 MB)
|████████████████████████████████| 23.1 MB 12.2 MB/s
Collecting numpy==1.19.5
Downloading numpy-1.19.5-cp39-cp39-manylinux2010_x86_64.whl (14.9 MB)
|████████████████████████████████| 14.9 MB 10.5 MB/s
Collecting scipy>=0.18.1
Downloading scipy-1.6.3-cp39-cp39-manylinux1_x86_64.whl (27.3 MB)
|████████████████████████████████| 27.3 MB 12.2 MB/s
Collecting smart_open>=1.8.1
Downloading smart_open-5.0.0-py3-none-any.whl (56 kB)
|████████████████████████████████| 56 kB 5.5 MB/s
Building wheels for collected packages: gensim
Building wheel for gensim (setup.py) ... done
Created wheel for gensim: filename=gensim-4.0.1-cp39-cp39-linux_x86_64.whl size=25965965 sha256=668d3d4a5353e31bece3adc74e2584a7501385727ad64ee79e90c575921516d4
Stored in directory: /root/.cache/pip/wheels/20/74/75/72ec1172891bdecb4ee73fbc2c71d5a150f165b1d0c2ea04e1
Successfully built gensim
Installing collected packages: numpy, smart-open, scipy, gensim
Successfully installed gensim-4.0.1 numpy-1.19.5 scipy-1.6.3 smart-open-5.0.0
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
root@15b7a523678d:/# python -c 'import gensim'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.9/site-packages/gensim/__init__.py", line 11, in <module>
from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils # noqa:F401
File "/usr/local/lib/python3.9/site-packages/gensim/corpora/__init__.py", line 6, in <module>
from .indexedcorpus import IndexedCorpus # noqa:F401 must appear before the other classes
File "/usr/local/lib/python3.9/site-packages/gensim/corpora/indexedcorpus.py", line 14, in <module>
from gensim import interfaces, utils
File "/usr/local/lib/python3.9/site-packages/gensim/interfaces.py", line 19, in <module>
from gensim import utils, matutils
File "/usr/local/lib/python3.9/site-packages/gensim/matutils.py", line 1024, in <module>
from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
File "gensim/_matutils.pyx", line 1, in init gensim._matutils
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
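The two numbers in that message can be pulled out to see which side is larger: the "C header" size is what the extension saw when it was compiled, the "PyObject" size is what the installed NumPy reports at import time. A minimal sketch; `explain_size_mismatch` is a hypothetical helper, and reading "larger compiled size" as "compiled against a newer NumPy" is an assumption that fits the runs above (source build, then numpy 1.19.5 at runtime).

```python
import re

SIZE_PATTERN = re.compile(r"Expected (\d+) from C header, got (\d+) from PyObject")


def explain_size_mismatch(message):
    """Return (compiled_size, runtime_size, hint), or None if no sizes are found."""
    match = SIZE_PATTERN.search(message)
    if match is None:
        return None
    compiled, runtime = int(match.group(1)), int(match.group(2))
    if compiled > runtime:
        hint = "extension was likely compiled against a newer NumPy than the one installed"
    elif compiled < runtime:
        hint = "extension was likely compiled against an older NumPy than the one installed"
    else:
        hint = "sizes match"
    return compiled, runtime, hint


if __name__ == "__main__":
    text = ("ValueError: numpy.ndarray size changed, may indicate binary "
            "incompatibility. Expected 88 from C header, got 80 from PyObject")
    print(explain_size_mismatch(text))
```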
I still have a compatibility issue when running:
pip install numpy==1.19.4
pip install tensorflow==2.5.0
pip install gensim==4.0.1
On Ubuntu 18.04, with Python 3.9.5 (installs made inside a docker container).
I get the following exception when trying to import gensim: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject. Not sure how I can make this work. I tried downgrading several libraries but still haven't managed to make it work on Ubuntu.
Edit: works on Python 3.8.10
Hi all - We came across the same issue using gensim 3.8.3 and numpy 1.22.2.
I removed numpy from requirements.txt (it gets installed as a dependency by other packages anyway) and let pip resolve the dependencies. That fixed the problem for us.
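Removing the explicit pin can also be scripted when several requirements files are involved. A rough sketch; `drop_requirement` is a hypothetical helper, the sample lines are made up for illustration, and the matching is deliberately simple (it looks only at the package name at the start of each line).

```python
import re


def drop_requirement(lines, name):
    """Return the requirement lines without any entry for `name`."""
    # The name must be followed by end of line or a specifier/extras/marker
    # character, so that e.g. 'numpydoc' is not mistaken for 'numpy'.
    pattern = re.compile(
        rf"^\s*{re.escape(name)}\s*($|[=<>!~\[;@#])", re.IGNORECASE
    )
    return [line for line in lines if not pattern.match(line)]


if __name__ == "__main__":
    sample = ["gensim==3.8.3", "numpy==1.22.2", "numpydoc"]
    print(drop_requirement(sample, "numpy"))
```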
Gensim 3.x is not supported at this point – please upgrade to the latest version (4.1.2 currently). If something doesn't work there, report here. Thanks.
I had the same error on Python 3.8.8 and gensim 4.2.0 in Jupyter Notebook. I fixed it by upgrading NumPy to 1.22.4 and restarting the runtime.
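The restart matters because a running kernel keeps the module it has already imported, even after pip replaces the files on disk. A small sketch for checking this; `loaded_vs_installed` and `needs_restart` are hypothetical helpers, and they assume the module exposes `__version__` and that the distribution name equals the module name unless given.

```python
import sys
from importlib import metadata


def loaded_vs_installed(module_name, dist_name=None):
    """Return (version already imported in this process, version on disk)."""
    module = sys.modules.get(module_name)
    loaded = getattr(module, "__version__", None) if module is not None else None
    try:
        installed = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        installed = None
    return loaded, installed


def needs_restart(module_name, dist_name=None):
    """True when the imported module is older or newer than what pip installed."""
    loaded, installed = loaded_vs_installed(module_name, dist_name)
    return loaded is not None and installed is not None and loaded != installed


if __name__ == "__main__":
    print(loaded_vs_installed("numpy"), needs_restart("numpy"))
```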
I am on Python 3.10.7, gensim 4.3.1, and Numpy 1.24.1, and I get this error when running:
import gensim.downloader as api
api.info("text8")
Hi there, I am using poetry, and installing the latest gensim forced a downgrade to numpy 1.26.4 from 2.1.2
This breaks some other things that were expecting numpy 2 (of course it's the other offending package for not properly specifying only numpy 2) but still, should gensim be officially depending on numpy 2 or not?
Hi,
I was trying to run a script that I created some time ago on Python 3.6.5, and it no longer seems to work. I can't import the library. Upgrading numpy didn't help. Can I ask for a solution to this problem?
Best regards, Norbert