piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

Can't import gensim library - Python 3.8.5 + numpy 1.20.2 #3097

Open norbertgieruc opened 3 years ago

norbertgieruc commented 3 years ago

Hi,

I was trying to run a script that I created some time ago on Python 3.6.5, and it no longer works: I can't import the library. Upgrading numpy didn't help. Can I ask for a solution to this problem?


> from gensim import corpora
> Traceback (most recent call last):
> 
>   File "<ipython-input-5-0b009fd6379b>", line 1, in <module>
>     from gensim import corpora
> 
>   File "C:\Users\user\Anaconda3\lib\site-packages\gensim\__init__.py", line 11, in <module>
>     from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils  # noqa:F401
> 
>   File "C:\Users\user\Anaconda3\lib\site-packages\gensim\corpora\__init__.py", line 6, in <module>
>     from .indexedcorpus import IndexedCorpus  # noqa:F401 must appear before the other classes
> 
>   File "C:\Users\user\Anaconda3\lib\site-packages\gensim\corpora\indexedcorpus.py", line 14, in <module>
>     from gensim import interfaces, utils
> 
>   File "C:\Users\user\Anaconda3\lib\site-packages\gensim\interfaces.py", line 19, in <module>
>     from gensim import utils, matutils
> 
>   File "C:\Users\user\Anaconda3\lib\site-packages\gensim\matutils.py", line 1024, in <module>
>     from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
> 
>   File "gensim\_matutils.pyx", line 1, in init gensim._matutils
> 
> ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
> 

Python 3.8.5 Numpy 1.20.2

Best regards, Norbert

piskvorky commented 3 years ago

Thanks for reporting. That's weird – my understanding was that upgrading numpy should help.

Related to #3095 and https://github.com/numpy/numpy/pull/16938.

@mpenkov could you have a look please? This Numpy's C API incompatibility fuckup seems critical, will hit a large percentage of Gensim users.

mpenkov commented 3 years ago

@norbertgieruc What version of numpy did you upgrade to?

Also, what O/S?

norbertgieruc commented 3 years ago

Current: Windows 10 Numpy 1.20.2

If you are asking which Numpy version was working previously, I don't remember. I created that code about a year ago, wanted to rerun it now, and it failed.

mpenkov commented 3 years ago

@piskvorky Something really strange is happening with numpy.

In this ticket, the user is still unable to import gensim even after upgrading numpy to 1.20.2, when using Python 3.8.5 on Windows. For that configuration, Appveyor built the wheels using numpy 1.20.1 (see https://ci.appveyor.com/project/piskvorky/gensim-wheels-2x1bk/build/job/7bhvjbjhcr9mivx7#L1372). If numpy version 1.20.2 is incompatible with wheels built against 1.20.1, then we're in trouble.

piskvorky commented 3 years ago

Indeed. I'll open a ticket at numpy, but can you clarify one thing for me please? How come the wheels used numpy 1.20.1, when you switched to using oldest-supported-numpy? I don't understand how that can be.

Because if we're building against the latest version, that's against the official numpy recommendation, so I guess problems are to be expected.

BTW, Numpy have a nice signpost page for new issues: https://github.com/numpy/numpy/issues/new/choose Let's see how they did it and do it for Gensim too :) Many don't read / respect our current issue template.

mpenkov commented 3 years ago

> How come the wheels used numpy 1.20.1, when you switched to using oldest-supported-numpy? I don't understand how that can be.

It could be a bug in oldest-supported-numpy. Its choice of versions for Windows builds is quite strange:

mpenkov commented 3 years ago

No, wait, the problem is that Windows builds happen using Appveyor, not Travis CI, and that uses a different build mechanism. This mechanism isn't picking up oldest-supported-numpy. I'll build new wheels and do a bugfix release. That should fix the wheels for Windows users (but, as I mentioned above, I'm not sure that will fix the ImportError that OP is having).

Have non-Windows users reported similar problems?

piskvorky commented 3 years ago

Oh crap, I just submitted the Numpy ticket :)

mpenkov commented 3 years ago

I think the Numpy ticket is still valid. We built the wheel using 1.20.1, and the user is unable to use it with numpy 1.20.2 installed. Is that expected? We built with an older version than what they have, so things should work, right?
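
For reference while reading the tracebacks in this thread: the two numbers in the message can be compared directly. The sketch below is not part of gensim or numpy. `explain_size_mismatch` is a hypothetical helper, and it assumes the message wording stays exactly as quoted above, with "Expected" being the struct size baked into the compiled extension and "got" the size reported by the numpy that is actually imported. The direction of the hint is an interpretation, not something the error itself states.

```python
import re

# Pattern for the message raised when the struct size an extension was
# compiled with disagrees with the size of the type found at import time.
_SIZE_RE = re.compile(
    r"(?P<type>[\w.]+) size changed, may indicate binary incompatibility\. "
    r"Expected (?P<expected>\d+) from C header, got (?P<got>\d+) from PyObject"
)


def explain_size_mismatch(message):
    """Return a short diagnosis for a 'size changed' ValueError, or None."""
    match = _SIZE_RE.search(message)
    if match is None:
        return None
    expected = int(match.group("expected"))  # size seen at compile time
    got = int(match.group("got"))            # size of the type imported now
    if expected > got:
        hint = "extension was probably compiled against a newer numpy than the one imported"
    elif expected < got:
        hint = "extension was probably compiled against an older numpy than the one imported"
    else:
        hint = "sizes agree; the error likely has another cause"
    return (
        f"{match.group('type')}: compiled for {expected} bytes, "
        f"runtime has {got} bytes; {hint}"
    )


print(explain_size_mismatch(
    "numpy.ndarray size changed, may indicate binary incompatibility. "
    "Expected 88 from C header, got 80 from PyObject"
))
```

If that reading is right, "Expected 88 ... got 80" points at an extension built against a newer numpy than the one being imported, which would be the opposite of the situation described in this comment.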

piskvorky commented 3 years ago

I think so. @norbertgieruc how did you install Gensim? Was it a standard pip install from PyPI?

Because I see some Anaconda3 in your stack trace above. I'm not sure what these guys do exactly, but we do not support non-standard 3rd party repositories (packaged and controlled by someone else).

norbertgieruc commented 3 years ago

@piskvorky Yes, it was standard pip install via anaconda prompt.

llunn commented 3 years ago

We're also experiencing this error in one of our projects that uses gensim, but I don't think your lib is the issue here. Our source of the problem is coming from a scikit-learn package: hdbscan (https://github.com/scikit-learn-contrib/hdbscan)

Problem occurs in both Windows (python 3.8.8) and macOS (python 3.8.5).

I'm not able to disclose the full stack, but the snippet that matters:

  File "C:\Users\Brent\AppData\Local\Programs\Python\Python38\lib\site-packages\hdbscan\__init__.py", line 1, in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
  File "C:\Users\Brent\AppData\Local\Programs\Python\Python38\lib\site-packages\hdbscan\hdbscan_.py", line 21, in <module>
    from ._hdbscan_linkage import (single_linkage,
  File "hdbscan/_hdbscan_linkage.pyx", line 1, in init hdbscan._hdbscan_linkage
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

In our particular case, we're also using tensorflow 2.4.1, which fails when using numpy 1.20.x, so we seem to be in a difficult spot atm. Dependencies are fun, right?

piskvorky commented 3 years ago

@llunn it's possible the issue with HDBScan is a separate one. Do they use oldest-supported-numpy? Probably best to open a ticket at HDBScan, or chime in at https://github.com/numpy/numpy/issues/18709, so the numpy devs have a bigger picture. Whatever happened over there at numpy, it clearly wasn't communicated well enough downstream.

llunn commented 3 years ago

@piskvorky honestly I'm not sure what HDBScan does or how to check whether it uses oldest-supported-numpy. I will be amplifying your issue at numpy; my intent here was to let you know the problem isn't specific to this project.

mpenkov commented 3 years ago

> In our particular case, we're also using tensorflow 2.4.1, which fails when using numpy 1.20.x, so we seem to be in a difficult spot atm. Dependencies are fun, right?

@llunn Please mention the numpy versions for each of your environments (Windows/Mac, Python version, etc) that exhibit the problem. I'm trying to pin down the cause, seeing a matrix with what works and what doesn't would be helpful.
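
A small script along these lines could collect the details requested above in one go. It is only a sketch: `installed_version` and `environment_report` are hypothetical helper names, the default package list is an assumption about what is relevant, and it relies on `importlib.metadata`, which needs Python 3.8 or newer.

```python
import importlib
import platform
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("numpy", "scipy", "gensim")):
    """Collect platform, Python and package versions into one dictionary."""
    report = {
        "platform": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        report[name] = installed_version(name)
    # Try the import itself, since that is where the failure shows up.
    try:
        importlib.import_module("gensim")
        report["import gensim"] = "ok"
    except Exception as exc:  # missing package, ABI mismatch, anything else
        report["import gensim"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines for each environment gives one row of the works / doesn't work matrix.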

mpenkov commented 3 years ago

https://pypi.org/project/gensim/4.0.1/

llunn commented 3 years ago

@mpenkov Full disclosure here, our project is unlikely to be a good case study to use. We have had to drop our gensim version to < 4.0.0 due to breaking changes in another dependency (top2Vec) that imports gensim.

I fully recognize that our problem is outside of your area of concern and is almost certainly unrelated to gensim. That being said, I'm not sure any of the information below can be considered relevant, so I defer to your expertise in this project and provide it for whatever value it might have.

We tried today:

  1. In all cases, using gensim 3.8.5

    Platform  Python Version  NumPy Version
    Windows   3.8.8           1.20.2
    Windows   3.8.8           1.19.5
    MacOS     3.8.5           1.20.2
    MacOS     3.8.5           1.20.1
    MacOS     3.8.5           1.20.0
    MacOS     3.8.5           1.19.5

Based on the response over at NumPy, I don't find it surprising that none of these work since the hdbscan import is surely using the same numpy version in their pyc.

What follows is unrelated to the gensim project, but for awareness of how I joined this conversation:

  1. As mentioned above, top2vec uses gensim as a dependency.
  2. Top2Vec, which also imports tensorflow, requires NumPy >= 1.20.
  3. Tensorflow 2.4.1 requires NumPy < 1.20. The actual requirement for tensorflow is ~1.19.2, maybe. I don't have the exact version in console history atm.
  4. We are also using scikit-learn (which is where the hdbscan dependency comes from).
  5. Our particular trouble arose because top2vec does not work with NumPy < 1.20, while our tensorflow model training code doesn't work with NumPy > 1.19.x.

Edit note: I feel it is worth pointing out the incompatibility with tensorflow that this introduces for projects requiring numpy >= 1.20.0. This likely impacts gensim as well.

piskvorky commented 3 years ago

Seems fixed now with Gensim 4.0.1.

@llunn I'd recommend you urge your dependencies (top2vec etc) to upgrade using the Gensim 4 Migration Guide. Because Gensim 4 is miles ahead of 3.8 in terms of performance and memory, and also fixed a number of important bugs.

mpenkov commented 3 years ago

@norbertgieruc Can you please try gensim 4.0.1 and let us know whether the problem still persists?

0x0badc0de commented 3 years ago

gensim 4.0.1 fails on python 3.9.5 with the same error when installed with pip install --no-cache-dir -r requirements.txt. If gensim is removed from requirements.txt and installed later with pip install gensim - works as expected.

piskvorky commented 3 years ago

Hm. Maybe something to do with the fact we don't distribute Python 3.9 wheels yet?

Although I don't see how that could affect whether you use requirements.txt or not. Some dysfunctional order-of-resolving-conflicting-dependencies in pip?

0x0badc0de commented 3 years ago

Did some testing. Changing the package order in requirements.txt doesn't help. Also, we don't install numpy explicitly; it's brought in by some other packages. What works: using python 3.8, installing gensim after numpy, installing numpy==1.20.3. Doesn't work: installing gensim==4.0.1 along with numpy==1.19.5 (requirement of other packages) on python 3.9.

Reproducing with docker:

Works python3.8:

$ docker run --rm -it python:3.8.10 bash
root@35265b08a8fc:/# echo -e 'gensim==4.0.1\nnumpy==1.19.5' > requirements.txt && pip install --no-cache-dir -r requirements.txt
Collecting gensim==4.0.1
  Downloading gensim-4.0.1-cp38-cp38-manylinux1_x86_64.whl (23.9 MB)
     |████████████████████████████████| 23.9 MB 12.3 MB/s 
Collecting numpy==1.19.5
  Downloading numpy-1.19.5-cp38-cp38-manylinux2010_x86_64.whl (14.9 MB)
     |████████████████████████████████| 14.9 MB 12.2 MB/s 
Collecting scipy>=0.18.1
  Downloading scipy-1.6.3-cp38-cp38-manylinux1_x86_64.whl (27.2 MB)
     |████████████████████████████████| 27.2 MB 12.2 MB/s 
Collecting smart-open>=1.8.1
  Downloading smart_open-5.0.0-py3-none-any.whl (56 kB)
     |████████████████████████████████| 56 kB 14.8 MB/s 
Installing collected packages: numpy, smart-open, scipy, gensim
Successfully installed gensim-4.0.1 numpy-1.19.5 scipy-1.6.3 smart-open-5.0.0
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
root@35265b08a8fc:/# python -c 'import gensim'
/usr/local/lib/python3.8/site-packages/gensim/similarities/__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
  warnings.warn(msg)

Works python3.9, install after numpy:

$ docker run --rm -it python:3.9.5  bash
root@1db8b3eb6692:/# echo 'numpy==1.19.5' > requirements.txt && pip install --no-cache-dir -r requirements.txt
Collecting numpy==1.19.5
  Downloading numpy-1.19.5-cp39-cp39-manylinux2010_x86_64.whl (14.9 MB)
     |████████████████████████████████| 14.9 MB 10.7 MB/s 
Installing collected packages: numpy
Successfully installed numpy-1.19.5
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
root@1db8b3eb6692:/# pip install -U gensim
Collecting gensim
  Downloading gensim-4.0.1.tar.gz (23.1 MB)
     |████████████████████████████████| 23.1 MB 8.9 MB/s 
Requirement already satisfied: numpy>=1.11.3 in /usr/local/lib/python3.9/site-packages (from gensim) (1.19.5)
Collecting scipy>=0.18.1
  Downloading scipy-1.6.3-cp39-cp39-manylinux1_x86_64.whl (27.3 MB)
     |████████████████████████████████| 27.3 MB 12.3 MB/s 
Collecting smart_open>=1.8.1
  Downloading smart_open-5.0.0-py3-none-any.whl (56 kB)
     |████████████████████████████████| 56 kB 5.7 MB/s 
Building wheels for collected packages: gensim
  Building wheel for gensim (setup.py) ... done
  Created wheel for gensim: filename=gensim-4.0.1-cp39-cp39-linux_x86_64.whl size=25965364 sha256=d34b646c9dd493fb98966bb636db08dd9ad859273deb768389dcbaea8724ff02
  Stored in directory: /root/.cache/pip/wheels/20/74/75/72ec1172891bdecb4ee73fbc2c71d5a150f165b1d0c2ea04e1
Successfully built gensim
Installing collected packages: smart-open, scipy, gensim
Successfully installed gensim-4.0.1 scipy-1.6.3 smart-open-5.0.0
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
root@1db8b3eb6692:/# python -c 'import gensim'
/usr/local/lib/python3.9/site-packages/gensim/similarities/__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
  warnings.warn(msg)

Works installing w/ numpy==1.20:

$ docker run --rm -it python:3.9.5  bash
root@9d2868eb2f16:/# echo -e 'gensim==4.0.1\nnumpy==1.20.3' > requirements.txt && pip install --no-cache-dir -r requirements.txt
Collecting gensim==4.0.1
  Downloading gensim-4.0.1.tar.gz (23.1 MB)
     |████████████████████████████████| 23.1 MB 10.6 MB/s 
Collecting numpy==1.20.3
  Downloading numpy-1.20.3-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.4 MB)
     |████████████████████████████████| 15.4 MB 10.6 MB/s 
Collecting scipy>=0.18.1
  Downloading scipy-1.6.3-cp39-cp39-manylinux1_x86_64.whl (27.3 MB)
     |████████████████████████████████| 27.3 MB 12.2 MB/s 
Collecting smart_open>=1.8.1
  Downloading smart_open-5.0.0-py3-none-any.whl (56 kB)
     |████████████████████████████████| 56 kB 14.6 MB/s 
Building wheels for collected packages: gensim
  Building wheel for gensim (setup.py) ... done
  Created wheel for gensim: filename=gensim-4.0.1-cp39-cp39-linux_x86_64.whl size=25965974 sha256=ceb359021712ac9b24fdea51789e6665cfe250d6f17597dadcf3a6c5aa898ba3
  Stored in directory: /tmp/pip-ephem-wheel-cache-k3scqwxn/wheels/20/74/75/72ec1172891bdecb4ee73fbc2c71d5a150f165b1d0c2ea04e1
Successfully built gensim
Installing collected packages: numpy, smart-open, scipy, gensim
Successfully installed gensim-4.0.1 numpy-1.20.3 scipy-1.6.3 smart-open-5.0.0
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
root@9d2868eb2f16:/# python -c 'import gensim'
/usr/local/lib/python3.9/site-packages/gensim/similarities/__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
  warnings.warn(msg)

Doesn't work installing with numpy==1.19 on py3.9:

$ docker run --rm -it python:3.9.5  bash
root@abd0652f5445:/# echo -e 'gensim==4.0.1\nnumpy==1.19.5' > requirements.txt && pip install --no-cache-dir -r requirements.txt
Collecting gensim==4.0.1
  Downloading gensim-4.0.1.tar.gz (23.1 MB)
     |████████████████████████████████| 23.1 MB 10.5 MB/s 
Collecting numpy==1.19.5
  Downloading numpy-1.19.5-cp39-cp39-manylinux2010_x86_64.whl (14.9 MB)
     |████████████████████████████████| 14.9 MB 10.7 MB/s 
Collecting scipy>=0.18.1
  Downloading scipy-1.6.3-cp39-cp39-manylinux1_x86_64.whl (27.3 MB)
     |████████████████████████████████| 27.3 MB 12.2 MB/s 
Collecting smart_open>=1.8.1
  Downloading smart_open-5.0.0-py3-none-any.whl (56 kB)
     |████████████████████████████████| 56 kB 15.0 MB/s 
Building wheels for collected packages: gensim
  Building wheel for gensim (setup.py) ... done
  Created wheel for gensim: filename=gensim-4.0.1-cp39-cp39-linux_x86_64.whl size=25965953 sha256=5308b0921cf7ac25008117c99e8c7fc9e5fa7d61a60d251d7d93bf61ff4d1d18
  Stored in directory: /tmp/pip-ephem-wheel-cache-yucgbfq9/wheels/20/74/75/72ec1172891bdecb4ee73fbc2c71d5a150f165b1d0c2ea04e1
Successfully built gensim
Installing collected packages: numpy, smart-open, scipy, gensim
Successfully installed gensim-4.0.1 numpy-1.19.5 scipy-1.6.3 smart-open-5.0.0
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
root@abd0652f5445:/# python -c 'import gensim'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/gensim/__init__.py", line 11, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils  # noqa:F401
  File "/usr/local/lib/python3.9/site-packages/gensim/corpora/__init__.py", line 6, in <module>
    from .indexedcorpus import IndexedCorpus  # noqa:F401 must appear before the other classes
  File "/usr/local/lib/python3.9/site-packages/gensim/corpora/indexedcorpus.py", line 14, in <module>
    from gensim import interfaces, utils
  File "/usr/local/lib/python3.9/site-packages/gensim/interfaces.py", line 19, in <module>
    from gensim import utils, matutils
  File "/usr/local/lib/python3.9/site-packages/gensim/matutils.py", line 1024, in <module>
    from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
  File "gensim/_matutils.pyx", line 1, in init gensim._matutils
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Installing without requirements.txt doesn't work either:

$ docker run --rm -it python:3.9.5  bash
root@15b7a523678d:/# pip install gensim==4.0.1 numpy==1.19.5
Collecting gensim==4.0.1
  Downloading gensim-4.0.1.tar.gz (23.1 MB)
     |████████████████████████████████| 23.1 MB 12.2 MB/s 
Collecting numpy==1.19.5
  Downloading numpy-1.19.5-cp39-cp39-manylinux2010_x86_64.whl (14.9 MB)
     |████████████████████████████████| 14.9 MB 10.5 MB/s 
Collecting scipy>=0.18.1
  Downloading scipy-1.6.3-cp39-cp39-manylinux1_x86_64.whl (27.3 MB)
     |████████████████████████████████| 27.3 MB 12.2 MB/s 
Collecting smart_open>=1.8.1
  Downloading smart_open-5.0.0-py3-none-any.whl (56 kB)
     |████████████████████████████████| 56 kB 5.5 MB/s 
Building wheels for collected packages: gensim
  Building wheel for gensim (setup.py) ... done
  Created wheel for gensim: filename=gensim-4.0.1-cp39-cp39-linux_x86_64.whl size=25965965 sha256=668d3d4a5353e31bece3adc74e2584a7501385727ad64ee79e90c575921516d4
  Stored in directory: /root/.cache/pip/wheels/20/74/75/72ec1172891bdecb4ee73fbc2c71d5a150f165b1d0c2ea04e1
Successfully built gensim
Installing collected packages: numpy, smart-open, scipy, gensim
Successfully installed gensim-4.0.1 numpy-1.19.5 scipy-1.6.3 smart-open-5.0.0
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
root@15b7a523678d:/# python -c 'import gensim'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/gensim/__init__.py", line 11, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils  # noqa:F401
  File "/usr/local/lib/python3.9/site-packages/gensim/corpora/__init__.py", line 6, in <module>
    from .indexedcorpus import IndexedCorpus  # noqa:F401 must appear before the other classes
  File "/usr/local/lib/python3.9/site-packages/gensim/corpora/indexedcorpus.py", line 14, in <module>
    from gensim import interfaces, utils
  File "/usr/local/lib/python3.9/site-packages/gensim/interfaces.py", line 19, in <module>
    from gensim import utils, matutils
  File "/usr/local/lib/python3.9/site-packages/gensim/matutils.py", line 1024, in <module>
    from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
  File "gensim/_matutils.pyx", line 1, in init gensim._matutils
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
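
For application code that has to run across several of these environments, one option is to catch this specific failure at import time and re-raise it with a pointer to the workarounds seen in the runs above. This is a sketch only: `import_or_explain` is a hypothetical helper, not a gensim or numpy API, and it assumes the error text keeps containing the phrase "binary incompatibility".

```python
import importlib


def import_or_explain(module_name, importer=importlib.import_module):
    """Import a module, re-raising ABI mismatches with a more actionable message."""
    try:
        return importer(module_name)
    except ValueError as exc:
        if "binary incompatibility" not in str(exc):
            raise  # some other ValueError; do not mask it
        raise ImportError(
            f"{module_name} was compiled against a different numpy than the one "
            f"installed. Reinstalling {module_name} after numpy, or installing a "
            f"newer numpy, worked in the runs above. Original error: {exc}"
        ) from exc
```

Usage would be `gensim = import_or_explain("gensim")` in place of a plain `import gensim`. The `importer` parameter exists only so the helper can be exercised without a broken environment.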

mbecuwe commented 3 years ago

I still have a compatibility issue when running:

    pip install numpy==1.19.4
    pip install tensorflow==2.5.0
    pip install gensim==4.0.1

On Ubuntu 18.04, with Python 3.9.5 (installs made inside docker container).

I get the following exception when trying to import gensim: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject. I'm not sure how I can make this work. I tried downgrading several libraries but still haven't managed to make it work on Ubuntu.

Edit : works on Python 3.8.10

sidravi1 commented 2 years ago

Hi all - We came across the same issue using gensim 3.8.3 and numpy 1.22.2

I removed numpy from requirements.txt (it gets installed as a dependency by other packages anyway) and let pip resolve the dependencies. That fixed the problem for us.

piskvorky commented 2 years ago

Gensim 3.x is not supported at this point – please upgrade to the latest version (4.1.2 currently). If something doesn't work there, report here. Thanks.

hknguyen20 commented 1 year ago

I had the same error on Python 3.8.8 and gensim 4.2.0 in Jupyter Notebook. I fixed it by upgrading Numpy to 1.22.4 and restarting the runtime.
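
The restart matters because a running kernel keeps whatever numpy it imported before the upgrade. A rough way to check for that situation is to compare the version of the module already loaded in the process with the version recorded in the installed package metadata. This is a sketch with hypothetical helper names (`loaded_and_installed`, `needs_restart`), and it assumes the module exposes `__version__`, as numpy does.

```python
import sys
from importlib import metadata


def loaded_and_installed(dist_name, module_name=None):
    """Return (version loaded in this process, version installed on disk)."""
    module = sys.modules.get(module_name or dist_name)
    loaded = getattr(module, "__version__", None) if module is not None else None
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        installed = None
    return loaded, installed


def needs_restart(loaded, installed):
    """True when a module was imported before the package on disk changed."""
    return loaded is not None and installed is not None and loaded != installed


loaded, installed = loaded_and_installed("numpy")
print(f"numpy loaded={loaded} installed={installed} "
      f"restart={needs_restart(loaded, installed)}")
```

If the two versions differ, restarting the kernel is the simplest way to pick up the upgraded package.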

lianakoleva commented 1 year ago

I am on Python 3.10.7, gensim 4.3.1, and Numpy 1.24.1, and I am getting this error when running:

    import gensim.downloader as api
    api.info("text8")

fpt-ian commented 4 weeks ago

Hi there, I am using poetry, and installing the latest gensim forced a downgrade of numpy from 2.1.2 to 1.26.4.

This breaks some other things that were expecting numpy 2 (of course, the other package is at fault for not properly declaring that it needs numpy 2). But still, should gensim officially depend on numpy 2 or not?
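
To see which numpy range an installed gensim actually declares, the package metadata can be read directly instead of being inferred from the resolver's behaviour. The sketch below uses `importlib.metadata.requires` from the standard library. `numpy_requirements` and `declared_numpy_requirement` are hypothetical helpers, and the name parsing is deliberately simple rather than a full requirement parser.

```python
from importlib import metadata


def numpy_requirements(requirement_strings):
    """Pick out the requirement lines that name numpy."""
    picked = []
    for line in requirement_strings or []:
        # A requirement string starts with the project name, followed by
        # version specifiers, extras, or environment markers.
        name = line.split(";")[0].strip()
        for sep in ("<", ">", "=", "!", "~", "(", "[", " "):
            name = name.split(sep)[0]
        if name.lower() == "numpy":
            picked.append(line)
    return picked


def declared_numpy_requirement(dist_name="gensim"):
    """Return the numpy requirement lines declared by an installed distribution."""
    try:
        return numpy_requirements(metadata.requires(dist_name))
    except metadata.PackageNotFoundError:
        return None  # distribution is not installed


print(declared_numpy_requirement("gensim"))
```

An upper bound in the printed requirement would explain why poetry picked an older numpy when gensim was added.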