thaler-lab / Wasserstein

Python/C++ library for computing Wasserstein distances efficiently.
https://thaler-lab.github.io/Wasserstein

Wasserstein dependencies are not NumPy 2.0 compatible #22

Closed matthewfeickert closed 2 weeks ago

matthewfeickert commented 4 weeks ago

NumPy v2.0 was released on 2024-06-16. @j-s-ashley has discovered that some Wasserstein dependencies are not compatible with NumPy v2.0: attempting to run the tests results in

==================================== ERRORS ====================================
________________ ERROR collecting wasserstein/tests/test_emd.py ________________
ImportError while importing test module '/home/runner/work/Wasserstein/Wasserstein/wasserstein/tests/test_emd.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
wasserstein/tests/test_emd.py:2: in <module>
    import ot
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/ot/__init__.py:21: in <module>
    from . import lp
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/ot/lp/__init__.py:23: in <module>
    from .emd_wrap import emd_c, check_result, emd_1d_sorted
ot/lp/emd_wrap.pyx:1: in init ot.lp.emd_wrap
    ???
E   ImportError: numpy.core.multiarray failed to import (auto-generated because you didn't call 'numpy.import_array()' after cimporting numpy; use '<void>numpy._import_array' to disable if you are certain you don't need it).
------------------------------- Captured stderr --------------------------------

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

If we look at a successful run of the CI from 2024-06-15, pip list shows

# pip-list-pass.txt
coverage    7.5.3
EnergyFlow  1.3.3a0
h5py        3.11.0
iniconfig   2.0.0
numpy       1.26.4
packaging   24.1
pip         24.0
pluggy      1.5.0
POT         0.9.3
pytest      8.2.2
scipy       1.13.1
setuptools  70.0.0
six         1.16.0
Wasserstein 1.1.0
wheel       0.43.0
wurlitzer   3.1.1

and in a currently failing CI run (2024-06-17) pip list shows

# pip-list-fail.txt
coverage    7.5.3
EnergyFlow  1.3.3a0
h5py        3.11.0
iniconfig   2.0.0
numpy       2.0.0
packaging   24.1
pip         24.0
pluggy      1.5.0
POT         0.9.3
pytest      8.2.2
scipy       1.13.1
setuptools  70.0.0
six         1.16.0
Wasserstein 1.1.0
wheel       0.43.0
wurlitzer   3.1.1

where the only difference is the numpy version:

$ diff /tmp/pip-list-pass.txt /tmp/pip-list-fail.txt 
5c5
< numpy       1.26.4
---
> numpy       2.0.0
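
The same comparison can be done in Python when a diff tool is not available. The following is a minimal sketch, not part of the Wasserstein CI; the helper names `parse_pip_list` and `changed_packages` are hypothetical, and the two inputs are shortened excerpts of the listings above.

```python
def parse_pip_list(text):
    """Map package name -> version from 'pip list' style output."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, '#' comments, and the header/separator rows pip prints.
        if len(parts) != 2 or parts[0].startswith(("#", "-")) or parts[0] == "Package":
            continue
        packages[parts[0]] = parts[1]
    return packages


def changed_packages(before, after):
    """Return {name: (old_version, new_version)} for every package that differs."""
    old, new = parse_pip_list(before), parse_pip_list(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Shortened excerpts of the passing and failing listings (assumption: the
# omitted rows are identical in both runs, as the diff above shows).
passing = """
# pip-list-pass.txt
numpy       1.26.4
POT         0.9.3
scipy       1.13.1
"""
failing = """
# pip-list-fail.txt
numpy       2.0.0
POT         0.9.3
scipy       1.13.1
"""

print(changed_packages(passing, failing))
```

Keying on the package name rather than the line position means the comparison still works if a package is added or removed between runs.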

The traceback shows that at least POT (and possibly other dependencies) is to blame, which can be reproduced in a clean container:

$ docker run --rm -ti python:3.12 /bin/bash
root@6d7d0c143b60:/# python -m pip --quiet install uv
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@6d7d0c143b60:/# uv venv && . .venv/bin/activate
Using Python 3.12.4 interpreter at: usr/local/bin/python3
Creating virtualenv at: .venv
(.venv) root@6d7d0c143b60:/# uv pip install pot
Resolved 3 packages in 361ms
Downloaded 3 packages in 997ms
Installed 3 packages in 14ms
 + numpy==2.0.0
 + pot==0.9.3
 + scipy==1.13.1
(.venv) root@6d7d0c143b60:/# python -c 'import ot'

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):  File "<string>", line 1, in <module>
  File "/.venv/lib/python3.12/site-packages/ot/__init__.py", line 21, in <module>
    from . import lp
  File "/.venv/lib/python3.12/site-packages/ot/lp/__init__.py", line 23, in <module>
    from .emd_wrap import emd_c, check_result, emd_1d_sorted
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/.venv/lib/python3.12/site-packages/ot/__init__.py", line 21, in <module>
    from . import lp
  File "/.venv/lib/python3.12/site-packages/ot/lp/__init__.py", line 23, in <module>
    from .emd_wrap import emd_c, check_result, emd_1d_sorted
  File "ot/lp/emd_wrap.pyx", line 1, in init ot.lp.emd_wrap
ImportError: numpy.core.multiarray failed to import (auto-generated because you didn't call 'numpy.import_array()' after cimporting numpy; use '<void>numpy._import_array' to disable if you are certain you don't need it).
(.venv) root@6d7d0c143b60:/#

So we can start with POT and work toward getting a NumPy 2.0 compatible wheel of it released. In the meantime, we can put a temporary upper bound on numpy (e.g. numpy<2, as the NumPy message above suggests) until this is fixed.
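
As an illustration of what such a temporary constraint checks, here is a minimal sketch using only the standard library. It assumes the cap is the `numpy<2` specifier suggested in the NumPy message; where that cap would be declared in this repository is not stated in this issue, and the function names `is_numpy_1x` and `installed_numpy_ok` are hypothetical.

```python
from importlib import metadata


def is_numpy_1x(version):
    """Return True if a version string such as '1.26.4' has a major version below 2."""
    return int(version.split(".")[0]) < 2


def installed_numpy_ok():
    """Check the installed NumPy against the temporary 'numpy<2' cap.

    Returns None when NumPy is not installed in the current environment.
    """
    try:
        return is_numpy_1x(metadata.version("numpy"))
    except metadata.PackageNotFoundError:
        return None


print(is_numpy_1x("1.26.4"))  # version from the passing CI run
print(is_numpy_1x("2.0.0"))   # version from the failing CI run
print(installed_numpy_ok())   # depends on the local environment
```

A check like this only diagnoses the environment; the constraint itself still has to live in the package or CI requirements so that the resolver picks a NumPy 1.x release.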

matthewfeickert commented 4 weeks ago

cf. https://github.com/PythonOT/POT/issues/626