holgern / pyedflib

pyedflib is a python library to read/write EDF+/BDF+ files based on EDFlib.
http://pyedflib.readthedocs.org/
BSD 3-Clause "New" or "Revised" License
214 stars 121 forks source link

NumPy 2.0.0 support #259

Closed miikkas closed 2 months ago

miikkas commented 3 months ago

A major new NumPy version 2.0.0 was released on 16th June 2024. (Release notes and migration guide)

It seems that as of 2024-07-02 the latest available pyEDFlib version 0.1.37 is not compatible with NumPy 2.0.0.

Could support for the latest NumPy be added? Thanks in advance. Also, I'd be happy to help if further information is required or if running some additional debugging would be considered useful.


Steps to reproduce

Environment:

$ lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:    24.04
Codename:   noble
$ uname -a

Linux <redacted> 6.8.0-36-generic #36-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 10 10:49:14 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

The created virtual environment edfvenv was removed with rm -rf edfvenv between each of the following attempts.

Using pyEDFlib package from PyPI Installation (doesn't crash) ``` $ python3.12 -m venv edfvenv $ source edfvenv/bin/activate $ pip install numpy==2.0.0 $ pip install pyedflib ``` Importing (crashes) ``` $ python Python 3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.version.full_version '2.0.0' >>> import pyedflib ``` ``` A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'. If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2. Traceback (most recent call last): File "", line 1, in File "//edfvenv/lib/python3.12/site-packages/pyedflib/__init__.py", line 9, in from ._extensions._pyedflib import * Traceback (most recent call last): File "", line 1, in File "//edfvenv/lib/python3.12/site-packages/pyedflib/__init__.py", line 9, in from ._extensions._pyedflib import * File "pyedflib/_extensions/_pyedflib.pyx", line 1, in init pyedflib._extensions._pyedflib ImportError: numpy.core.multiarray failed to import (auto-generated because you didn't call 'numpy.import_array()' after cimporting numpy; use 'numpy._import_array' to disable if you are certain you don't need it). ```
Using pyEDFlib cloned from this GitHub repository Installation (doesn't crash) ``` $ git clone https://github.com/holgern/pyedflib.git $ python3.12 -m venv edfvenv $ source edfvenv/bin/activate $ pip install numpy==2.0.0 $ pip install pyedflib/ ``` ``` Processing ./pyedflib Installing build dependencies ... done Getting requirements to build wheel ... done Preparing metadata (pyproject.toml) ... done Requirement already satisfied: numpy>=1.9.1 in ./edfvenv/lib/python3.12/site-packages (from pyEDFlib==0.1.37) (2.0.0) Building wheels for collected packages: pyEDFlib Building wheel for pyEDFlib (pyproject.toml) ... done Created wheel for pyEDFlib: filename=pyEDFlib-0.1.37-cp312-cp312-linux_x86_64.whl size=2658315 sha256=d97838bdaf14142ab58d6cf56a96f97199fdb8c729004373763a2ee7120ae415 Stored in directory: /tmp/pip-ephem-wheel-cache-zyj5e7mw/wheels/ec/68/85/248f36407621f2ecfd82dc765dabfc6419a617faffed1ee3ad Successfully built pyEDFlib Installing collected packages: pyEDFlib Successfully installed pyEDFlib-0.1.37 ``` Importing (crashes in the same way as above) ``` $ python Python 3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.version.full_version '2.0.0' >>> import pyedflib ``` ``` A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'. If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2. Traceback (most recent call last): File "", line 1, in File "//edfvenv/lib/python3.12/site-packages/pyedflib/__init__.py", line 9, in from ._extensions._pyedflib import * Traceback (most recent call last): File "", line 1, in File "//edfvenv/lib/python3.12/site-packages/pyedflib/__init__.py", line 9, in from ._extensions._pyedflib import * File "pyedflib/_extensions/_pyedflib.pyx", line 1, in init pyedflib._extensions._pyedflib ImportError: numpy.core.multiarray failed to import (auto-generated because you didn't call 'numpy.import_array()' after cimporting numpy; use 'numpy._import_array' to disable if you are certain you don't need it). ```
Using pyEDFlib cloned from the repository and applying a small change Installation (doesn't crash) ``` $ git clone https://github.com/holgern/pyedflib.git ``` Add the following change (literally what was suggested in the `ImportError` above): ``` diff --git a/pyedflib/_extensions/_pyedflib.pyx b/pyedflib/_extensions/_pyedflib.pyx index 6087453..7297e01 100644 --- a/pyedflib/_extensions/_pyedflib.pyx +++ b/pyedflib/_extensions/_pyedflib.pyx @@ -25,6 +25,7 @@ from . cimport c_edf cimport cpython import numpy as np cimport numpy as np +np.import_array() from datetime import datetime, date from cpython.version cimport PY_MAJOR_VERSION include "edf.pxi" ``` ``` $ python3.12 -m venv edfvenv $ source edfvenv/bin/activate $ pip install numpy==2.0.0 $ pip install pyedflib/ ``` ``` Processing ./pyedflib Installing build dependencies ... done Getting requirements to build wheel ... done Preparing metadata (pyproject.toml) ... done Requirement already satisfied: numpy>=1.9.1 in ./edfvenv/lib/python3.12/site-packages (from pyEDFlib==0.1.37) (2.0.0) Building wheels for collected packages: pyEDFlib Building wheel for pyEDFlib (pyproject.toml) ... done Created wheel for pyEDFlib: filename=pyEDFlib-0.1.37-cp312-cp312-linux_x86_64.whl size=2658315 sha256=d97838bdaf14142ab58d6cf56a96f97199fdb8c729004373763a2ee7120ae415 Stored in directory: /tmp/pip-ephem-wheel-cache-zyj5e7mw/wheels/ec/68/85/248f36407621f2ecfd82dc765dabfc6419a617faffed1ee3ad Successfully built pyEDFlib Installing collected packages: pyEDFlib Successfully installed pyEDFlib-0.1.37 ``` Importing (crashes in a different way) ``` $ python Python 3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.version.full_version '2.0.0' >>> import pyedflib ``` ``` Traceback (most recent call last): File "", line 1, in File "//edfvenv/lib/python3.12/site-packages/pyedflib/__init__.py", line 9, in from ._extensions._pyedflib import * File "pyedflib/_extensions/_pyedflib.pyx", line 1, in init pyedflib._extensions._pyedflib ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject ```
skjerns commented 2 months ago

Currently I'm a bit tight on time, but will look into it at some point

If you have a fix that is backwards compatible with nump<2.0, making a PR would be appreciated!

miikkas commented 2 months ago

I'd give it a go otherwise but I'm a bit tight on time at the moment as well.

Do you think it would be ok to link this issue on NumPy's issue tracker, where they have a topic for ecosystem compatibility with NumPy 2.0? At the best, that might attract contributors.

skjerns commented 2 months ago

sure!

seberg commented 2 months ago

To note, the build changes should be nothing more than replacing oldest-supported-numpy with just numpy or numpy>2 (any NumPy version is OK, but NumPy 2 guarantees compatibility with all others). The caveat is that you must drop Python 3.8 and earlier, because on those you would have to keep using oldest-support-numpy (which of course you can with the magic comment if you care about it).

Once you modernize the build setup, you may find that NumPy Python change affect you, running ruff check path/to/code/ --select NPY201 --fix might find most or even all of that.

There is a decent chance that this is all that is needed, but of course there could be something more tricky (if just test suite fixups).

skjerns commented 2 months ago

Seems like switching to numpy2 will be smooth (see #260 )

I'm just not sure how to proceed with the requirement in general :-/ there are actually two numpies installed during build/install, one for compiling and one as a python requirement.

There are two cases to handle requirement:

The best solution would be to compile with the version that is already installed on the system. However, I need to look into how that can be achieved, as this is defined in the pyproject.toml, while the numpy requirement itsels if in the setup.py. The installer will always pull a fresh copy for building. I didn't setup the build pipeline, so not sure if we can adapt the numpy version that is being pulled at runtime.

Once you modernize the build setup, you may find that NumPy Python change affect you, running ruff check path/to/code/ --select NPY201 --fix might find most or even all of that.

ruff check --select NPY201                        (base) [0] 09:50:07
All checks passed!
seberg commented 2 months ago

there are actually two numpies installed during build/install, one for compiling and one as a python requirement.

This is completely fine! You just have to decide which NumPy version you want to test with. You can just always build with NumPy 2, but you may want to test with other NumPy versions (in a perfect world, the oldest and newest you support, but it doesn't matter too much).

The best solution would be to compile with the version that is already installed on the system [...] This will forcefully upgrade users numpy to >=2, which might not be optimal

You (and users) can always achieve pinning at build time by passing --no-build-isolation. (Which might be useful for testing and even wheel builds.)

I don't see a NumPy runtime dependency specification, but of course that would be something like numpy>=1.19,<2.2 and not specific >=2.0. So the NumPy that gets installed at runtime (not build time) is whatever the user has/wants and not limited. So this should be completely fine!

skjerns commented 2 months ago

any reason to use numpy<2.2 ?

Besides that all should work now :) even without dropping 3.7 and also when numpy<2 is installed already.

Will merge soon in #260 , unless there is an objection.

seberg commented 2 months ago

+.2 is a sensible future if you release regularly, because it matches the deprecation period. But if you don't release often or expect deprecations to rarely matter (probably), <3 might make sense. But I don't have strong opinions, so if fine if you don't care to put a bound.