pyMBE-dev / pyMBE

pyMBE provides tools to facilitate building up molecules with complex architectures in the Molecular Dynamics software ESPResSo. For up-to-date API documentation, please check our website:
https://pymbe-dev.github.io/pyMBE/pyMBE.html
GNU General Public License v3.0

numpy v2.0 error with espresso v4.2.2 #83

Closed Zitzeronion closed 4 weeks ago

Zitzeronion commented 1 month ago

When running peptide.py with numpy 2.0 and espresso v4.2.2 there seems to be an issue.

python3 samples/peptide.py 
Traceback (most recent call last):
  File ".../samples/peptide.py", line 21, in <module>
    import espressomd
  File ".../espresso-v422/build/src/python/espressomd/__init__.py", line 21, in <module>
    from . import _init
  File "_init.pyx", line 1, in init espressomd._init
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

There is a fix on Stack Overflow that basically suggests downgrading the numpy version with

(pymbe) pip install "numpy<2"

I did not run into this problem with espresso v4.3 and numpy v2.0.
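
A minimal sketch of a guard that could run before `import espressomd`, so that the failure message is clearer than the binary-incompatibility error above. The helper names `numpy_major_is_supported` and `check_installed_numpy` and the `max_major` cutoff are illustrative assumptions, not part of pyMBE or ESPResSo.

```python
from importlib import metadata


def numpy_major_is_supported(version: str, max_major: int = 1) -> bool:
    """Return True if the major component of `version` is at most `max_major`."""
    major = int(version.split(".")[0])
    return major <= max_major


def check_installed_numpy() -> None:
    """Raise a readable error if the installed NumPy is too new (hypothetical helper)."""
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        raise RuntimeError("NumPy is not installed in this environment")
    if not numpy_major_is_supported(installed):
        raise RuntimeError(
            f"NumPy {installed} found; this ESPResSo build expects numpy<2 "
            '(try: pip install "numpy<2")'
        )
```

Calling `check_installed_numpy()` at the top of a script only inspects the package metadata, so it does not trigger the failing Cython import itself.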

jngrad commented 1 month ago

ESPResSo doesn't officially support NumPy 2.0: https://github.com/espressomd/espresso/blob/4.2.2/requirements.txt#L4

The NumPy ecosystem is still migrating to version 2.0. On the SciPy roadmap, section NumPy, table "Python and NumPy version support per SciPy version" (click to unroll), only the two most recent versions of SciPy support NumPy 2.0. Similarly for Pint, only version 0.24.1 (released June 24th) and above support it.

I would say it is too early for us to adopt NumPy 2.0. The Ubuntu noble release is also stuck with NumPy 1.26 until April 2026, and this OS is used by a significant fraction of our user base. Users can always pip install a newer NumPy, but they have to take care of choosing a version compatible with Cython, which itself depends on specific versions of the Python header files (for the C++ compiler).

In addition, ESPResSo 4.2.2 has a lot of legacy Cython code and will never be able to use NumPy 2.0, unlike ESPResSo 4.3-dev which has almost no Cython code left and will, for some users, work with NumPy 2.0, if they happen to have the right combination of Python version and Cython version. Whether 4.3-dev truly is fully compatible with NumPy 2.0 is still an open question, because ESPResSo never adopted the NumPy ndarray API, so a lot more investigative work is needed from the ESPResSo side to figure out what works and what doesn't.

Zitzeronion commented 1 month ago

Thank you for the detailed explanation :) I raised the issue because when I follow the README.md:

python3 -m venv pymbe
source pymbe/bin/activate
python3 maintainer/configure_venv.py --espresso_path=/home/user/espresso/build # adapt path
python3 -m pip install -r requirements.txt

my pymbe venv came with numpy v2.0. Maybe it's a good idea to limit the numpy version in the README.md, say 2.0 > [Numpy](https://numpy.org/) >= 1.23. I am really sorry if this is obvious to most people.

jngrad commented 1 month ago

No worries, managing Python dependencies is no easy task :-) It is true that the pyMBE requirements.txt could be improved by setting an upper limit on the NumPy range, like in ESPResSo. However, this is not a silver bullet, as people could experience issues with any of the other packages. As a general rule, if you see pip install is installing a newer NumPy or Jupyter version, your alarm bells should immediately go off, as this is probably going to break a lot of stuff!
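
To make the "upper limit" point concrete, here is a rough sketch that lists requirement lines without any upper bound. The function name `unbounded_requirements` and the sample lines are hypothetical (they are not the actual pyMBE requirements.txt), and the parsing is deliberately simplified compared to what pip does.

```python
import re

# Matches a package name followed by its (possibly empty) version specifiers.
_REQ = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(.*)$")


def unbounded_requirements(lines):
    """Return names of requirements with no upper bound (no '<', '==' or '~=')."""
    unbounded = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _REQ.match(line)
        if match is None:
            continue
        name, spec = match.groups()
        if not any(op in spec for op in ("<", "==", "~=")):
            unbounded.append(name)
    return unbounded


# Hypothetical sample, for illustration only.
sample = ["numpy>=1.23,<2.0", "pandas>=1.5.3  # comment", "", "pint"]
flagged = unbounded_requirements(sample)
```

With this sample, `flagged` contains the two packages that pip would be free to upgrade to any future release.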

In ESPResSo, we adopted the NEP 29 policy standard when it comes to Python dependencies, i.e. we only support Python packages if they are still compatible with at least the oldest actively maintained NumPy release. To avoid updating the requirements.txt too frequently, we only do so in batches every 2 years, when the new Ubuntu major release is out. This process has become a lot easier since PEP 602, which synchronizes Python releases with Ubuntu releases.

@pm-blanco Should we adopt such a schedule in pyMBE?

pm-blanco commented 1 month ago

Sounds reasonable to me. However, since we are still in a much earlier stage of development of the library than ESPResSo, it is likely that we need to update the requirements.txt earlier than in 2 years.

jngrad commented 1 month ago

Alright, I settled for Python package versions with a required minimum based on the Ubuntu 22.04 Python ecosystem, which should cover most of our users (#84). They are also in line with the existing requirements on the pint/pandas ecosystem, whose minimal required versions were published around 2022. NumPy 2.0 is now excluded.

When we reach the stage where we can adopt ESPResSo 4.3-dev (most likely long after the release that sorts out the pip installability of pyMBE), we can revisit these numbers and align them with Ubuntu 24.04. This will be no issue for EasyBuild, since those numbers are actually the ones used in {chem}[foss/2023b] pyMBE v0.8.0. We could also take this opportunity to require pint-pandas>=0.5, which from what I can see has a bit of "leverage" in the pip dependency tree, since it requires pint>=0.21 and pandas>=2. The relevance of the new features to pyMBE doesn't seem immediately obvious to me, but maybe those of us with more experience with the internal data.frame have a positive opinion of the new features. My main motivation is to get the pint/pandas ecosystem past the version number with most leverage, so that we later have the leisure to increment the pint/pandas/pint-pandas/biopandas required minimal versions more finely. This can be done in collaboration with the EasyBuild team, just in case they are aware of another source of tension between these packages' version numbers, or if they would find it more convenient that we align on versions they already package.

I hope this explanation was not too confusing. There is probably a well-defined term to characterize the situation where updating a package's minor release (pint-pandas) requires updating another package's major release (pandas), but "leverage" seemed fitting. I can clarify in the next pyMBE meeting. We also don't have to make any decision for now.
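
The "leverage" effect can be sketched as a small propagation of minimum versions. The only dependency fact used here is the one stated above (pint-pandas 0.5 requires pint>=0.21 and pandas>=2); the function names and the starting minimums for pint and pandas are hypothetical placeholders.

```python
def parse(version: str) -> tuple:
    """Turn '0.21' into (0, 21) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def implied_minimums(requested: dict, requires: dict) -> dict:
    """Raise each package's minimum to whatever the requested minimums pull in.

    requested: {package: minimum version}
    requires:  {(package, minimum version): {dependency: minimum version}}
    """
    result = dict(requested)
    pending = list(requested.items())
    while pending:
        package, version = pending.pop()
        for dep, dep_min in requires.get((package, version), {}).items():
            current = result.get(dep)
            if current is None or parse(dep_min) > parse(current):
                result[dep] = dep_min
                pending.append((dep, dep_min))
    return result


# From the discussion: pint-pandas 0.5 needs pint>=0.21 and pandas>=2.
requires = {("pint-pandas", "0.5"): {"pint": "0.21", "pandas": "2"}}
# Hypothetical current minimums for pint and pandas.
requested = {"pint-pandas": "0.5", "pandas": "1.5.3", "pint": "0.20.1"}
minimums = implied_minimums(requested, requires)
```

In this toy table, bumping one minor release of pint-pandas lifts the pandas minimum across a major version, which is the situation described above.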

pm-blanco commented 4 weeks ago

@jngrad Thank you for taking care of this issue and for the explanation, it was quite clear to me! pandas does not natively handle the pint objects that we store in our pyMBE dataframe too well. pint-pandas is alleviating the issue, but we still found some problems that might be solved in newer releases of the library, and it might be worth updating to the next major release of pandas only for that. However, I have already noticed that there are some changes in the pandas API in the newer releases, so we need to make the transition carefully to make sure that we do not have any deprecated pandas function calls in our library.
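
One possible way to spot deprecated calls during such a transition is to record warnings with the standard `warnings` module. This is only a sketch: `find_deprecation_warnings` and the stand-in `legacy_call` are hypothetical names, and no real pandas function is invoked here.

```python
import warnings

_DEPRECATION_TYPES = (DeprecationWarning, FutureWarning, PendingDeprecationWarning)


def find_deprecation_warnings(func, *args, **kwargs):
    """Call func and return the messages of any deprecation-style warnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        func(*args, **kwargs)
    return [
        str(w.message) for w in caught if issubclass(w.category, _DEPRECATION_TYPES)
    ]


def legacy_call():
    """Stand-in for a deprecated library call (hypothetical)."""
    warnings.warn("legacy_call is deprecated", FutureWarning)
    return 42


messages = find_deprecation_warnings(legacy_call)
```

A similar effect for a whole script can be obtained with Python's `-W error::FutureWarning` command-line option, which turns such warnings into exceptions.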