Zitzeronion closed this issue 4 weeks ago.
ESPResSo doesn't officially support NumPy 2.0: https://github.com/espressomd/espresso/blob/4.2.2/requirements.txt#L4
The NumPy ecosystem is still migrating to version 2.0. On the SciPy roadmap, in the NumPy section, the table "Python and NumPy version support per SciPy version" shows that only the 2 most recent versions of SciPy support NumPy 2.0. Similarly, for Pint, only version 0.24.1 (released June 24th) and above support it.
I would say it is too early for us to adopt NumPy 2.0. The Ubuntu Noble (24.04) release is also stuck with NumPy 1.26 until April 2026, and this OS is used by a significant fraction of our user base. Users can always pip install a newer NumPy, but they have to take care to choose a version compatible with Cython, which itself depends on specific versions of the Python header files (for the C++ compiler).
In addition, ESPResSo 4.2.2 has a lot of legacy Cython code and will never be able to use NumPy 2.0, unlike ESPResSo 4.3-dev which has almost no Cython code left and will, for some users, work with NumPy 2.0, if they happen to have the right combination of Python version and Cython version. Whether 4.3-dev truly is fully compatible with NumPy 2.0 is still an open question, because ESPResSo never adopted the NumPy ndarray API, so a lot more investigative work is needed from the ESPResSo side to figure out what works and what doesn't.
Thank you for the detailed explanation :) I raised the issue because when I followed the README.md:
```sh
python3 -m venv pymbe
source pymbe/bin/activate
python3 maintainer/configure_venv.py --espresso_path=/home/user/espresso/build # adapt path
python3 -m pip install -r requirements.txt
```
my pymbe venv came with NumPy v2.0. Maybe it's a good idea to limit the NumPy version in the README.md, say 1.23 <= [NumPy](https://numpy.org/) < 2.0. I am really sorry if this is obvious to most people.
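A quick way to catch this situation right after `pip install -r requirements.txt` is to compare the installed NumPy against the suggested range. The sketch below is a minimal, stdlib-only illustration, assuming the bounds 1.23 <= NumPy < 2.0 from the suggestion above; the function names `release_tuple`, `in_supported_range` and `check_installed_numpy` are hypothetical and not part of pyMBE or ESPResSo.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Hypothetical bounds taken from the suggested range: 1.23 <= NumPy < 2.0
LOWER = (1, 23)
UPPER = (2, 0)


def release_tuple(ver: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '1.26.4' -> (1, 26, 4).

    Suffixes such as 'rc1' or '+local' are ignored; this is a simplification
    of full PEP 440 version parsing.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"cannot parse version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def in_supported_range(ver: str, lower=LOWER, upper=UPPER) -> bool:
    """Return True if lower <= ver < upper, comparing release tuples."""
    return lower <= release_tuple(ver) < upper


def check_installed_numpy() -> None:
    """Abort with a hint if the NumPy in the active venv is out of range."""
    try:
        installed = version("numpy")
    except PackageNotFoundError:
        raise SystemExit("NumPy is not installed in this environment")
    if not in_supported_range(installed):
        raise SystemExit(
            f"NumPy {installed} is outside the range >=1.23,<2.0; try: "
            'python3 -m pip install "numpy>=1.23,<2.0"'
        )
    print(f"NumPy {installed} is within the supported range")
```

Calling `check_installed_numpy()` inside the activated venv reports the problem before any simulation script runs. In a `requirements.txt`, the same range would be written as the pip specifier `numpy>=1.23,<2.0`.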
No worries, managing Python dependencies is no easy task :-) It is true that the pyMBE `requirements.txt` could be improved by setting an upper limit on the NumPy range, like in ESPResSo; however, this is not a silver bullet, as people could experience issues with any of the other packages. As a general rule, if you see `pip install` installing a newer NumPy or Jupyter version, your alarm bells should go off immediately, as this is probably going to break a lot of stuff!
In ESPResSo, we adopted the NEP 29 policy when it comes to Python dependencies, i.e. we only support Python packages if they are still compatible with at least the oldest actively maintained NumPy release. To avoid updating the `requirements.txt` too frequently, we only do so in batches every 2 years, when the new Ubuntu LTS release is out. This process has become a lot easier since PEP 602, which put Python on a predictable annual release schedule that lines up well with Ubuntu releases.
@pm-blanco Should we adopt such a schedule in pyMBE?
Sounds reasonable to me. However, since we are still at a much earlier stage of development of the library than ESPResSo, we will likely need to update the `requirements.txt` more often than every 2 years.
Alright, I settled on minimum required Python package versions based on the Ubuntu 22.04 Python ecosystem, which should cover most of our users (#84). They are also in line with the existing requirements on the pint/pandas ecosystem, whose minimal required versions were published around 2022. NumPy 2.0 is now excluded.
When we reach the stage where we can adopt ESPResSo 4.3-dev (most likely long after the release that sorts out the pip installability of pyMBE), we can revisit these numbers and align them with Ubuntu 24.04. This will be no issue for EasyBuild, since those numbers are actually the ones used in `{chem}[foss/2023b] pyMBE v0.8.0`. We could also take this opportunity to require `pint-pandas>=0.5`, which from what I can see has a bit of "leverage" in the pip dependency tree, since it requires `pint>=0.21` and `pandas>=2`.
The relevance of the new features to pyMBE isn't immediately obvious to me, but maybe those of us with more experience with the internal DataFrame have a positive opinion of them. My main motivation is to get the pint/pandas ecosystem past the version number with the most leverage, so that we later have the leisure to increment the pint/pandas/pint-pandas/biopandas minimal required versions more finely. This can be done in collaboration with the EasyBuild team, in case they are aware of another source of tension between these packages' version numbers, or if they would find it more convenient for us to align on versions they already package.
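The "leverage" idea can be made concrete by propagating minimum-version floors through the dependency tree until nothing changes. The sketch below is a toy illustration, not how pip resolves dependencies: the `REQUIRES` map is hand-written and only contains the one edge quoted in this thread (`pint-pandas>=0.5` requires `pint>=0.21` and `pandas>=2`), and it assumes later releases never relax those bounds. The names `implied_minimums` and `to_requirements` are hypothetical.

```python
Version = tuple  # release tuple, e.g. (0, 21) for "0.21"

# Hypothetical dependency map: (package, minimum version) -> floors it imposes
# on its dependencies. Only this entry comes from the discussion; a real tool
# would read the bounds from package metadata.
REQUIRES = {
    ("pint-pandas", (0, 5)): {"pint": (0, 21), "pandas": (2,)},
}


def implied_minimums(pins, requires=REQUIRES):
    """Raise each dependency's floor to the strictest bound imposed by any
    pinned package, repeating until a fixed point is reached."""
    floors = dict(pins)
    changed = True
    while changed:
        changed = False
        for (pkg, min_ver), deps in requires.items():
            # The bound applies once our own floor for `pkg` reaches `min_ver`
            if pkg in floors and floors[pkg] >= min_ver:
                for dep, dep_min in deps.items():
                    if floors.get(dep, ()) < dep_min:
                        floors[dep] = dep_min
                        changed = True
    return floors


def to_requirements(floors):
    """Format floors as pip specifiers, sorted by package name."""
    return [f"{name}>={'.'.join(map(str, ver))}"
            for name, ver in sorted(floors.items())]


# Bumping only pint-pandas drags pint and pandas along with it
print(to_requirements(implied_minimums({"pint-pandas": (0, 5)})))
```

Bumping a single minor release of `pint-pandas` raises the floor of `pandas` by a whole major version, which is the asymmetry the term "leverage" is meant to capture.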
I hope this explanation was not too confusing. There is probably a well-defined term for the situation where updating one package's minor release (`pint-pandas`) requires updating another package's major release (`pandas`), but "leverage" seemed fitting. I can clarify in the next pyMBE meeting. We also don't have to make any decision for now.
@jngrad Thank you for taking care of this issue and for the explanation, it was quite clear to me! `pandas` does not natively handle the `pint` objects that we store in our pyMBE dataframe very well. `pint-pandas` alleviates the issue, but we still ran into some problems that might be solved in newer releases of the library, and it might be worth updating to the next major release of `pandas` for that alone. However, I have already noticed some API changes in the newer `pandas` releases, so we need to make the transition carefully to ensure that our library makes no calls to deprecated `pandas` functions.
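One common way to flush out deprecated calls before a `pandas` upgrade is to turn deprecation warnings into exceptions while the test suite runs, so each offending call fails loudly at its call site. The sketch below uses only the standard `warnings` module and assumes, as is typical for pandas, that deprecations are announced via `FutureWarning` or `DeprecationWarning`. `legacy_call` is a hypothetical stand-in for a deprecated library function, not a real pandas API.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def deprecations_as_errors():
    """Inside this block, DeprecationWarning and FutureWarning are raised as
    exceptions instead of printed; the previous filters are restored on exit."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        warnings.simplefilter("error", FutureWarning)
        yield


def legacy_call():
    # Hypothetical stand-in for a library function that has been deprecated
    warnings.warn("this function is deprecated", FutureWarning, stacklevel=2)
    return 42


def modern_call():
    # Hypothetical replacement that emits no warning
    return 42


with deprecations_as_errors():
    modern_call()  # runs normally
    try:
        legacy_call()
    except FutureWarning as exc:
        print(f"deprecated call detected: {exc}")
```

With pytest, the same effect can be obtained for a whole test run with a warning filter such as `pytest -W error::FutureWarning`.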
When running `peptide.py` with NumPy 2.0 and ESPResSo v4.2.2, there seems to be an issue. A fix suggested on Stack Overflow is basically to downgrade NumPy with:
```sh
(pymbe) pip install "numpy<2"
```
I did not run into this problem with ESPResSo v4.3 and NumPy v2.0.