nci / scores

Metrics for the verification, evaluation and optimisation of forecasts, predictions or models.
https://scores.readthedocs.io/
Apache License 2.0

Moving to latest numpy (and dask, and other libraries generally) #642

Closed: tennlee closed this issue 2 weeks ago

tennlee commented 1 month ago

Recent versions of numpy (2.1 in particular) have started introducing breaking changes. Some of those are easy to handle in a way that stays compatible with both older and newer versions, but others are more difficult. It will be important to support numpy 2.1 and onwards. Here are some options:

Option One

If we make the next release of "scores" version 2, then the version 2 series could move to supporting numpy 2.0 as default, drop support for Python 3.9 and introduce any other similar changes. It could be the "forward-facing" version.

We can continue to support Python 3.9 with the version 1 series, including providing feature updates as needed, but this would increase the maintenance cost.

Option Two

The scores code can include logic switches to account for the behavioural differences between older and newer versions, to make it a seamless experience for users. This has already been done in one spot where it was fairly easy to do so. Over time, the complexity of this approach increases, but the cost will be more manageable at first. Also, over time, the risk of inconsistent or unpredictable behaviour might increase.
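As a rough illustration of what such a logic switch might look like, here is a minimal sketch. It is not the code used in scores; the helper names (`major_minor`, `installed_version`, `pick_behaviour`) and the branch labels are hypothetical, and the real switches would wrap whichever numpy behaviour actually changed.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string):
    """Parse '2.1.0' or '1.26.4' into a (major, minor) tuple of ints."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    # A bare major version such as '2' is treated as '2.0'
    return int(match.group(1)), int(match.group(2) or 0)


def installed_version(package):
    """Return (major, minor) for an installed package, or None if it is absent."""
    try:
        return major_minor(version(package))
    except PackageNotFoundError:
        return None


def pick_behaviour(numpy_version):
    """Hypothetical switch: choose a code path from the numpy version tuple."""
    if numpy_version is None:
        return "unavailable"
    if numpy_version >= (2, 0):
        return "numpy2"  # code path written against the newer behaviour
    return "legacy"  # code path kept for older numpy releases


if __name__ == "__main__":
    print(pick_behaviour(installed_version("numpy")))
```

Keeping the version check in one helper means each switch is a single comparison, which makes it easier to find and delete the legacy branches once older versions are dropped.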

Option Three

The scores code can pin back to known compatible versions of libraries to maintain compatibility with older versions for longer. This will suit people working in older, provided environments, but will not suit people working in newly-built environments, people wanting to benefit from the latest changes in libraries, or compatibility with other packages. Over time, this can become a major issue and a source of technical debt, as the cost of the eventual upgrade grows.

tennlee commented 1 month ago

I will note that we are a small maintenance team. A large part of the maintenance is done by myself. Option 1 is the option that I will pursue unless others are happy to put up the pull requests required for the other options. I will proceed accordingly, but nothing is set in stone until the next release actually occurs.

tennlee commented 1 month ago

Ping @nicholasloveday, @nikeethr and @mareecarroll. See also issues #640 and #641.

tennlee commented 1 month ago

I have found a reasonable way to handle the changed behaviour arising from the latest numpy library update. See #643 (this is akin to option two, but in this case the fix is reasonably elegant and so isn't introducing technical debt). As such, we can probably continue for a while longer to support Python 3.9 and older library versions, but it is probably still worthwhile having the discussion around what happens when that is no longer straightforward.

tennlee commented 1 month ago

As an aside, there is only a connection here between the library versions and the Python versions because the libraries are starting to drop support for Python 3.9. Eventually, older libraries will stop having maintenance and security fixes backported to Python 3.9 and at that point we will need to drop support also. We can start putting a notice in the release notes to (perhaps) bring it to people's attention.

nicholasloveday commented 4 weeks ago

In general, I support approaches that align with option 2. I suspect that most of the numpy breaking changes will have occurred in 2.0 and 2.1, and that they will become less frequent in the future, so hopefully this shouldn't introduce much technical debt. It still may be a little while until everyone can move to numpy 2.

tennlee commented 2 weeks ago

At the moment, the accommodations in the codebase to support older versions of things are quite reasonable. Let's carry on with option 2 and deal with each situation as it comes.