heliophysicsPy / standards


PHEP 3: PyHC Python & Upstream Package Support Policy #29

Open sapols opened 5 months ago

sapols commented 5 months ago

Overview

This is the initial draft of PHEP 3, which proposes adopting a Python version & upstream package support policy for the PyHC ecosystem, inspired by SPEC 0. The goal is to standardize the support duration for Python versions and popular packages across all PyHC packages, ensuring a balance between stability and the incorporation of new features.

Specifically, this PHEP recommends that projects:

  1. Support Python versions for at least 36 months (3 years) after their initial release.
  2. Support upstream core Scientific Python packages for at least 24 months (2 years) after their initial release.
  3. Adopt support for new versions of these dependencies within 6 months of their release.

The upstream core Scientific Python packages are: numpy, scipy, matplotlib, pandas, scikit-image, networkx, scikit-learn, xarray, ipython, zarr.

This policy aims to replace the current standard #11, which mandates only Python 3 support, with a more structured timeline that supports consistent and predictable maintenance across the community.

This closes #21. This closes #20.

Renders

Rendered current text of the PHEP

Render of PHEP before scope was expanded to include upstream packages

Inspiration

This PHEP was inspired by the Python version support policies listed in:

Open questions and comments

Resolved questions and comments

jameswilburlewis commented 5 months ago

For Python versions that age out of the proposed support window -- how firm is the expectation that package maintainers will drop support for the old Python release, in the case where there are no known incompatibilities? Could that take the form of documentation stating "Recommended Python version >= 3.X, but still works under Python 3.Y as of this writing", or would you want us to take more definitive action (bump python_requires to 3.X)? For example, if someone depends on a non-PyHC package that wants an older Python release, it could be a problem for them to upgrade Python to continue using PyHC packages.

I've read some of the discussion around NEP 29, and I see the merit in the arguments about "who's going to take the plunge first and bump their package requirements?", and general community cohesion and predictability. Just wondering what the repercussions might be, in the event one of these messy real-world edge cases collides with what is otherwise sound policy.

sapols commented 5 months ago

@jameswilburlewis that's an important question I'm wrestling with myself. I know some core packages like PlasmaPy and SunPy already go as far as bumping requires-python = ">=3.10" (as is strictly suggested in NEP 29). But I'm open to feedback here.
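For illustration, the kind of pin being discussed looks like this in a `pyproject.toml` (the package name is hypothetical and the version bound is just an example, not a recommendation from this PHEP):

```toml
[project]
name = "example-pyhc-package"  # hypothetical package name
# Dropping Python releases older than the support window, NEP 29 / SPEC 0 style:
requires-python = ">=3.10"
```

With this set, `pip` will refuse to install new releases of the package on older interpreters, which is the "more definitive action" alternative to a documentation-only recommendation.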

jtniehof commented 5 months ago

Just commenting and not formal review yet, since I think we're a bit more in a "discussion" phase than wordsmithing.

As far as I can tell, PHEP 1 doesn't explicitly require the editor be distinct from the author, but I'd think it would generally be a good idea.

I'd like to suggest expanding the scope to close #21: packages probably should be able to think about Python and other dependencies in the same context even if the principles are slightly different. I appreciate trying to keep scope reasonable but these seem interconnected to me.

I really dislike the "everything not compulsory is forbidden" nature of SPEC 0. I don't think forcing our users to upgrade dependencies is a good idea. And given the difficulties with HelioCloud, we should probably err on being looser with "permitted" versions than tighter. This isn't something like Python 2 where a dedicated "kill the beast" plan was in order.

So here's the sort of thing I'd like to see:

  1. Packages must have a description of their dependency version policy, e.g. PlasmaPy, SpacePy
  2. Packages must support dependencies at least up to the timelines of SPEC 0, i.e. at the time a package version is released, it should support Python feature versions (x.y.0) released in the previous 36 months and feature versions of other dependencies released in the previous 24 months; for dependencies that do not use semantic versioning, simply versions released in the previous 24 months. (The specific numbers should be in this PHEP, with a note, as Shawn has included, that it's inspired by SPEC 0.)
  3. Packages may drop support immediately after those times, or may choose to continue support after, potentially in a reduced capacity.
  4. Packages that use semantic versioning should consider using their version number to indicate versions that drop support for older dependencies.
  5. There is no expectation (not even a "should") that a package "deprecate" an older dependency before dropping support for it.
  6. Packages must explicitly support (and test for) new versions of dependencies within six? twelve? months of their release. (This doesn't mean CI tests going into all eternity, just that it's been verified to work and will install).
  7. Packages which specify a maximum version number for dependencies must (terrible wording) use a carefully selected maximum, not merely specifying the current release as a maximum. (Also potentially some wording about being more aggressive about updating releases when dependencies are released?) Suggested policies include:
     a. Specifying the release after the current as the maximum, e.g. if numpy 1.26 is the current release, specify numpy<1.28. This should usually be reasonable if the package is clean of deprecation warnings and the dependency has a deprecation
     b. For dependencies using semantic versioning, specify a version that is likely to have breaking changes based on the version number, e.g. if numpy 1.26 is current, specify numpy<2.
     c. I'm sure people can come up with others
  8. Packages should test against release candidate versions of dependencies to facilitate support for future versions. Testing in CI is encouraged but ad-hoc testing is acceptable; testing against earlier pre-releases is also encouraged.

I can make edit suggestions to flow into Shawn's writing, but figured kicking the ideas around for a bit first would make sense. If any of these prove really controversial, we can just drop it out of the scope.

tldr: support for a reasonable amount of time. Be clear to your users. Don't leave your package uninstallable.
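The timelines in item 2 above can be computed mechanically from release dates. A minimal sketch, assuming the 36-month/24-month windows proposed here (the helper name and the approximate month length are mine, not part of the proposal):

```python
from datetime import date, timedelta

# SPEC 0-inspired minimum support windows proposed above
PYTHON_WINDOW = timedelta(days=36 * 30)      # ~36 months for Python feature versions
DEPENDENCY_WINDOW = timedelta(days=24 * 30)  # ~24 months for other dependencies

def still_supported(release_date: date, today: date, is_python: bool) -> bool:
    """Return True if a feature version released on release_date is still
    inside its minimum support window on the given day."""
    window = PYTHON_WINDOW if is_python else DEPENDENCY_WINDOW
    return today - release_date <= window

# Example: Python 3.10.0 was released 2021-10-04
print(still_supported(date(2021, 10, 4), date(2024, 1, 1), is_python=True))  # True
print(still_supported(date(2021, 10, 4), date(2025, 6, 1), is_python=True))  # False
```

Per item 3, passing the window is only the point where a package *may* drop support; it is free to keep supporting older versions after that, potentially in a reduced capacity.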

nabobalis commented 5 months ago

I really dislike the "everything not compulsory is forbidden" nature of SPEC 0. I don't think forcing our users to upgrade dependencies is a good idea. And given the difficulties with HelioCloud, we should probably err on being looser with "permitted" versions than tighter. This isn't something like Python 2 where a dedicated "kill the beast" plan was in order.

SPEC 0 is the high-level plan from the broader scientific Python community; I don't see the need to be separate from that push, since we rely on all of their packages. Reducing the scope of what we need to support reduces the burden on all package maintainers within PyHC.

We should also be telling users to create separate environments for each piece of work, so that they can avoid the pitfalls of updates breaking or messing with their current code or environment.

Packages must explicitly support (and test for) new versions of dependencies within six? twelve? months of their release. (This doesn't mean CI tests going into all eternity, just that it's been verified to work and will install).

Typically for sunpy, since we test against upstream packages on a cron job schedule, we don't need to worry about at least a subset of package updates.

We don't test the full suite, so package updates that break things can and do slip through, and we still have to patch and release at times for those.

The main bottleneck is typically new Python versions, since we have a large dependency stack and need to wait for those packages to explicitly support the new Python version, but we try to push towards 3-6 months after release. Thankfully more core packages are testing sooner against new Python versions and their RCs, so that timeframe is getting shorter.

  1. Packages which specify a maximum version number for dependencies must (terrible wording) use a carefully selected maximum, not merely specifying the current release as a maximum. (Also potentially some wording about being more aggressive about updating releases when dependencies are released?) Suggested policies include:
     a. Specifying the release after the current as the maximum, e.g. if numpy 1.26 is the current release, specify numpy<1.28. This should usually be reasonable if the package is clean of deprecation warnings and the dependency has a deprecation
     b. For dependencies using semantic versioning, specify a version that is likely to have breaking changes based on the version number, e.g. if numpy 1.26 is current, specify numpy<2.
     c. I'm sure people can come up with others

I am hesitant to suggest max pinning of packages unless the package itself suggests it. In the numpy case, due to their massive set of changes in the coming 2.0 release, it makes sense, and it's pretty common in the sphinx world due to how often they can break things in a release.

But in my view, pinning either a max or a specific version should be discouraged unless you have really specific requirements in your package.
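For concreteness, a sketch of the contrast being drawn here as `pyproject.toml` dependency specifiers (package names and version numbers are illustrative only):

```toml
[project]
dependencies = [
  "scipy>=1.9",        # preferred: lower bound only, no max pin
  "numpy>=1.23,<2",    # max pin arguably justified: numpy 2.0 is a known breaking release
  # "matplotlib==3.5.3"  # exact pins like this are discouraged in a library
]
```

Exact or max pins in a library propagate to every downstream consumer, which is why they should be reserved for cases where the dependency itself signals a breaking change.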

  1. Packages should test against release candidate versions of dependencies to facilitate support for future versions. Testing in CI is encouraged but ad-hoc testing is acceptable; testing against earlier pre-releases is also encouraged.

Ideally packages should add something like a weekly or monthly cron job to test against the "main" version of the core set of dependencies they use. It won't need to be all of them, but it should at least cover the install dependencies.

I don't think that ad-hoc testing is good enough for this, especially with how fast the Python ecosystem moves.
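A minimal sketch of such a cron job as a GitHub Actions workflow, assuming the package uses pytest and that nightly wheels of its core dependencies are published to the scientific-python-nightly-wheels index (the workflow name, schedule, and package list are hypothetical):

```yaml
name: upstream-dev  # hypothetical workflow name
on:
  schedule:
    - cron: "0 6 * * 1"  # weekly, Monday 06:00 UTC
jobs:
  test-upstream:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      # Install pre-release/nightly builds of core dependencies, then run the suite
      - run: |
          pip install --pre --upgrade \
            --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple \
            numpy matplotlib
          pip install .[tests]
          pytest
```

Running this on a schedule rather than on every push keeps CI costs down while still surfacing upstream breakage within a week of it landing.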

sapols commented 5 months ago

Thank you for the thoughtful comments, @jtniehof. And I appreciate the view from SunPy, @nabobalis! To what @jtniehof said, I definitely think it's best for PyHC's long-term success if we adopt the dependency version policy from SPEC 0. I was gonna push for it eventually, so I started questioning now whether it should be in scope for this PHEP. If people are game I'd like to include it here, but if it'll be a point of contention I'm more on the fence. I like your ideas though, especially having packages explicitly document their version policies. I plan to lead a discussion about this at Monday's telecon where hopefully I can start to get a sense of community consensus. If people seem onboard, I'd welcome and appreciate your edit suggestions. Let's see how people feel in the telecon then go from there?

UPDATE: there ended up not being time to discuss this PHEP last telecon, so we'll have that discussion next telecon in two weeks instead.

aburrell commented 5 months ago

I just want to add that there is frequently a need for our science packages to support old versions of Python. One example essential for some of my packages is that they need to work in an operational environment, and we can't make them use a modern version of Python. However, I do think it's reasonable to request users ensure their code works with the actively supported, non-beta versions of Python.

rebeccaringuette commented 5 months ago

While I happily give kudos on this as a step towards purposeful interoperability, I must throw in a word of caution against unfunded mandates. We have a careful line to walk here between requiring things and not funding them. As such, I would not want to see dependency requirements ("must") added to the lowest level of PyHC packages, especially the "you get listed on our webpage" level. My suggestion is that the Python version support requirements be a requirement for the package level above "you get listed on our webpage" and all higher levels, and that other dependency requirements (e.g. numpy versions) be a "should" at that level. For the next level up (2 levels above "you get listed on our webpage"), those "should" dependency requirements become required. We also need consequences spelled out for what happens when a requirement is not complied with, but perhaps that should go in the PHEP on PyHC package levels and not this one.

jtniehof commented 4 months ago

@nabobalis , I totally agree we should work in the framework and timelines of SPEC 0. But requiring packages to drop support for old versions instead of allowing it is, IMO, overly prescriptive. Within minimum bounds, packages can make their own decisions about the tradeoffs of supporting old versions for their users vs. maintenance burden. As @aburrell points out, there are many environments where jumping to the latest is not always practicable, and these are often users that fund some of this work.

As far as "must" vs. "should", I think it makes sense to have some granularity above the PHEP level. So it might be "PHEP x has must a, b, c" but we don't require full compliance with PHEP x for being a core package (or a listed package, or whatever). This of course interacts with the question of exactly how we tier packages...

nabobalis commented 4 months ago

@nabobalis , I totally agree we should work in the framework and timelines of SPEC 0. But requiring packages to drop support for old versions instead of allowing it is, IMO, overly prescriptive. Within minimum bounds, packages can make their own decisions about the tradeoffs of supporting old versions for their users vs. maintenance burden. As @aburrell points out, there are many environments where jumping to the latest is not always practicable, and these are often users that fund some of this work.

I think that's totally fair; in that case we should turn this PHEP into a more relaxed version:

Try to support new Python releases within a set time frame. Support older versions of Python based on the package maintainers' needs.

If the PHEP is just that, I guess it's more informational than a requirement/standard?

nabobalis commented 4 months ago

While I happily give kudos on this for a step towards purposeful interoperability, I must throw a word of caution in here against unfunded mandates.

While this is a great point, I would say that the maintenance of a package is almost always unfunded. This is a problem with almost any library; it requires the dedicated time of a small group of maintainers or community contributions to keep a package ticking along.

I would personally argue that if you want to release and advertise your code/library/package, then the authors/maintainers keeping the package in working order (meaning checking support for newer dependencies and Python versions, making package metadata changes as the ecosystem moves, etc.) is the bare minimum required. This would normally be unfunded work, and I have little knowledge of what funding opportunities are available for this type of maintenance.

sapols commented 4 months ago

Okay! I just pushed a change that I believe incorporates all the feedback from the comments here, while also expanding the scope of the PHEP to include the upstream package support policy from SPEC 0. The "drop" policy language has been softened to allow packages to continue supporting older versions if they choose to. The upstream packages touched by this new policy are clearly defined. Further recommendations have been added to the "Specification" section.

If I missed something obvious please yell at me. Otherwise we're moving into more word-smithy territory now and I'd appreciate nit-picky wording comments and other such things. Also still seeking feedback about the How to Teach This section.

jameswilburlewis commented 4 months ago

I have a thought, which may or may not be in scope for this PHEP... The theme here seems to be promoting interoperability by setting expectations on how package maintainers will deal with upstream dependencies, specifically Python itself, and PyHC-core or scientific-python-core packages. But there are other considerations that come into play, especially for packages that supply binary wheels: OS versions and CPU architectures. New OS releases and new CPU architectures (e.g. Mac Intel -> Apple Silicon M1, M2, M3 etc) can both trigger a need to recompile non-Python library code.

Should we have a policy on a timeline for supporting new OS releases or new CPU architectures with compatible wheels for PyHC packages? If I'm going to introduce a dependency on some other PyHC package for the sake of having a common way to handle coordinate systems, times, units, etc., I would hate to have a situation where installing my package on the hot new platform requires end users to compile their own C (or, God forbid, FORTRAN) libraries because binary wheels aren't available yet...

jtniehof commented 4 months ago

Good point, @jameswilburlewis. We don't have any explicit statement requiring binary wheels right now and that feels out of scope for this discussion, but where we're talking about supporting new versions of Python in a timely manner, that seems to put issues like OS and arch in-scope.

Maybe just wording that suggests support for the new must be at the same level as for the old--so if a package never does binary wheels (or conda, say) that's "okay" and users and potential downstream dependencies can make their decision, but if they've been releasing binary wheels people can reasonably rely on that in the future?

sapols commented 4 months ago

I'm tempted to say that OS/architecture support is mostly out of scope here. The crux of this PHEP is really just "PyHC is jumping on the SPEC 0 bandwagon". Plus we already have PyHC standard 4: Operating System Support: Packages must strive to support all major operating systems (e.g., OS X, Linux, Windows).

@jameswilburlewis @jtniehof Would it be sufficient to simply add a sentence like "Additionally, if a package has been releasing binary wheels, this support should continue for new OS versions and CPU architectures to maintain the same level of support as for previous environments."?

jameswilburlewis commented 4 months ago

@sapols Sounds good to me!

jtniehof commented 4 months ago

@jameswilburlewis @jtniehof Would it be sufficient to simply add a sentence like "Additionally, if a package has been releasing binary wheels, this support should continue for new OS versions and CPU architectures to maintain the same level of support as for previous environments."?

I might be even less specific: "packages should support new OS versions and CPU architectures to the same level as for previous environments". So whatever you were doing before, thou shalt do now. Up to the point of reason, of course...I don't deliver installer .exes for SpacePy anymore.

sapols commented 4 months ago

Nice. With that change the paragraph becomes:

"PyHC packages should clearly document their dependency version policy (e.g., like PlasmaPy and SpacePy) and be tested against the minimum and maximum supported versions. Testing with CI against release candidates is encouraged, too, as a way to stay ahead of future releases. Packages that use semantic versioning should consider using their version number to indicate versions that drop support for older dependencies. There is no expectation that a package "deprecate" an older dependency before dropping support for it. However, there is an expectation that maximum or exact requirements (e.g., numpy<2 or matplotlib==3.5.3) be set only when absolutely necessary (and that GitHub issues be immediately created to remove such requirements). Additionally, packages should support new OS versions and CPU architectures to the same level as previous environments."

Is it clear what we mean by that without explicitly calling out binary wheels etc? I like how succinct it is, just wanna make sure it's clear too.

namurphy commented 4 months ago

Is it clear what we mean by that without explicitly calling out binary wheels etc? I like how succinct it is, just wanna make sure it's clear too.

Perhaps we could add a link to the corresponding page in the Python documentation or PyPA's packaging guide?

Cadair commented 4 months ago

Another thought from #31:

Why don't we add a line or something here strengthening the maximum or hard pin requirement to say that "you MUST not require versions of any dependency older than 24 months?" This would go a long way to removing conflicts when trying to install all compliant packages in the same env?

sapols commented 4 months ago

Okay I pushed another round of changes that capture all comments added since the last push:

I think this document is looking pretty strong now. I still want feedback on the "How to Teach This" section (basically, do people like the two ideas in there already and should I expand them, or does someone have a better idea?). Otherwise, we may be approaching a point where I could see putting this to a first vote. UPDATE: I refined the "How to Teach This" bullets so we could leave them as-is if no one objects.

jtniehof commented 2 months ago

#16 has a new suggested standard (12 in there) which is worth looking at in this context.

Also, I suggest we explicitly note that this replaces standard 11:

  1. Python 3: All packages must be compatible or work towards being compatible with Python 3. Providing ongoing support for Python 2 is not recommended as the end of life for Python 2 is January 1, 2020 (see PEP 373).

Such a note probably belongs in "how to teach this" ("This section should also document any changes the PHEP makes relative to a PHEP it replaces or some other widely-used standard or reference"). I don't know if we want to use the "Replaces:" header for standards which are replaced.

sapols commented 2 months ago

We actually already explicitly note this replaces standard 11 throughout, in the Abstract, Motivation, and Specification.

rebeccaringuette commented 1 month ago

suggest change: support these packages IF they are a dependency of the package

namurphy commented 1 month ago

Related note: I raised https://github.com/astral-sh/uv/issues/7515 as a feature request to uv to automate updating of the minimum allowed versions of packages in pyproject.toml. If packages were able to add that to CI, then it would make it easier to abide by the drop policy in SPEC 0.