pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0

mechanism to control python-build-standalone "release" version #21441

Open cburroughs opened 1 month ago

cburroughs commented 1 month ago

Is your feature request related to a problem? Please describe.

The python build standalone project provides calendar versioned "releases" that are separate from the underlying CPython version. For example https://github.com/indygreg/python-build-standalone/releases/tag/20240726

One can control the version of CPython used with interpreter constraints, but there is not a way to specify the 'release'. This would be useful for:

Describe the solution you'd like

A way to specify the release. I suspect it is adequate for it to be global (that is, using a different release for Python 3.10 and 3.11 is excessively niche).

Describe alternatives you've considered

One can limit the known versions:

"'3.9.16|linux_arm64|75f3d10ae8933e17bf27e8572466ff8a1e7792f521d33acba578cc8a25d82e0b|24540128|https://github.com/indygreg/python-build-standalone/releases/download/20221220/cpython-3.9.16%2B20221220-aarch64-unknown-linux-gnu-install_only.tar.gz',"

But that's a lot of duplicate toil.
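
To make the shape of such an entry concrete, here is a minimal sketch that splits one into named fields. The field names and their order (version, platform, SHA-256, size in bytes, URL) are assumptions inferred from the example entry above, and `KnownPythonVersion` and `parse_known_version` are hypothetical names, not Pants APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownPythonVersion:
    # Field names are assumptions inferred from the example entry.
    version: str
    platform: str
    sha256: str
    size: int
    url: str


def parse_known_version(entry: str) -> KnownPythonVersion:
    """Split one '|'-delimited entry into its five fields."""
    parts = entry.split("|")
    if len(parts) != 5:
        raise ValueError(f"expected 5 '|'-separated fields, got {len(parts)}")
    version, platform, sha256, size, url = parts
    return KnownPythonVersion(version, platform, sha256, int(size), url)


EXAMPLE = (
    "3.9.16|linux_arm64|"
    "75f3d10ae8933e17bf27e8572466ff8a1e7792f521d33acba578cc8a25d82e0b|"
    "24540128|"
    "https://github.com/indygreg/python-build-standalone/releases/download/"
    "20221220/cpython-3.9.16%2B20221220-aarch64-unknown-linux-gnu-install_only.tar.gz"
)
parsed = parse_known_version(EXAMPLE)
```

Every platform and Python version needs its own entry with its own hash and size, which is where the toil comes from.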

tdyas commented 2 days ago

Presently, the versions_info.json only really contains one PBS release for each Python version. Thus, a PBS release config would only ever either match the PBS release version for a specific Python in versions_info.json or not match at all.

Would it be useful to fill in more PBS releases in the versions_info.json? (and have a way to make it easy for Pants maintainers to generate those entries on a regular basis)
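
As a rough illustration of the "match or not match at all" point, the sketch below reads the release tag out of a download URL and checks which Python versions are available for a requested release. It assumes the release tag is the eight-digit path segment after `/releases/download/`, as in the URL quoted earlier in this issue. The `KNOWN` mapping is a simplified stand-in, not the real schema of versions_info.json, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Assumption: the PBS release tag is the 8-digit segment in the download URL.
_RELEASE_IN_URL = re.compile(r"/releases/download/(\d{8})/")


def pbs_release_from_url(url: str) -> Optional[str]:
    """Return the release tag embedded in a download URL, or None."""
    match = _RELEASE_IN_URL.search(url)
    return match.group(1) if match else None


def versions_for_release(urls_by_version: Dict[str, str], release: str) -> List[str]:
    """List the Python versions whose URL belongs to the requested release."""
    return sorted(
        version
        for version, url in urls_by_version.items()
        if pbs_release_from_url(url) == release
    )


# Simplified stand-in with one URL per Python version.
KNOWN = {
    "3.9.16": (
        "https://github.com/indygreg/python-build-standalone/releases/download/"
        "20221220/cpython-3.9.16%2B20221220-aarch64-unknown-linux-gnu-install_only.tar.gz"
    ),
}
```

With only one release recorded per Python version, `versions_for_release(KNOWN, "20221220")` finds 3.9.16, while any other release tag yields an empty list.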

cburroughs commented 2 days ago

Oh, hmm, I had assumed from its length and reference to multiple different releases that versions_info.json already contained everything. But I see that's not quite the case.

I think you are correct that versions_info.json would also need to know about more releases for this to work.

As some other prior art, here is scie-pants being explicit about both the Python "version" and python-build-standalone "release": https://github.com/pantsbuild/scie-pants/blob/main/package/scie-pants.toml#L17

[[lift.interpreters]]
id = "cpython38"
provider = "PythonBuildStandalone"
release = "20240107"
lazy = true
version = "3.8.18"

[[lift.interpreters]]
id = "cpython39"
provider = "PythonBuildStandalone"
release = "20240107"
lazy = true
version = "3.9.18"

[[lift.interpreters]]
id = "cpython310"
provider = "PythonBuildStandalone"
release = "20240415"
lazy = true
version = "3.10.14"

[[lift.interpreters]]
id = "cpython311"
provider = "PythonBuildStandalone"
release = "20240415"
lazy = true
version = "3.11.9"

tdyas commented 2 days ago

What are your thoughts on the PBS rules querying GitHub releases directly? For example, if release = "20241016", then the rules could retrieve https://github.com/indygreg/python-build-standalone/releases/tag/20241016 and enumerate the assets there to find the PBS Pythons to use.
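
A minimal sketch of what that enumeration might look like, assuming the GitHub REST API "get a release by tag name" endpoint is used rather than the HTML release page. The helper names are hypothetical, the `install_only.tar.gz` suffix filter is an assumption based on the asset name in the URL quoted earlier in this issue, and `SAMPLE_RELEASE` is a made-up payload that only mirrors the `assets` fields the code reads. Unauthenticated calls to this API are rate limited.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

API_URL = (
    "https://api.github.com/repos/indygreg/python-build-standalone/"
    "releases/tags/{tag}"
)


def fetch_release(tag: str) -> Dict[str, Any]:
    """Fetch release metadata for a tag (network call, subject to rate limits)."""
    with urllib.request.urlopen(API_URL.format(tag=tag)) as response:
        return json.load(response)


def install_only_assets(release: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (asset name, download URL) pairs for install_only tarballs."""
    return [
        (asset["name"], asset["browser_download_url"])
        for asset in release.get("assets", [])
        if asset["name"].endswith("install_only.tar.gz")
    ]


# Hypothetical payload for illustration; real responses carry many more fields.
SAMPLE_RELEASE = {
    "tag_name": "20221220",
    "assets": [
        {
            "name": "cpython-3.9.16+20221220-aarch64-unknown-linux-gnu-install_only.tar.gz",
            "browser_download_url": "https://example.invalid/cpython-3.9.16.tar.gz",
        },
        {
            "name": "release-notes.txt",
            "browser_download_url": "https://example.invalid/release-notes.txt",
        },
    ],
}
selected = install_only_assets(SAMPLE_RELEASE)
```

Keeping the filtering separate from the fetch means the selection logic can be exercised without touching the network.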

tdyas commented 2 days ago

From a software supply chain perspective, we would still want to give the user the ability to mandate that a release be verified against a specific expected hash.
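
A sketch of that check, assuming the user supplies an expected SHA-256 hex digest and byte size for the downloaded archive. `verify_download` is a hypothetical helper name, not an existing Pants function.

```python
import hashlib


def verify_download(data: bytes, expected_sha256: str, expected_size: int) -> None:
    """Raise ValueError if the bytes do not match the expected size and SHA-256."""
    if len(data) != expected_size:
        raise ValueError(f"size mismatch: expected {expected_size}, got {len(data)}")
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise ValueError(f"sha256 mismatch: expected {expected_sha256}, got {actual}")


# Usage with stand-in bytes instead of a real archive.
payload = b"stand-in for a downloaded tarball"
verify_download(payload, hashlib.sha256(payload).hexdigest(), len(payload))
```

This only confirms the bytes match what the user pinned. It says nothing about whether the pinned hash itself came from a trustworthy source.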

cburroughs commented 2 days ago

What are your thoughts on the PBS rules querying GitHub releases directly?

My recollection is a little hazy, but my understanding (for example, from https://github.com/pantsbuild/scie-pants/pull/351) is that what you can do with releases without hitting the REST API (and sending your CI nodes and VPN users all into rate-limit sadness) is pretty limited. It might work, but...

Although from a software supply chain perspective, we might also want to give the user the ability to mandate a release have a specific SHA as well

The current Pants norm of "mostly just make sure the hash hasn't changed" isn't great, but all the ecosystem cryptographic trust problems are hard and we are limited in what we can solve on our own. I think it would be odd if Python itself was the thing where Pants was "less secure".

tdyas commented 2 days ago

The current Pants norm of "mostly just make sure the hash hasn't changed" isn't great, but all the ecosystem cryptographic trust problems are hard and we are limited in what we can solve on our own. I think it would be odd if Python itself was the thing where Pants was "less secure".

Agreed. My thinking is that if we give the user the ability to specify the PBS release but that PBS release is not in the embedded versions_info.json, then we also need to give the user the ability to at least set the expected hashes.

1. Maybe this points to moving versions_info.json to be something set by users in configuration?

   1. UX question: Should the configuration for PBS Pythons be a pbs_python target type?

cburroughs commented 2 days ago

hmm, so when I wrote this I said "Ensuring a (release) version greater than some bugfix is used." is a reasonable thing to express, but it isn't something that I need.

I suppose we could embed a bunch of this logic and specific releases into the version generation script so it is easy for users to generate the various:

"'3.9.16|linux_arm64|75f3d10ae8933e17bf27e8572466ff8a1e7792f521d33acba578cc8a25d82e0b|24540128|https://github.com/indygreg/python-build-standalone/releases/download/20221220/cpython-3.9.16%2B20221220-aarch64-unknown-linux-gnu-install_only.tar.gz',"

Stanzas without it being a ton of toil. I guess I'm unsure if polishing the script a bunch is that much less work or maintenance than doing it in "real" code.
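
For a sense of scale, the formatting half of such a script could be as small as the sketch below. It assumes the asset naming pattern seen in the URL above (`cpython-<version>+<release>-<target triple>-install_only.tar.gz`); `format_known_version` and its parameters are hypothetical, and obtaining the hash and size for each asset is the part left out.

```python
from urllib.parse import quote

BASE_URL = "https://github.com/indygreg/python-build-standalone/releases/download"


def format_known_version(
    version: str,
    platform: str,
    triple: str,
    release: str,
    sha256: str,
    size: int,
) -> str:
    """Build one '|'-delimited entry from its components."""
    # Assumed asset naming pattern, taken from the example URL in this thread.
    filename = f"cpython-{version}+{release}-{triple}-install_only.tar.gz"
    # quote() percent-encodes '+' as %2B, matching the example URL.
    url = f"{BASE_URL}/{release}/{quote(filename)}"
    return "|".join([version, platform, sha256, str(size), url])


entry = format_known_version(
    version="3.9.16",
    platform="linux_arm64",
    triple="aarch64-unknown-linux-gnu",
    release="20221220",
    sha256="75f3d10ae8933e17bf27e8572466ff8a1e7792f521d33acba578cc8a25d82e0b",
    size=24540128,
)
```

The string formatting is the easy part. The maintenance cost sits in discovering which assets exist for a release and fetching their hashes.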

tdyas commented 2 days ago

And I now see known_python_versions option as already being the configuration I am alluding to. I am not really enamored with the |-delimited UX there though.