jborean93 / pykrb5

Python krb5 API interface
MIT License

PEP517 Build-system dependencies should be pinned for deterministic builds from source code #15

Closed: faph closed this issue 1 year ago

faph commented 2 years ago

The migration to PEP 517 is a good move. However, there are some edge cases where a pip install from source might fail.

Currently the build backend is configured like this:

[build-system]
requires = [
    "Cython >= 0.29.29, < 3.0.0",  # 0.29.29 includes fixes for Python 3.11
    "setuptools >= 61.0.0",  # Support for setuptools config in pyproject.toml
]
build-backend = "setuptools.build_meta"

This means, for example, that any new release of setuptools could cause a pip install from source to fail while it is building the wheel.

This is possible not only after a major version update of setuptools; there are cases where minor/micro changes could break the build, depending, for example, on the exact Python installation.

For discussion: should the build system's dependencies be fully pinned at each release of krb5, to ensure that for a given source distribution a binary distribution can always be rebuilt exactly as it was tested at the point of release?
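A fully pinned build system might look like the following sketch (the version numbers are illustrative, not a recommendation):

```toml
[build-system]
requires = [
    "Cython == 0.29.32",     # exact pin: the version tested at release time
    "setuptools == 65.3.0",  # exact pin instead of a lower bound
]
build-backend = "setuptools.build_meta"
```

With exact pins, the isolated build environment pip creates for the sdist is the same at every install, at the cost of forgoing upstream bugfixes until the next krb5 release.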

faph commented 2 years ago

Note that we are talking about the isolated build environment; this is of course fully separate from the krb5 library's own runtime dependencies.

jborean93 commented 2 years ago

Are you actually seeing a breaking change here? I can definitely see adding a major version ceiling for deps I know could be problematic, but I want to avoid managing dependencies going forward. This is especially true for things like Cython, where a new release could fix bugs for future Python versions or general issues with gcc that pop up. These two scenarios are actually real for Python 3.11 and GCC 12, which only just happened. If you need reproducible builds then I recommend you pin the build deps to what you need, or even better, build a wheel specific to your platform and install that.

faph commented 2 years ago

Yes, this is a real-life case: slightly obscure and in principle nothing to do with krb5 as such, except that in one of our Python environments we could no longer build krb5 from source whereas previously we could.

The trigger was the setuptools release 65.2.0, see https://github.com/pypa/setuptools/pull/3553 "Fix, again, finding headers during cross compiling".

Presumably our environment previously (and accidentally) worked with setuptools < 65.2.0, and we could build krb5 when doing a pip install. Then the setuptools update came in and the build pipeline failed.

I guess the thing I am trying to discuss here is how we expect isolated builds to work. Prior to isolated builds we would be relying on setuptools' setup_requires spec, if set at all, and otherwise we would carefully set up a suitable build and compilation environment that we hoped would work with all packages that required compiling.

"Isolated builds" is just a beefed-up version of setup_requires, and we would expect these to "just work". Isolated builds fully declare their own dependencies required to faithfully compile a package from a source dist. Ideally there would be some "lock file" support for simpler management of pinned build dependencies.

I know pip install has various options to disable isolated builds, etc., but that basically gets us back to square one, where we are guessing what the build environment should look like.
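For completeness, the opt-out route looks roughly like this (package versions are illustrative); the burden of preparing the build environment then falls entirely on the installer:

```shell
# With isolation disabled, pip builds against whatever build tools are
# already installed, so they must be provisioned first (illustrative pins):
#   pip install 'setuptools==65.3.0' 'Cython>=0.29.29,<3'
#   pip install --no-build-isolation krb5
# Confirm the flag exists in the local pip:
pip install --help | grep -o 'no-build-isolation' | head -n 1
```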

Note also that in many situations krb5 will be a dependency of an app's dependencies, and the app developer may not even be aware of an sdist-to-wheel compilation being in the mix...

Also, in the same way that wheels are built for specific versions of Python, I would expect supported Python versions to be explicitly declared (and tested) with regard to the build system.
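There is at least partial support for this today via project metadata (a sketch; the range is illustrative). It does not pin the build backend itself, but pip honours it during resolution, so an sdist build is not even attempted on an unsupported interpreter:

```toml
[project]
# Illustrative range: pip refuses to install (and therefore build)
# the sdist on interpreters outside it.
requires-python = ">=3.7"
```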

faph commented 2 years ago

BTW, I struggled to find via Google other cases where non-deterministic build backends were reported as an issue. I guess isolated builds are relatively new and broken build dependencies rare.

faph commented 2 years ago

For reference, the Python environment I am talking about is the Cloudfoundry Python buildpack, now fixed here (I think): https://github.com/cloudfoundry/python-buildpack/blob/master/CHANGELOG#L5

faph commented 2 years ago

Let me know what you’re thinking on this @jborean93

jborean93 commented 2 years ago

Sorry, I've been on holidays so never got back round to this. I think I can be convinced to do a better pin on setuptools, but not pinning Cython was done on purpose: Cython introduces newer features to fix things like the problems I've just had with Python 3.11 and GCC 12, which would otherwise have required a new release. Even then, pinning setuptools doesn't fully fix this problem, as I could have pinned a release to a version that didn't work in your environment, meaning you would still have had to either disable build isolation or fix the problem with setuptools. Granted, that would only have happened when you actually bumped the krb5 package version, if you pinned it yourself, but pinning krb5 isn't even guaranteed.

If you have an internal environment then you would be far better off building the wheel yourself once and distributing that; it brings a few benefits.

I cannot do this on PyPI, unfortunately, because I cannot statically link the krb5 libs without causing problems when other modules loaded in the same process try to link against the system krb5 libs. But that doesn't stop you from distributing it from your own warehouse if your nodes are all the same.

faph commented 1 year ago

No problem at all.

Granted, that would only have happened when you actually bumped the krb5 package version

Exactly this. That's what I am hoping to achieve: that for a given, fixed version of krb5 we get the exact same isolated build environment each time. If we bump the krb5 version, it's our duty to test it through...

Understood about pre-building our own wheels. However, in practice that may not be trivial. I don't know too much about the details of building wheels for non-pure Python packages, but given the diversity of our platforms I am not sure whether a single (Linux) wheel would work everywhere.
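As a rough sketch of the wheel route (the build command assumes a pykrb5 source checkout with the system Kerberos headers installed, so it is shown as a comment): the platform concern is real, because a binary wheel's file name is bound to a specific interpreter/ABI/platform triple.

```shell
# Build a wheel once, then host it internally (not run here):
#   python -m pip wheel --wheel-dir dist .
# The resulting file name encodes the target, for example
# krb5-<version>-cp311-cp311-linux_x86_64.whl, so it installs only on
# matching platforms. Inspect the local platform string:
python -c "import sysconfig; print(sysconfig.get_platform())"
```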

The main thing I am trying to discover here is: if isolated builds are meant to solve such build-time dependencies, how can this be made robust? It really should do the same as building wheels offline, right?

faph commented 1 year ago

One more thought: this is happening on a Cloudfoundry platform. Surely that's a sufficiently large platform to warrant some thought on how this is supposed to work. The whole idea with Cloudfoundry buildpacks is that app environments are built on the fly at the point of deployment.

jborean93 commented 1 year ago

Understood about pre-building our own wheels. However, in practice that may not be trivial. I don't know too much about the details of building wheels for non-pure Python packages, but given the diversity of our platforms I am not sure whether a single (Linux) wheel would work everywhere.

That's fair. I can't publish wheels for Linux on PyPI due to the policies there: I would have to statically link the krb5 lib, which causes havoc if you also need to load the system library or try to load another module that also loads krb5. I was unsure about your environment and whether it was homogeneous enough that you could have your own custom wheel.

One more thought: this is happening on a Cloudfoundry platform. Surely that's a sufficiently large platform to warrant some thought on how this is supposed to work. The whole idea with Cloudfoundry buildpacks is that app environments are built on the fly at the point of deployment.

I'm not aware of Cloudfoundry and haven't used it before, but if they had an environment problem where setuptools would fail, then the problem arguably doesn't lie with the libraries being used but with the environment. What would have happened if I had pinned setuptools to 65.2.0 for pykrb5? You would have had a build problem. I will concede the point that you would have only seen it when you bumped the pinned version rather than whenever setuptools made a new release.

Strictly speaking, I agree with you that I should probably pin these versions to ensure that an sdist stays the same over its lifecycle, but the realist in me is concerned that in the future I might get into a situation where I cannot pin to just one version and might need two different ones, say for Python 3.7 and Python 3.12. I am also concerned about the increased maintenance burden if I need to manually maintain these dependencies and update them over time, or even deal with requests saying someone can't use a specific pinned version for a release and needs it to be something else. Because of this, I'm going to continue on the course I am on today. It gives me flexibility as a maintainer, gives users of the sdist flexibility in their build environments, and offers better compatibility with future versions or bugfixes that become available, without requiring a new release of this library.

As a slight mitigation, you can add a constraints file to your environment to either skip or pin versions of setuptools or any other library. Currently pip does not support passing -c/--constraint through to the isolated build environment setup, but using the env var PIP_CONSTRAINT does work instead. This isn't perfect, but it does offer more control in your scenario to make builds more deterministic for your purposes.
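A sketch of that mitigation (the pinned versions are illustrative; the actual install command is shown as a comment since it needs the system Kerberos headers):

```shell
# A constraints file pinning the build backend's dependencies
# (illustrative versions, not a recommendation):
cat > constraints.txt <<'EOF'
setuptools == 65.3.0
Cython == 0.29.32
EOF

# pip reads PIP_CONSTRAINT (the env-var form of --constraint) and also
# applies it when it sets up the isolated build environment:
#   PIP_CONSTRAINT="$PWD/constraints.txt" pip install krb5
grep -c '==' constraints.txt
```

Unlike pinning inside pyproject.toml, this keeps the determinism decision with the installer rather than the package maintainer.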

I know this may not be the answer you are looking for, and I do appreciate the time you've put into this issue. If this type of problem continues to appear in the future I am more than happy to look into it again, as I was genuinely on the fence about this.

faph commented 1 year ago

The constraints tip is actually quite useful. Many thanks @jborean93