heroku / heroku-buildpack-python

Heroku's buildpack for Python applications.
https://www.heroku.com/python
MIT License

Change compression format and S3 URL for Python runtime archives #1567

Closed edmorley closed 3 months ago

edmorley commented 3 months ago

(This change has been split out of the Heroku-24 PR for easier review.)

As part of the CNB multi-architecture support work, we need to change the Python runtime archive S3 URLs to include the architecture name. In addition, for the CNB transition from "stacks" to "targets", it would be helpful to switch the URL scheme from stack ID references (such as heroku-22) to the distro name and version (e.g. ubuntu and 22.04) available to CNBs via the CNB targets feature. See: https://github.com/buildpacks/spec/blob/buildpack/0.10/buildpack.md#targets-1
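As a rough illustration of the difference between stack-based and target-based archive names, here is a minimal sketch. The base URL, the path layout, and the function name `runtime_archive_url` are all hypothetical (this description does not spell out the final URL scheme), and the `.tar.zst` extension anticipates the compression change described further down.

```shell
# Hypothetical base URL; the real bucket location is not stated in this PR description.
BASE_URL="https://example.com/python-runtimes"

# Build an archive URL from target details (distro name, distro version,
# architecture) instead of a stack ID such as heroku-22.
# The filename layout below is an assumption for illustration only.
runtime_archive_url() {
  local distro_name="$1" distro_version="$2" arch="$3" python_version="$4"
  echo "${BASE_URL}/python-${python_version}-${distro_name}-${distro_version}-${arch}.tar.zst"
}

# Example: the same Python version resolves to one archive per distro and architecture.
runtime_archive_url ubuntu 22.04 amd64 3.12.3
runtime_archive_url ubuntu 22.04 arm64 3.12.3
```

Because the name is derived only from the distro, version and architecture, the same archive can be located by any buildpack that knows those three values, which is the point of sharing archives between this buildpack and the CNB.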

Rather than duplicate the Python archives on S3 under different filenames/locations, it makes sense to migrate this buildpack to the new archive names too, so the same S3 archives can be used by both this buildpack and the CNB.

Moving to new archive names/URLs also means we can safely regenerate all existing Python versions to pick up the changes in #1566 (and changes made in the past, such as #1319, #1320, #1321 and #1322), since we won't have to worry about overwriting the old archives. Overwriting is something we've typically avoided, since it isn't compatible with the model of being able to roll back to an older buildpack version to return to prior behaviour.

Since we're changing the S3 URLs anyway, now is also a good time to make another change that would otherwise cause churn in the S3 URLs again (which affects people who pin the buildpack version): switching the archive compression format from gzip to Zstandard (something that we've been wanting to do for a while).

Zstandard (aka zstd) is a far better compression format than gzip (smaller archives and much faster decompression), and is seeing widespread adoption across multiple ecosystems (e.g. APT packages, Docker images, web browsers).

See:
https://github.com/facebook/zstd
https://github.com/facebook/zstd/blob/dev/programs/README.md#usage-of-command-line-interface

Our base images already have zstd installed (and the zstd crate is available for the CNB's Rust code), so it's an easy switch.

Various compression levels were tested using zstd's benchmarking feature, and in the end the highest level of compression was picked, since:

  1. Unlike some other compression algorithms, zstd's decompression speed is generally not affected by the compression level.
  2. We only have to perform the compression once (when compiling Python).
  3. Even at the highest compression level, it only takes 20 seconds to compress the Python archives, compared to the 10 minutes it takes to compile Python itself (when using PGO+LTO).

For the Ubuntu 22.04 Python 3.12.3 archive, switching from gzip to zstd (level 22, with long window mode enabled) results in a 26% reduction in compressed archive size.
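For reference, a minimal sketch of what compressing and extracting an archive with these settings might look like using the `tar` and `zstd` CLIs. The function names, arguments and paths are hypothetical, and this is not necessarily the exact command used by the build workflow; the level defaults to 22 but can be overridden.

```shell
# Pack a directory into a zstd-compressed tarball.
# -T0 uses all available CPU cores, --ultra unlocks levels above 19, and
# --long enables long window mode (with zstd's default window size).
# The optional third argument overrides the compression level (default 22).
compress_runtime() {
  local src_dir="$1" archive="$2" level="${3:-22}"
  tar -c -f - -C "${src_dir}" . \
    | zstd -T0 "-${level}" --ultra --long --quiet --force -o "${archive}"
}

# Unpack the archive into an existing directory.
# No extra flags are needed here as long as --long was used with its default
# window size when compressing.
extract_runtime() {
  local archive="$1" dest_dir="$2"
  zstd --decompress --stdout "${archive}" | tar -x -f - -C "${dest_dir}"
}
```

Usage would be along the lines of `compress_runtime /tmp/python-build /tmp/python.tar.zst` followed by `extract_runtime /tmp/python.tar.zst /app/.heroku/python` (both paths are placeholders). To compare levels on a given file, zstd's benchmark mode (`-b#`, optionally with `-e#` for a range of levels) can be used.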

GUS-W-15158299. GUS-W-15505556.

edmorley commented 3 months ago

Builds for all supported Python versions have been triggered using the GitHub CLI:

for v in 3.8.{0..19} 3.9.{0..19} 3.10.{0..14} 3.11.{0..9} 3.12.{0..3}; do
  gh workflow run build_python_runtime.yml --ref new-url-structure-and-zstd -F "python_version=${v}"
done

The resulting workflow runs can be viewed here: https://github.com/heroku/heroku-buildpack-python/actions/workflows/build_python_runtime.yml?query=branch%3Anew-url-structure-and-zstd