heroku / buildpacks-python

Heroku's Cloud Native Buildpack for Python applications.
BSD 3-Clause "New" or "Revised" License

Switch to using a virtual environment for app dependencies #253

Closed edmorley closed 1 month ago

edmorley commented 1 month ago

When using a Dockerfile, the file content contributed by different steps in the build is split into different layers, which are then combined via use of an overlay filesystem. In this model, it's possible for multiple steps of the build to write to the same directory locations - albeit at the cost of changes in earlier layers triggering cache invalidation of later layers.

With CNBs, the file content contributed by different steps in the build (whether from separate buildpacks, or from steps within the same buildpack) is kept separate via the concept of CNB layers: https://buildpacks.io/docs/for-buildpack-authors/concepts/layer/

This provides several advantages (finer-grained caching, easier multi-language images, etc.). However, to take full advantage of them, we have to write the build content to separate layer directories.

For Python, this means we cannot simply install everything into the system site-packages directory (which lives inside the Python installation directory).

Until now, the way we've handled this is by:

However, this has a number of downsides:

  1. Some packages are broken with --user installs when using relocated Python, and otherwise require other workarounds (such as setting PYTHONHOME), e.g. https://github.com/unbit/uwsgi/issues/2525
  2. Several package managers (such as Poetry and uv) don't support an equivalent of --user installs, so when we add support for them we would have to use a different approach. App dependency environments would then be set up differently depending on which package manager an app uses, which doesn't seem ideal.
  3. Python and pip have to exist in the same layer, which has a number of disadvantages (see #254).

Given that PEP-405 style virtual environments (venvs) are:

...then it makes more sense to use a venv for the app dependencies instead of a user install.

Note: We can't use PYTHONPATH instead of a user site-packages install, since any directories specified via PYTHONPATH are given higher precedence in Python's sys.path than the Python stdlib (unlike system and user site-packages, which are added to sys.path after the stdlib). This can cause hard-to-debug issues if apps use outdated backport libraries (which often happens unintentionally via broken/suboptimal packages in their transitive dependency tree).
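
The sys.path ordering described in the note can be checked locally. The following is a minimal sketch, not part of the buildpack: the helper name `pythonpath_precedes_stdlib` is made up for illustration, and it assumes a standard CPython install where the stdlib directory appears as its own entry in sys.path.

```python
import json
import os
import subprocess
import sys
import tempfile

# Script run in a child interpreter: report sys.path and the stdlib directory.
CHILD_SCRIPT = (
    "import json, sys, sysconfig; "
    "print(json.dumps({'path': sys.path, 'stdlib': sysconfig.get_path('stdlib')}))"
)


def _normalize(path):
    """Resolve symlinks and case so equivalent paths compare equal."""
    return os.path.normcase(os.path.realpath(path))


def pythonpath_precedes_stdlib(extra_dir):
    """Return True if a PYTHONPATH entry appears before the stdlib in sys.path.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    info = json.loads(result.stdout)
    # Leave the empty "current directory" entry alone; normalize the rest.
    entries = [_normalize(p) if p else "" for p in info["path"]]
    return entries.index(_normalize(extra_dir)) < entries.index(
        _normalize(info["stdlib"])
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(pythonpath_precedes_stdlib(tmp))
```

Any module placed in the PYTHONPATH directory would therefore shadow a stdlib module of the same name, which is how an outdated backport package can end up being imported instead of the stdlib version.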

GUS-W-16616226.

edmorley commented 1 month ago

One thing I forgot to add: Switching to venvs is only now possible because pip 22.3 added support for a new --python option (also usable via the PIP_PYTHON env var), which allows pip to manage an environment other than the one into which it was installed. Before that option existed, using a venv would have required installing pip into the same venv as the app dependencies, meaning pip couldn't be cached (we can't cache the app dependencies layer, since installs without a lockfile are non-deterministic and don't handle package removals, etc.).
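
As a rough sketch of that workflow (not the buildpack's actual implementation): create a venv without pip, then point an externally installed pip at it using --python. The function names and the `/layers/venv` path below are hypothetical, and the sketch assumes the pip available to the running interpreter is version 22.3 or newer.

```python
import os
import sys
import venv


def create_app_venv(venv_dir):
    """Create a venv for app dependencies without installing pip into it."""
    venv.EnvBuilder(with_pip=False).create(venv_dir)


def pip_install_command(venv_dir, requirements_file):
    """Build a pip invocation that installs into `venv_dir` from an outside pip.

    Hypothetical helper; relies on pip's --python option (pip >= 22.3).
    """
    return [
        sys.executable, "-m", "pip",
        "--python", os.fspath(venv_dir),
        "install", "-r", os.fspath(requirements_file),
    ]


if __name__ == "__main__":
    # "/layers/venv" is an illustrative path, not one taken from the buildpack.
    print(pip_install_command("/layers/venv", "requirements.txt"))
```

The returned list can be passed to `subprocess.run(..., check=True)` after calling `create_app_venv`. Because pip stays outside the venv, the directory holding pip can be cached independently of the app dependencies.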

See: