heroku / heroku-buildpack-python

Heroku's buildpack for Python applications.
https://www.heroku.com/python
MIT License

Python buildpack fails to deploy without first purging cache #1520

Closed arel closed 8 months ago

arel commented 8 months ago

The logic in the bin/compile script that renames paths from the $BUILD_DIR to /app is brittle and fails when files are cached with build directory names from prior builds.

As a consequence, the first deploy of my project succeeds, but on the second and every later deploy the build reports success while the website crashes because dependencies are not found.

Further, it seems that something changed between November 29 and December 5 on Heroku's end that made this issue appear for me. I am not sure what. Maybe the build directory naming changed.

Issue reproduction

Here is a minimal project that reproduces the build issue. It is definitely an issue on Heroku's end.

https://github.com/arel/debug-heroku-pipenv

Temporary workaround

For anyone else struggling with this, one workaround is to add your local packages to your PYTHONPATH. For example, I have a local version of botocore in ./vendor/botocore, so setting PYTHONPATH (on the Heroku settings dashboard) to /app:/app/vendor/botocore lets Python find the local package. This is not a great solution, but it may help in a pinch.

edmorley commented 8 months ago

@arel Hi! Thank you for filing an issue.

On 2023-11-30 the version of pipenv was updated from 2023.7.23 to 2023.11.15 (plus setuptools was upgraded): https://github.com/heroku/heroku-buildpack-python/blob/main/CHANGELOG.md#v240---2023-11-30

If pipenv related behaviour has changed recently, then it's likely the new version of pipenv is the cause.

I will take a look at the repro you have provided soon (a minimal repro like that is very helpful, thank you!) - however, we are in the middle of a production change freeze until January due to the holidays, so I won't be able to make any changes to the buildpack until then (and I'm shortly going to be away myself).

To switch back to the old pipenv version in the meantime, you can use a buildpack URL of: https://github.com/heroku/heroku-buildpack-python.git#v239

See:
https://devcenter.heroku.com/articles/buildpacks#buildpack-references
https://devcenter.heroku.com/articles/heroku-cli-commands#heroku-buildpacks-set-buildpack

arel commented 8 months ago

Hi, @edmorley! I appreciate the fast response! That seems like a likely culprit.

I think the issue would affect any Python package installed in site-packages (as egg-link, .pth, or *_finder.py) that references the build directory, since that path only gets rewritten at runtime and I presume the build directory is ephemeral.

https://github.com/heroku/heroku-buildpack-python/blob/edcd18bfbf110347d80679ae53cd741387403fc8/bin/compile#L303-L310
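To see which files are affected in a given environment, a scan along the following lines may help. This is a minimal sketch and not part of the buildpack: the function name `find_path_references` and the glob patterns are assumptions based on the file types listed above, and the directory string you pass in has to come from your own build output.

```python
from pathlib import Path

# Glob patterns for the file types mentioned above. This list is an assumption
# for illustration and may not cover every editable-install layout.
PATTERNS = ("*.egg-link", "*.pth", "*_finder.py")


def find_path_references(site_packages: Path, directory: str) -> list[Path]:
    """Return files directly inside site_packages whose contents mention `directory`."""
    hits = []
    for pattern in PATTERNS:
        for path in sorted(site_packages.glob(pattern)):
            # errors="replace" keeps the scan going if a file is not valid UTF-8.
            if directory in path.read_text(encoding="utf-8", errors="replace"):
                hits.append(path)
    return hits
```

Calling it with the site-packages directory and a previous build directory lists the files that still point at that old location.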

One potential solution would be to broaden the replace-pattern to match any Heroku build directory. Or, better, you could cache a list of all prior build directories and change the line above to replace any of them that are found.

Have a nice vacation, and happy holidays!

edmorley commented 8 months ago

I've managed to track this down - it's actually a combination of a few separate issues.

First, the reason this started affecting your builds only recently is that, between Pipenv v2023.7.23 and v2023.8.19, an upstream regression was introduced that changed the installation mode for local file = dependencies from a standard install to an editable install.

That is, a dependency specifier like so:

[packages]
mypackage = {file = "packages/mypackage"}

...would previously have been installed as non-editable, whereas now it's installed as though editable = true had been specified.

Worse, it appears that even if one includes an explicit editable = false (note: false) to try and disable editable installation mode, it doesn't do anything.

I've filed this regression as: https://github.com/pypa/pipenv/issues/6054
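To check which mode a dependency actually ended up in, one option is to read the `direct_url.json` metadata described in PEP 610. The sketch below assumes the installer recorded that file; legacy egg-link style installs may not have it, so a `False` result is not conclusive. The helper names are hypothetical.

```python
import json
from importlib import metadata
from typing import Optional


def is_editable_direct_url(direct_url_text: Optional[str]) -> bool:
    """Return True if a PEP 610 direct_url.json payload marks an editable install."""
    if not direct_url_text:
        # No direct_url.json recorded: cannot confirm an editable install.
        return False
    data = json.loads(direct_url_text)
    return bool(data.get("dir_info", {}).get("editable", False))


def is_editable_install(dist_name: str) -> bool:
    """Look up an installed distribution and inspect its direct_url.json."""
    # Raises importlib.metadata.PackageNotFoundError if dist_name is not installed.
    dist = metadata.distribution(dist_name)
    return is_editable_direct_url(dist.read_text("direct_url.json"))
```

For the Pipfile entry above, `is_editable_install("mypackage")` would be the call to try, assuming the distribution is named `mypackage`.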

Whilst this was still a regression, the only reason it caused issues here is that there was a pre-existing bug in the buildpack around local file = dependencies when installed in editable mode (as you noted).

Specifically, the current path rewriting handling relies on the fact that we expect the installer to always be re-run to fix up any stale paths from the previous build.

(There's some backstory on the path rewriting in #1006 and #1252. The fact that paths change between build-time and run-time is a massive pain and thankfully going away with the next generation Cloud Native Buildpacks aka CNBs, xref CNB spec and the WIP Python CNB)

This re-running of the installer always occurs for standard Pip builds (since requirements files are non-deterministic given e.g. transitive deps, includes etc), and already occurred for Git VCS Pipenv builds (via this fragile check), however, there was no check for local path = file dependency builds.

I also found another bug unrelated to path rewriting (#1525), which makes me think that we should just never skip pipenv install as the lockfile is still not always deterministic, and instead defer to Pipenv to decide whether an environment needs updating.

> One potential solution would be to broaden the replace-pattern to match any Heroku build directory.

So the problem with trying to match any build directory is that we would have to hardcode the expected build path style in the buildpack (e.g. via a hardcoded /tmp/build_* glob), and that path (a) is not guaranteed to stay the same over time on Heroku (in fact it's already changed once in the last couple of years), and (b) could be a completely different path on non-Heroku platforms (this buildpack is used by e.g. Dokku and others).
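The brittleness is easy to see in a small sketch. The regex below is a hypothetical stand-in for a hardcoded /tmp/build_* pattern, written in Python for illustration (the buildpack itself is not implemented this way), and both example paths are invented.

```python
import re

# Hypothetical hardcoded pattern, modelled on the /tmp/build_* glob mentioned above.
HEROKU_STYLE = re.compile(r"/tmp/build_[^/\s]+")


def rewrite_build_paths(text: str, runtime_dir: str = "/app") -> str:
    """Replace anything that looks like a /tmp/build_* directory with runtime_dir."""
    # A function replacement avoids backslash escapes in runtime_dir being interpreted.
    return HEROKU_STYLE.sub(lambda _match: runtime_dir, text)


# A path in the expected style is rewritten.
print(rewrite_build_paths("/tmp/build_1a2b3c/packages/mypackage"))
# -> /app/packages/mypackage

# A path in any other style (a renamed Heroku directory, another platform)
# is silently left untouched.
print(rewrite_build_paths("/builds/job-42/packages/mypackage"))
# -> /builds/job-42/packages/mypackage
```

The second call fails quietly, which is the failure mode described above: the build still looks successful, and the stale path only surfaces at runtime.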

> Or, better, you could cache a list of all prior build directories and change the line above to replace any of them that are found.

Yeah one solution would be to:

  1. During each build, store the current build directory path in the build cache at a known location
  2. At the start of each cached build, rewrite paths in the restored-from-cache site-packages from OLD_BUILD_DIR to NEW_BUILD_DIR
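The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the buildpack's implementation: it is written in Python, the marker file name `.previous-build-dir` and the function name are hypothetical, and it uses plain substring replacement, which would mis-handle a build directory that is a prefix of another path.

```python
from pathlib import Path

# Hypothetical marker file inside the build cache; not an actual buildpack path.
MARKER_NAME = ".previous-build-dir"
# File types assumed to hold absolute paths (egg-link, .pth, *_finder.py).
PATTERNS = ("*.egg-link", "*.pth", "*_finder.py")


def rewrite_cached_paths(site_packages: Path, cache_dir: Path, new_build_dir: str) -> int:
    """Rewrite the previous build dir to new_build_dir; return the number of files changed."""
    marker = cache_dir / MARKER_NAME
    changed = 0
    # Step 2: if an earlier build recorded its directory, fix up restored files.
    if marker.exists():
        old_build_dir = marker.read_text().strip()
        if old_build_dir and old_build_dir != new_build_dir:
            for pattern in PATTERNS:
                for path in site_packages.glob(pattern):
                    text = path.read_text()
                    if old_build_dir in text:
                        path.write_text(text.replace(old_build_dir, new_build_dir))
                        changed += 1
    # Step 1: record the current build dir so the next build can find it.
    marker.write_text(new_build_dir + "\n")
    return changed
```

On a first build with an empty cache the function changes nothing and only writes the marker; later builds rewrite whatever the marker points at.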

However:

I've opened #1526, which resolves the issue when tested against https://github.com/arel/debug-heroku-pipenv and also adds integration tests for editable Pipenv installs (the buildpack previously only tested editable installs with Pip).

Note: I ran into setuptools related import errors on the first rebuild using the fix branch of the buildpack - clearing the build cache resolved these. I believe they are caused by the debug-heroku-pipenv repo having a very old setuptools version in its lockfile, which can cause issues depending on which order Pipenv attempts to install packages. There is possibly another upstream Pipenv bug causing this, but for now I'd recommend keeping your lockfile up to date, so the setuptools version in there is compatible with what gets pulled in via Pipenv itself.