heroku / heroku-buildpack-python

Heroku's buildpack for Python applications.
https://www.heroku.com/python
MIT License

Is the Python package cache invalidated if a sub-requirements file is changed? #1316

Closed luzfcb closed 2 years ago

luzfcb commented 2 years ago

I have the following structure:

project_root/
    Procfile
    runtime.txt
    manage.py
    requirements.txt
    requirements/
        base.txt
        test.txt
        dev.txt
        heroku.txt

The requirements.txt content is:

# All the requirements are in requirements/ directory separated by environment.
# This file only exists because Heroku search for a requirements.txt file
# to install the dependencies.
-r requirements/heroku.txt

The requirements/heroku.txt content is:

-r base.txt

gunicorn==20.1.0
whitenoise==6.0.0

Question: Is the Python package cache invalidated if a sub-requirements file (heroku.txt or base.txt) is changed?

edmorley commented 2 years ago

@luzfcb Hi! No, at present changes to sub-requirements files don't result in cache invalidation. That means things like packages that are no longer listed will only get cleaned up on a Python version change or stack change, unless the root requirements.txt itself is also updated.
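To make the gap described above concrete, here is a minimal sketch of how a project could detect a change anywhere in its requirements tree by fingerprinting the root file together with every file it pulls in via `-r`. This is not the buildpack's logic. The function names `collect_requirement_files` and `requirements_fingerprint` are hypothetical, and the parser only handles the `-r <path>` and `--requirement <path>` forms used in this thread.

```python
import hashlib
from pathlib import Path


def collect_requirement_files(root, _seen=None):
    """Return `root` plus every file it includes via -r, in depth-first order."""
    seen = _seen if _seen is not None else []
    root = Path(root).resolve()
    if root in seen:
        # Already visited: avoids infinite recursion on circular includes.
        return seen
    seen.append(root)
    for raw_line in root.read_text().splitlines():
        line = raw_line.strip()
        for prefix in ("-r ", "--requirement "):
            if line.startswith(prefix):
                # Drop a trailing " # comment" if present (simplified handling).
                nested = line[len(prefix):].split(" #")[0].strip()
                # pip resolves nested paths relative to the including file.
                collect_requirement_files(root.parent / nested, seen)
    return seen


def requirements_fingerprint(root):
    """SHA-256 over the contents of the root file and all nested files."""
    digest = hashlib.sha256()
    for path in collect_requirement_files(root):
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator so file boundaries affect the hash
    return digest.hexdigest()
```

Going by the reply above, only a change to the root requirements.txt is noticed, so one conceivable use is to write this fingerprint into a comment in the root file whenever a sub-requirements file changes. Whether that is enough to trigger a full cache reset is an assumption that would need to be checked against the buildpack's build log.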

luzfcb commented 2 years ago

@edmorley would it be possible to update the documentation to explicitly explain when the Python package cache is invalidated and when it is not, as in my question?

luzfcb commented 2 years ago

@edmorley is there any environment variable that I can use to control the cache behavior and turn off the Python package cache?

edmorley commented 2 years ago

@luzfcb Great points! :-)

In the new Python Cloud Native Buildpack that is due to replace this buildpack in the future, I'm planning on adjusting caching behaviour slightly, as well as making what's happening easier to understand (eg via build log output explaining when it reuses vs discards the cache). My preference is to make the buildpack easy enough to understand that the amount of separate documentation required is reduced.

Re turning the cache off, there isn't currently an env var for that, but the bin/pre_compile or bin/post_compile hooks could be used to manually delete/adjust the cache.

Could you explain more about the issue you are encountering?

luzfcb commented 2 years ago

> Could you explain more about the issue you are encountering?

The context

I was migrating a project from Django 2.2 to 3.2 and dockerizing it so that new team members can set the project up quickly.

My customer had forked some libraries and installed them via

-e git+https://github.com/<user>/<repository-name>.git@<hash>#egg=<python-package-name>

To facilitate the dockerization of the project, I migrated these dependencies to the form compatible with PEP 440 direct references / PEP 508, which means migrating to something like

<python-package-name>@https://github.com/<user>/<repository-name>/archive/<hash>.tar.gz
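The rewrite described above is mechanical, so it can be sketched as a small helper. This is only an illustration under stated assumptions: the name `to_direct_reference` is hypothetical, the pattern only recognises editable `git+https://github.com/...` lines of the shape shown in this comment, and the output uses the GitHub archive URL form quoted above. Note that the result is no longer an editable install.

```python
import re

# Matches: -e git+https://github.com/<user>/<repo>.git@<rev>#egg=<name>
_EDITABLE_GITHUB = re.compile(
    r"^-e\s+git\+(?P<repo>https://github\.com/[^/\s]+/[^/\s@]+?)(?:\.git)?"
    r"@(?P<rev>[^#\s]+)#egg=(?P<name>[A-Za-z0-9._-]+)\s*$"
)


def to_direct_reference(line):
    """Convert an editable GitHub VCS requirement to a direct URL reference.

    Lines that do not match the expected shape are returned unchanged.
    """
    match = _EDITABLE_GITHUB.match(line.strip())
    if match is None:
        return line
    return "{name}@{repo}/archive/{rev}.tar.gz".format(**match.groupdict())
```

Because the archive URL embeds the commit hash, pointing at a new commit changes the requirement line itself, which makes the intended version explicit in the requirements file.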

The problem

One of the dependencies had Django HTML templates with template tags that are invalid in Django 3.2. The fork's maintainer sent me a new commit hash that fixes the incompatibilities.

When deploying, for some reason, the Python files were updated but the HTML files were not. That is, there were still HTML files with invalid template tags, which broke the application on pages that load those Django template files. (I checked by looking directly at the files via heroku run bash.)

I'm not sure whether this was an error in pip/setuptools/wheel or in the Heroku Python package cache. However, I believe that finer control of the cache, or at least more explicit documentation on how Heroku's Python package cache invalidation works, when it does not apply for common use cases, and what the known issues are, would help.