heroku / heroku-buildpack-python

Heroku's buildpack for Python applications.
https://www.heroku.com/python
MIT License
974 stars 1.84k forks source link

Enable Django collectstatic's symlink mode #1060

Open edmorley opened 4 years ago

edmorley commented 4 years ago

For Django apps, the buildpack currently runs the Django management command collectstatic automatically as part of the build. This command by default copies the static files provided by any Django packages/the local app into the staticfiles directory.

The collectstatic command has a --link option which enables the use of symlinks instead of file copies: https://docs.djangoproject.com/en/2.2/ref/contrib/staticfiles/#cmdoption-collectstatic-link

Using this option would save duplicating files within the app directory, reducing slug size and speeding up the build (since symlinking is quicker than copying).

This idea was proposed in #1033 - filing this issue to give somewhere to discuss the idea.

Things we'll need to investigate/consider:

In the meantime, anyone wanting to enable symlinks can do so by disabling the built-in collectstatic (by setting the env var DISABLE_COLLECTSTATIC on their app, per docs), then running collectstatic with custom command in a bin/post_compile script (see #1026).

illia-v commented 1 year ago

In the meantime, anyone wanting to enable symlinks can do so by disabling the built-in collectstatic (by setting the env var ` on their app, per [docs](https://devcenter.heroku.com/articles/django-assets#disabling-collectstatic)), then running collectstatic with custom command in abin/post_compile` script (see #1026).

./manage.py collectstatic --link creates absolute symlinks to a temporary build directory /tmp/build_* when it is executed in bin/post_compile. The symlinks become broken in a real dyno when the app directory becomes /app. So an additional step is needed in bin/post_compile to change the absolute links to relative to prevent the problem.

Something like this solution with find ./static -lname "$(pwd)/*" may help because links to files from external libraries should remain absolute.

edmorley commented 1 year ago

@illia-v Ah that's a good point, thank you. I hadn't tried --link yet so didn't know it used absolute symlinks instead of relative. Shame Django doesn't allow controlling this.

For our in-progress next-generation buildpacks (Cloud Native Buildpacks; upstream info: https://buildpacks.io / our Python CNB: https://github.com/heroku/buildpacks-python) Django's use of absolute symlinks wouldn't be an issue, since the build time and run time paths are the same.

edmorley commented 1 year ago

Another issue I've found, is that Django's ManifestStaticFilesStorage storage backend (on which WhiteNoise's backend is based) doesn't preserve symlinks when creating the hashed version of files. That is, staticfiles/example.css will be created as a symlink to the original asset, but staticfiles/example.019c8743b7cf.css will be a copy.

At first glance this seems like a bug or limitation of Django's ManifestStaticFilesStorage backend, however, given the target of the symlink could in theory be modified at any point (which would then mean the hash in the filename no longer matches the actual hash of the file contents), then I can see why they might do that. (In a Heroku context where the staticfiles directory is regenerated fresh each time this isn't a concern, but I suspect the upstream Django project might not accept a fix to make the hashed files symlinks too).

As such, using collectstatic --link will only help reduce the size of the "original" unhashed filename assets, and not the hashed versions (and also not the gzip/brotli compressed versions created by the WhiteNoise compression backend, since they have completely different contents).

These original filename assets already get deleted when using WhiteNoise's WHITENOISE_KEEP_ONLY_HASHED_FILES setting, which the Python getting started guide already uses: https://github.com/heroku/python-getting-started/blob/7a862828b78de73e13f87c26254be0d16eec1354/gettingstarted/settings.py#L183-L185

So whilst I'll still use --link in the upcoming Python CNB Django collectstatic implementation regardless, it means:

  1. It will actually offer minimal benefits to projects already using the best-practice WhiteNoise configuration,
  2. People using the current buildpack can still get the same size reduction benefits (even before switching to the CNB), just be enabling WHITENOISE_KEEP_ONLY_HASHED_FILES too.