heroku / heroku-buildpack-ruby

Heroku's buildpack for Ruby applications.
MIT License

Bundler cache (including compact index) is written to /app and into the slug #1117

Open edmorley opened 3 years ago

edmorley commented 3 years ago

I'm currently auditing official/popular buildpacks for compatibility with potentially changing the build directory to /app in the future.

One of the potential sources of problems for such a move is that files written to /app (or to $HOME, which is /app during the build) will now be included in the slug, when previously they were not. As such, I'm checking what files are left behind by buildpacks in /app, using this buildpack which lists the contents of /app at build time: https://github.com/edmorley/heroku-buildpack-list-app-dir

Testing the Ruby getting started guide with the Ruby buildpack + the above buildpack, I see that the bundler cache (~/.bundle/cache/compact_index) is being written to /app. Once the build directory is /app, this would cause the slug size to increase, potentially pushing apps closer to the limit. For the getting started guide this cache is only 18MB, but it doesn't have as many dependencies as some typical Rails apps.
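To check how large this cache is for a given app, something like the following could be run at the end of a build. It is only a minimal sketch: the helper name `dir_size_bytes` is hypothetical, and the path is the one quoted above, resolved against `$HOME` (with `/app` assumed as the fallback).

```ruby
require "find"

# Hypothetical helper: sum the sizes of all regular files under `dir`, in bytes.
# Returns 0 when the directory does not exist.
def dir_size_bytes(dir)
  return 0 unless File.directory?(dir)

  total = 0
  Find.find(dir) do |path|
    # Skip symlinks so that linked files are not counted twice.
    total += File.size(path) if File.file?(path) && !File.symlink?(path)
  end
  total
end

# Path taken from the issue text; $HOME is /app during the build.
home = ENV.fetch("HOME", "/app")
cache_dir = File.join(home, ".bundle", "cache", "compact_index")
megabytes = dir_size_bytes(cache_dir) / (1024.0 * 1024.0)
puts format("%s: %.1f MB", cache_dir, megabytes)
```

Running this against apps with larger Gemfiles would show whether the 18MB figure is typical.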

It seems there are a few options:

  1. If it's useful to actually keep the bundler cache (if it's not already being kept), move it to $CACHE_DIR instead
  2. If it's not useful to keep the bundler cache, then either (a) have it written to a directory under /tmp instead of $HOME, or (b) delete it from $HOME at the end of the build
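Options 1 and 2(b) could be sketched roughly as follows. This is not the buildpack's actual code: the method names and the `bundler/compact_index` layout under the cache directory are hypothetical, and the cache directory is assumed to be passed in from `$CACHE_DIR`.

```ruby
require "fileutils"

# Option 1 (sketch): move the compact index out of $HOME and into the build
# cache, so it persists between deploys but is not included in the slug.
# Returns true if something was moved, false if there was nothing to move.
def move_bundler_cache(home, cache_dir)
  src = File.join(home, ".bundle", "cache", "compact_index")
  return false unless File.directory?(src)

  # Hypothetical location inside the build cache.
  dest = File.join(cache_dir, "bundler", "compact_index")
  FileUtils.mkdir_p(File.dirname(dest))
  FileUtils.rm_rf(dest) # drop any stale copy from a previous build
  FileUtils.mv(src, dest)
  true
end

# Option 2(b) (sketch): delete the bundler cache from $HOME at the end of the build.
def remove_bundler_cache(home)
  FileUtils.rm_rf(File.join(home, ".bundle", "cache"))
end
```

With option 1, the next build would also need to copy the directory back into place before running bundle install, otherwise the saved index is never read.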
schneems commented 3 years ago

Related support ticket 958280

schneems commented 3 years ago

I looked into this heavily for https://github.com/heroku/heroku-buildpack-ruby/issues/1118

The cache is only used at bundle install time, and it appears to be used only when dependencies are not satisfied. I.e. if you deploy to Heroku, then do a heroku run bash followed by bundle install, it won't download the cache for "reasons". I'm assuming the reason is that its dependencies are already satisfied. However, I'm not totally sure of the behavior.

On first bundle install the cache is downloaded and written here https://github.com/rubygems/rubygems/blob/be08d8307eda3b61f0ec0460fe7fbcf647b526e6/bundler/lib/bundler/compact_index_client/updater.rb#L64

Where local_path is something in ~/.bundle/cache/compact_index. The path name includes an etag of the compact index. Before downloading a new index bundler will check to see if a prior index's etag is satisfactory.

Based on this it seems that making these files available at runtime adds nothing (because people don't bundle install at runtime) so they could be stripped out before launch.

The other question is: Is it helpful to preserve these between deploys? It depends on how frequently the etag is invalidated. @hone knows more about the whole compact index so he might have some insight. My very unscientific attempt to answer this question was to deploy an app to heroku today and see if it has the same etag or not:

So it looks like there may be some benefit to keeping them around. I think it's worth benchmarking the download of the index. If it's already on us-east and coming from S3 then there's no speed benefit from putting it in the cache. For CNB where there's the local install case to think about, it's likely a good idea to cache it (even if it's fast).
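A rough way to benchmark the download might look like the sketch below. The helper names are hypothetical, and the URL is an assumption about where the compact index `versions` file is served from. Bundler itself makes conditional and partial requests, so a plain GET like this only approximates the cost of a cold fetch.

```ruby
require "benchmark"
require "net/http"
require "uri"

# Assumed endpoint for the compact index "versions" file.
VERSIONS_URI = URI("https://rubygems.org/versions")

# Hypothetical helper: run the given block `runs` times and return the
# fastest wall-clock time in seconds.
def best_time(runs: 3)
  Array.new(runs) { Benchmark.realtime { yield } }.min
end

# Hypothetical helper: fetch the full index once with a plain GET.
def fetch_versions_index(uri = VERSIONS_URI)
  Net::HTTP.get_response(uri)
end

# Usage (performs real network requests, so it is left commented out):
#   seconds = best_time { fetch_versions_index }
#   puts format("fastest download: %.2fs", seconds)
```

Comparing that number from a build dyno against the time to restore the same files from the build cache would show whether caching is worth it.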

This does make me vaguely wish there was some kind of cross-app cache or mechanism since it seems wasteful to duplicate this across N caches (where N is number of apps on the platform).

schneems commented 3 years ago

Etag still valid:

remote:        `/app/.bundle/cache/compact_index/rubygems.org.443.29b0360b937aa4d161703e6160654e47/versions`.

schneems commented 3 years ago

Etag still valid

remote:        `/app/.bundle/cache/compact_index/rubygems.org.443.29b0360b937aa4d161703e6160654e47/versions`.

schneems commented 1 year ago

I'm unsure whether this also happens in the CNB. Need to investigate whether this is still an issue.