heroku / heroku-buildpack-ruby

Heroku's buildpack for Ruby applications.
MIT License

Save/restore yarn cache #654

Open swrobel opened 6 years ago

swrobel commented 6 years ago

Every single one of our deploys is taking ~50s to install packages via yarn. This should be near-instantaneous when yarn.lock hasn't changed. It appears that the nodejs buildpack already takes care of saving/restoring the yarn cache.

schneems commented 6 years ago

You can get caching by using the official node buildpack:

$ heroku buildpacks:add heroku/nodejs -i 1

grk commented 6 years ago

This makes yarn install run twice on each deploy though.

schneems commented 6 years ago

It should be cached on the second run and nearly instantaneous. Much better than only running once but having to install from scratch.

swrobel commented 6 years ago

@schneems alas, not really. Yarn will still wipe node_modules & re-symlink everything from the cache every time it's run, so it's still a 10-15s process.

swrobel commented 6 years ago

Aaaaand it turns out it's even worse than I thought: https://github.com/yarnpkg/yarn/issues/932

Yarn now runs 3x with both buildpacks, building native packages each time:

  1. nodejs: dependencies + devDependencies
  2. nodejs: prune devDependencies
  3. ruby

Deploy times are now officially insane with both buildpacks

schneems commented 6 years ago

One option is to use the heroku/nodejs buildpack which does caching and all that fun stuff and then manually disable the yarn:install task in your Rakefile. I think it's something like this:

Rake::Task["yarn:install"].clear
task "yarn:install" do
  # nothing here
end
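
A slightly more defensive variant of the snippet above, offered as a sketch rather than a tested recipe. `Rake::Task["..."]` raises when the task has not been defined yet, so this version checks first; in a Rails Rakefile the call would presumably need to come after `Rails.application.load_tasks`. The helper name `neutralize_task` is made up for illustration.

```ruby
require "rake"

# Hypothetical helper: make a Rake task do nothing, whether or not it exists yet.
def neutralize_task(name)
  # Rake::Task[name] raises for unknown tasks, so only clear it when defined.
  # `clear` drops the task's prerequisites and actions.
  Rake::Task[name].clear if Rake::Task.task_defined?(name)

  # (Re)define the task with an empty body so tasks depending on it still resolve.
  Rake::Task.define_task(name) do
    # intentionally empty: the nodejs buildpack has already installed packages
  end
end

# In a Rails Rakefile this line belongs after `Rails.application.load_tasks`.
neutralize_task("yarn:install")
```

If the override appears to have no effect, running `rake -P` lists every task with its prerequisites, which can help show what the asset compile task actually depends on in a given app.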

voter101 commented 6 years ago

One possible solution is to build with the YARN_PRODUCTION flag set to true. That does not prevent yarn from running three times, but it stops dev dependencies from being reinstalled every time.

joevandyk commented 5 years ago

We are running into this as well. Deploy times for us are approaching 7 minutes.

luccasmaso commented 5 years ago

I'm using both buildpacks too with my rails app + webpacker. It turns out that on every deploy yarn install is executed twice and does not make use of the cache, resulting in deploys of nearly 3-5 minutes. I've tried Rake::Task["yarn:install"].clear but with no success. I'm kind of stuck here. Thanks

ericboehs commented 5 years ago

This seems like something the community should care about. Yarn caching in the nodejs buildpack has supposedly been around for 2+ years. Our yarn install takes 50+ seconds each deploy. That's 25% of build time when our webpacker cache is used.

I'd like to take a look at this soon and see what can be done.

joevandyk commented 5 years ago

@ericboehs that would be awesome!

ericboehs commented 5 years ago

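Schneems's suggestions seem to be helpful. Sort of.

Here's how you do it: add the heroku/nodejs buildpack ahead of the ruby one, then clear the yarn:install task in your Rakefile, as described above.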

Yarn will run once and will reuse caches so that install time is almost instant (a few seconds). Unfortunately the setup/teardown time for the nodejs buildpack doesn't gain you much.

The above changes saved 7 seconds on my build time compared to running yarn install from scratch every build but without the nodejs buildpack. My from-scratch yarn install time is 51 seconds. If yours is higher than this, then you may see a bigger savings than 7 seconds.

For now, I'm leaving my deploy process as is. For me, I can't justify the added complexity for 7 seconds.

It would seem caching node_modules/yarn within the ruby buildpack would save the most time. In our app's case, I think this would save another 10-20 seconds, if not more. This would be very welcome but isn't high priority for our company. I hope someone else in the community (or within Heroku) is able to implement this.

ericboehs commented 5 years ago

I have done some primitive caching of the node_modules here: https://github.com/ericboehs/heroku-buildpack-ruby-yarn-cache/commit/f851a9560ff7363818b530bbe14490da5c51d5f6 (this isn't production ready as it will need cache pruning).

~It seems to be almost identical in execution time to adding the nodejs buildpack.~ (Edit: See my next comment below.) The caching/restoration of node_modules just takes a really long time.

If our yarn install time gets over 60 seconds, I'll probably end up adding the nodejs buildpack (but not clearing the yarn:install task as it only saved me a couple seconds). Until then, I'll continue to use the official ruby buildpack.


Unrelated but useful for timing deploys:

unbuffer git push staging | ts -s "%H:%M:%.S"

This will prepend each line of output with a timestamp for when the line was buffered. Great for all kinds of commands, not just deploys. You'll need the unbuffer binary which is available for macOS via brew install expect.
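
For anyone without `ts` (it ships with moreutils), roughly the same elapsed-time prefix can be sketched in plain Ruby. This is an illustration, not a drop-in replacement: the method name `prefix_elapsed` and the injectable `clock` parameter are invented here, and the output format only approximates `%H:%M:%.S`.

```ruby
# Prefix every line read from `input` with the time elapsed since the first call,
# similar in spirit to `ts -s "%H:%M:%.S"`.
# `clock` is an assumption made for testability; it defaults to a monotonic clock.
def prefix_elapsed(input, output, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
  started = clock.call
  input.each_line do |line|
    elapsed = clock.call - started
    # Split the elapsed seconds into hours, minutes and fractional seconds.
    minutes, seconds = elapsed.divmod(60)
    hours, minutes = minutes.divmod(60)
    output.print(format("%02d:%02d:%06.3f %s", hours, minutes, seconds, line))
  end
end
```

Saved as a script (say `elapsed.rb`, a made-up name) with a final line `prefix_elapsed($stdin, $stdout)`, it could be used as `git push staging 2>&1 | ruby elapsed.rb`. Without `unbuffer`, lines may arrive in bursts, so the timestamps are only as precise as the upstream buffering allows.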

ericboehs commented 5 years ago

Hmmm. After testing my caching buildpack a few more times, it seems it does indeed save 10-20 seconds (my last run saved 19 seconds) compared to running with the nodejs buildpack.

Could someone else try out my buildpack and see if they see similar results?

kpheasey commented 5 years ago

@ericboehs I've submitted a PR, #892, that improves upon your implementation. Just caching node_modules is not enough; we need to include ~/.yarn-cache, tmp/cache/webpacker, and public/packs too.
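
To make the idea concrete, here is a minimal sketch of saving and restoring several directories between builds. It is not the code from #892 or from the buildpack. The helper names, the choice to key the cache on a digest of yarn.lock, and the omission of ~/.yarn-cache (which lives outside the app directory) are all assumptions made for illustration.

```ruby
require "digest"
require "fileutils"

# App-relative directories mentioned in this thread; the right set is app-specific.
CACHED_DIRS = %w[node_modules tmp/cache/webpacker public/packs].freeze

# Hypothetical cache key: a new yarn.lock means a fresh cache (assumed policy).
def cache_key(app_dir)
  Digest::SHA256.file(File.join(app_dir, "yarn.lock")).hexdigest
end

# Copy each cached directory that exists under from_root to the same path under to_root.
def copy_dirs(from_root, to_root)
  CACHED_DIRS.each do |dir|
    source = File.join(from_root, dir)
    next unless Dir.exist?(source)

    dest = File.join(to_root, dir)
    FileUtils.rm_rf(dest)                 # replace any stale copy
    FileUtils.mkdir_p(File.dirname(dest))
    FileUtils.cp_r(source, dest)
  end
end

# Store the app's directories under cache_dir/<yarn.lock digest>.
def save_cache(app_dir, cache_dir)
  copy_dirs(app_dir, File.join(cache_dir, cache_key(app_dir)))
end

# Returns true on a cache hit, false when no cache exists for this yarn.lock.
def restore_cache(app_dir, cache_dir)
  source_root = File.join(cache_dir, cache_key(app_dir))
  return false unless Dir.exist?(source_root)

  copy_dirs(source_root, app_dir)
  true
end
```

Old digests are never removed here, so a real implementation would still need the cache pruning mentioned earlier in the thread. Copying node_modules is itself slow, which matches the observation above that cache restoration takes a long time.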

jrochkind commented 3 years ago

That -- three years later -- there is no non-hacky supported/documented solution to caching yarn install for a Rails/webpacker app is definitely affecting my opinion of heroku support for rails and heroku stagnation generally.

philippegirard commented 3 years ago

Hey, any update on this issue? It's been more than 3 years. In my case the build is taking 20-25 minutes on a medium-sized Rails application (100 users) with a React frontend. I think Rails is running rails assets:precompile to compile assets with webpack. It is getting worse, and lately I have also been getting SEGFAULTs when the compile step runs for more than 25 minutes (this happens about 1 time in 10).

Has anyone found a clean solution to this that can be implemented in a production environment? I am thinking about splitting the rails assets:precompile step out into a GitHub Action (which supports caching), like this: https://stackoverflow.com/questions/21408804/heroku-rake-assetsprecompile-too-slow

kpheasey commented 3 years ago

@philippegirard I moved all my clients off Heroku and onto AWS CodePipeline+ECS+RDS. It requires some Dockerfile know-how, or copy/paste lol. Despite the steep learning curve it's way cheaper, utilizes managed solutions (no worrying about downtime), and my push-to-deployment time is about 8-12 minutes for larger Rails+React applications.

philippegirard commented 3 years ago

@kpheasey that's my backup option. It's sad that the solution is to quit heroku. I still hope for a way to stay on heroku and not go through the pain of the migration.

krnjn commented 3 years ago

We used to have this issue but used the split chunks plugin which helped us. Also be sure you are ignoring / not transpiling all of your node modules as that can sometimes cause issues. FWIW here's our setup that makes this run in ~10m in total on Heroku:

config/webpack/environment.js

const { environment } = require("@rails/webpacker");

// resolve-url-loader must be used before sass-loader
environment.loaders.get("sass").use.splice(-1, 0, {
  loader: "resolve-url-loader",
});

// default config from https://webpack.js.org/plugins/split-chunks-plugin/#optimizationsplitchunks
environment.splitChunks((config) =>
  Object.assign({}, config, {
    optimization: {
      splitChunks: {
        chunks: "all", // changed to "all" from "async" in default
        minSize: 20000,
        // minRemainingSize: 0, // this option does not work with webpack even included in default
        maxSize: 0,
        minChunks: 1,
        maxAsyncRequests: 30,
        maxInitialRequests: 30,
        automaticNameDelimiter: "~",
        enforceSizeThreshold: 50000,
        cacheGroups: {
          defaultVendors: {
            test: /[\\/]node_modules[\\/]/,
            priority: -10,
          },
          default: {
            minChunks: 2,
            priority: -20,
            reuseExistingChunk: true,
          },
        },
      },
    },
  })
);

// Skip the Babel pass over node_modules (webpacker's "nodeModules" loader)
environment.loaders.delete("nodeModules");

module.exports = environment;

config/webpack/production.js

process.env.NODE_ENV = process.env.NODE_ENV || "production";
const environment = require("./environment");
const CompressionPlugin = require("compression-webpack-plugin");
const TerserPlugin = require("terser-webpack-plugin");

environment.config.merge({
  devtool: "hidden-source-map",
  optimization: {
    minimizer: [
      new TerserPlugin({
        extractComments: false,
        parallel: true,
        cache: true,
        sourceMap: false,
        terserOptions: {
          parse: {
            ecma: 8,
          },
          compress: {
            ecma: 5,
            warnings: false,
            comparisons: false,
          },
          mangle: {
            safari10: true,
          },
          output: {
            ecma: 5,
            comments: false,
            ascii_only: true,
          },
        },
      }),
    ],
  },
});

// Insert before a given plugin
environment.plugins.prepend(
  "Compression",
  new CompressionPlugin({
    filename: "[path].br[query]",
    algorithm: "brotliCompress",
    test: /\.(ts|tsx|js|jsx|css|scss|png|jpeg|jpg|svg|eot|woff|woff2|ttf|otf)$/,
    compressionOptions: { level: 11 },
    threshold: 10240,
    minRatio: 0.8,
    deleteOriginalAssets: false,
  })
);

module.exports = environment.toWebpackConfig();

swrobel commented 3 years ago

@krnjn while this is cool, it doesn't solve the problem of yarn re-installing packages from scratch on every build.

philippegirard commented 3 years ago

Hey @krnjn I was finally able to make it work.

In addition to your changes, I needed to replace every javascript_pack_tag with javascript_packs_with_chunks_tag to make it work.

Example:

<div id="reactappv1"></div>
<%= javascript_packs_with_chunks_tag 'spa/app' %>

I wrote up the steps I took to make it work on Medium in case somebody else needs more details: https://philstories.medium.com/slow-build-on-heroku-with-rails-and-react-c6bef3a0ae2d

My builds went from 45 minutes to 3 minutes. Only the splitChunk plugin had a significant impact.

schneems commented 3 years ago

> My builds went from 45 minutes to 3 minutes.

I tried to pair with the former nodejs lang owner and essentially anything over 5 minutes was a MAJOR red flag. Even without caching. I sadly don't know more details though :(

> is definitely affecting my opinion of heroku support for rails and heroku stagnation generally.

Well. I'm only one person. With a re-org I'm also having to contribute to several other languages and develop "salesforce functions" from scratch. I want to keep these things open because I want to keep visibility that they need to be done. However, it's also maybe sending the message that "I'm working on this currently" which is not true. I'm locking this issue but keeping it open :(

This buildpack is in the process of being deprecated/removed. I'm working on a re-write and the plan is to have the nodejs buildpack do all the installation via cloud native buildpacks: https://github.com/heroku/buildpacks-ruby (AND that buildpack needs to be re-written in Rust, as we've decided to standardize on buildpack languages since we're moving away from a single-owner model).

If/when that CNB ever happens/ships then it will maybe open up the window to not have dev dependencies cleared between that buildpack and ruby. But that change needs an upstream fix to the CNB spec which will be work.

Reading through the issues, it seems the yarn cache is a tiny part of the overall experience people are seeing. It sounds like the original issue of yarn cache accounts for ~10 seconds or so.

The larger issue @philippegirard is pointing to is webpack/webpacker caching, which isn't standardized and isn't even supported via heroku/nodejs yet. I've said in other threads and I'll say it again: supporting sprockets caching is likely the single most costly feature the buildpack has ever taken on (in terms of support tickets and debugging hours). Webpacker caching is much more "roll your own" compared to sprockets, which makes supporting it EVEN harder.

> That -- three years later -- there is no non-hacky supported/documented solution

The fact that there's not an easy-to-use community solution is also telling. This is an extremely hard problem; "caching and cache invalidation" is a FAMOUSLY hard problem. Harder still, in this case, is that webpack hasn't converged on a "just works" solution and webpacker/rails haven't adopted that solution. When rails new sets up webpacker to work with caching out of the box, then the story changes dramatically.

I'm hoping after: finishing the Ruby CNB re-write, upstreaming CNB changes to the spec, re-writing the Ruby CNB in Rust, shipping salesforce function support for Ruby, then that process will open the door for more collaboration between ruby and node and maybe if we're having the whole team look at this issue of yarn and webpack caching (rather than just me) we'll be able to make progress where I've not been able to before.

schneems commented 3 years ago

I think locking was an over-reaction as clearly some developers are still getting value out of it, and talking about workarounds. These are valuable conversations for me as well. I want to re-open the thread to enable that conversation to continue here.

Also, I wanted to link to this from another webpacker conversation https://github.com/heroku/heroku-buildpack-ruby/pull/892#issuecomment-621249249