rails / propshaft

Deliver assets for Rails
MIT License

Revisit Gzip compression support? #158

Closed nickjj closed 1 year ago

nickjj commented 1 year ago

Hi,

Sorry for bringing this up again because someone opened https://github.com/rails/propshaft/issues/86 about ~18 months ago but I think the landscape of things may have changed since then, especially after seeing DHH's keynote at RailsWorld 2023.

I'd like to quote my comment in that issue:

NGINX has a cost to do compression since it takes CPU time to do compression on the fly, especially if it's maxed out. This is objectively worse than having your digester compress the files for you beforehand, then you can configure NGINX to say the files are already compressed and serve them as they are without compute time.

Tools like Tailwind and esbuild do not compress files either, which leaves performing this action up to the user. That means every user will need to invent their own way to do this and apply it on every deployment. Would it be fair to say this goes against the grain of what Rails tries to provide? Especially with Rails pushing hard to use import maps and drop front-end toolchains.

DHH mentioned he likes that Propshaft is much smaller than Sprockets and only really focuses on 2 things but I think given the above having compression support built in aligns well with compressing complexity?

This also aligns with Kamal's philosophy of getting developers deploying their applications. Not everyone is going to be using a CDN where compression sometimes happens. Deploying to a single server with NGINX in front is a really popular use case, personally I've been doing that for around 10 years now (~8 of them with Docker).

I think adding support for Gzip should at least be considered. The zlib module is built into Ruby. I know the feature is more than just the few lines of code to create a compress_file function, but just throwing it out there.

Additionally, adding Brotli is another option. In the keynote, DHH mentioned skating to where the puck is going, and Brotli is probably where it'll be. It's well supported by browsers, but NGINX doesn't support it out of the box: it's an NGINX Plus feature, although you can compile NGINX yourself with the Brotli module. Still, you could create the files with Propshaft and let users take it from there. You could technically support both compression methods.

brenogazzola commented 1 year ago

Thank you for the well-reasoned arguments @nickjj. We can revisit this, but here's a bit more of the reasoning behind not supporting compression so far:

  1. Your app shouldn't be constantly serving assets. A few requests hitting the app servers, like Puma or Passenger, is acceptable until a CDN like Cloudflare or CloudFront, or a reverse proxy like Nginx or Varnish, caches them. If your Nginx has to compress assets on-the-fly for every request, you're probably missing a piece of your infrastructure. A free CDN with compression support would help by reducing the number of requests reaching your app server, thus lightening their load and improving response times.

  2. As you pointed out, it's not just about compressing the files. To benefit from this, either the client needs to explicitly request the compressed file, or the app server must read the Accept-Encoding header for standard (.css, .js) requests and serve the compressed (.css.gz, .js.gz) versions instead. The latter isn't feasible in production, as it's not Propshaft but the app's file server that's serving the files. This would mean modifying the asset tags (like javascript_include_tag, stylesheet_link_tag, etc.) to detect that header and forward it to Propshaft. These changes would require updates to Rails itself. Even then, this would only work for entry points. If your CSS files use @import or your JavaScript uses dynamic imports, there's no way to modify those, so you'd still get uncompressed files for most of your assets.

nickjj commented 1 year ago

No problem, thanks for replying.

  1. A single box deploy with nginx wouldn't have a cache as far as I know. Unless I missed something in their docs, nginx does not cache gzipped content. There are free CDNs, but using one means coupling yourself to cloud services such as Cloudflare.
  2. I think this is mostly handled at the nginx level? If nginx is serving your static files, as long as you set long lived cache headers in nginx for your asset path(s) and Rails provides digested static files then those files will be cached by clients without any further application / Rails changes? What you described is only a problem at the Rails / app level if you're not using nginx or a CDN right (such as having Rails serve your assets directly)?
brenogazzola commented 1 year ago

I've gone over the Nginx documentation again. I couldn't find a way to compress and cache the files at the same time, as you mentioned, so you are right on this point.

Also, I was reminded of gzip_static, which lets nginx ask the upstream for the gzipped version of the static file that was requested. The brotli module has the same feature. This is, by the way, what I meant by my point 2 (not anything about caching headers, sorry if my explanation was confusing).

However, I'm not sure if we can combine gzip_static with proxy_cache directives to have nginx ask the upstream for the gzip version once and serve it from now on. Do you know if it's possible? Otherwise we are still overloading the app server with unnecessary asset requests, even if propshaft provided the compressed versions.

I'll also admit I'm a bit resistant to your argument that coupling to a CDN is a negative. In my personal experience (which means I'm probably not aware of something that would make my argument invalid), there are so many upsides to having a CDN like Cloudflare in front of your app that I can't see how the few downsides would justify not having one.

Finally, our arguments are based on nginx (and maybe Traefik) supporting those features. Someone who is using a cloud load balancer will not have access to those features and would still need a CDN.

By all this I don't mean "no, we don't want this". Instead it's more "Do we really want to support compression code when it seems it's only going to be used in a single use case (single box, nginx/traefik, no CDN)? Sprockets has a lot of dead code and this was one of the reasons behind our rewrite"

nickjj commented 1 year ago

Also, I was reminded of gzip_static, which lets nginx ask the upstream for the gzipped version of the static file that was requested

gzip_static will look for precompressed .gz files without contacting the upstream. It bypasses nginx's own compression entirely: nginx will happily serve any matching .gz file it finds. That's where Propshaft would come into play -- it would provide the .gz files. This way nginx doesn't have to compress the files on each request.

Zero requests would be made to the app server in this case, assuming you've configured nginx to serve your assets.

I'll also admit I'm a bit resistant to your argument that coupling to CDN is a negative.

It depends. I think using a CDN is generally a good thing but from a moral perspective Cloudflare controls a lot of the internet and contributing to that is a little dangerous for the sake of the internet.

There are other free CDNs of course but blindly using a CDN is risky because if it has issues then your site has issues. A huge track record is of the utmost importance here. That's why I think there's a decently large class of applications that can get by without a CDN. You can avoid all of that and just have nginx serve your assets for you.

If you get big enough later on, you can introduce a CDN. You could make the case that if you "only" have a few hundred thousand requests per month, you can get by with nginx compressing assets on the fly even on a lower-end VPS / VM, and I'd generally agree. But if you can shave off a few % of CPU usage mostly for free, that doesn't seem like a bad deal, right?

Finally, our arguments are based on nginx (and maybe traefik) supporting those features. Someone who is using a cloud load balancer will not have access to them features and would still need a CDN.

You could do cloud load balancer -> nginx -> app. A load balancer and a web server solve different problems. I often use them together, although technically nginx is capable of performing load balancing too, but that's a separate matter! nginx is pretty much a Swiss Army knife (load balancer, reverse proxy, redirecting and rewriting URLs, serving assets, TLS termination, rate limiting, allow / block lists, geo detection, etc.).

By all this I don't mean "no, we don't want this". Instead it's more "Do we really want to support compression code when it seems it's only going to be used in a single use case (single box, nginx/traefik, no CDN)? Sprockets has a lot of dead code and this was of the reasons behind our rewrite"

Yep, completely understood and agreed.

Could it potentially start with generating .gz files in the same directory it dumps the digested files into? I'm honestly not sure if it needs any more than that besides the usual stuff that comes with code (documentation, tests, etc.).

brenogazzola commented 1 year ago

Zero requests would be made to the app server in this case, assuming you've configured nginx to serve your assets.

Just so I'm sure I understood this correctly. If I create a new css file and precompile, we'd have two files colors.css and colors.css.gz.

In your scenario, if user A requests colors.css, nginx (which has never seen this file before) would hit Puma to grab colors.css.gz. Then when user B requests colors.css, you are saying that instead of hitting Puma, nginx would serve it from cache?

Or is the configuration you are talking about having nginx pull files directly from the public/assets folder on the app servers, instead of making a request to Puma?

nickjj commented 1 year ago

Here's a bare bones example of how nginx could work with your app to handle the use case of wanting nginx to serve static files and also pass traffic to your app backend for everything else:

Let's not worry about gzip yet

At this point independent of Propshaft or Sprockets or Puma, nginx will directly serve your static files instead of your app. When it serves that response it will also send a long lived cache header to the client and if your client supports it, your client will cache the static files locally (typically your browser's cache).

We've successfully decoupled serving static files from Puma. That ends up working because nginx does pattern matching on its location blocks and as soon as it finds a match (top to bottom) it will run it. In this case it will look for files on disk within your nginx configured root, find a match (assuming the asset exists) and then serve it.

Puma will only be contacted when it matches your proxy_pass location block. Typically you use / for this and put it near the bottom of your config because it's a catch all for everything that's not already handled ahead of time such as assets, redirects or whatever else you want nginx to handle.
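The basic setup described above might look roughly like this (the upstream name, port, and root path are assumptions for illustration, not from this thread):

```nginx
# Illustrative sketch only; adapt paths and ports to your deployment.
upstream app {
  server 127.0.0.1:3000;
}

server {
  listen 80;
  root /srv/app/public;

  # Serve digested assets straight from disk with a long lived cache header.
  location /assets/ {
    expires max;
    add_header Cache-Control public;
  }

  # Catch-all: anything nginx didn't handle above goes to the app server.
  location / {
    proxy_pass http://app;
  }
}
```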

gzip option 1: Let nginx gzip files on the fly (current)

You can configure nginx with gzip on; and optionally set a few more options to control things like the compression level and what types of files to compress.

If you combine this config with our basic example from above, now nginx will gzip your digested files on every request. At least that's what I gathered from reading the docs. I didn't find anything that indicates nginx will write these .gz files to disk or store them in memory if they've already been accessed.

I'm speculating here but I'm guessing it will read the file from disk, gzip it and then serve the response. It's possible that assets that have already been served are read from memory but I think the gzip compression aspect happens on every request. The OS may also do some form of disk caching here but I haven't ever gone down that rabbit hole or benchmarked it.
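In config terms, option 1 is just the stock gzip directives (the values below are typical examples, not recommendations from this thread):

```nginx
# On-the-fly compression for every matching response.
gzip on;
gzip_comp_level 5;
gzip_types text/css application/javascript image/svg+xml;
```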

gzip option 2: Provide your own gzipped files (proposed)

You can keep everything we've configured so far and make one tiny modification at the nginx level. Keep gzip on; in the main nginx config, but inside the assets location block that sets the cache headers, also add gzip_static always;. Now nginx will serve the gzipped version of an asset if it exists, without doing any compression of its own. If it doesn't exist, nginx falls back to gzipping on the fly thanks to the other option we set.

This is the behavior that we can't currently do with Propshaft because the gzipped files don't exist ahead of time. Propshaft would need to be modified to create .gz versions of each digested file.
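Sketched as config, option 2 adds one directive to the assets block (note that gzip_static comes from ngx_http_gzip_static_module, which isn't compiled in by default; also, `on` only serves the precompressed file to clients that advertise gzip support, while `always` serves it unconditionally):

```nginx
location /assets/ {
  expires max;
  add_header Cache-Control public;

  # Serve a precompressed .gz sibling if one exists on disk; the
  # surrounding "gzip on;" remains the fallback for files without one.
  gzip_static on;
}
```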

brenogazzola commented 1 year ago

Alright, this looks good enough to me. I'll take this to the other maintainers to see what they think.

airblade commented 1 year ago

This all sounds Nginx-centric. Maybe here Nginx is just a placeholder for web server, but in case it's not I'd just like to point out that not everybody uses Nginx (I use Caddy). If it came down to it, I'd rather add a CDN than Nginx to my stack.

nickjj commented 1 year ago

You're right in that nginx is being used as a placeholder for web server.

Most popular web servers support this feature.

Nothing about this implementation is specific to nginx. I used it as an example because it's a really popular web server.

brenogazzola commented 1 year ago

Hey @nickjj, we've talked and we are sympathetic to the idea that devs should be able to run a project on a basic VM without a commercial setup. This is a problem we'd like to solve; however, we don't feel that Propshaft is the best place to do it.

That said, I'd like to propose a solution that should handle your use case: enhance the precompile task and add the code needed for compression. For reference, here's a quick and dirty example on how to do it. Place this in an initializer in your app.

require "rake"

Rake::Task["assets:precompile"].enhance do
  assembly    = Rails.application.assets
  output_path = assembly.config.output_path

  assembly.load_path.assets.each do |asset|
    asset_path      = output_path.join(asset.digested_path)
    compressed_path = output_path.join(asset.digested_path.to_s + ".br")

    unless compressed_path.exist?
      Propshaft.logger.info "Compressing #{asset.digested_path}"
      # Shell out to the brotli CLI; pass arguments separately to avoid shell-quoting issues.
      system("brotli", asset_path.to_s, "-o", compressed_path.to_s)
    end
  end
end if Rake::Task.task_defined?("assets:precompile")

nickjj commented 1 year ago

That feels good.

Do you think something like that could end up being a future Rails default? Maybe it could start as an option taking an array of compression algorithms, or [] / false to disable it, defaulting to disabled until Rails 8 or whenever Propshaft becomes the default? For example, it could be [:gzip, :brotli] if you wanted both, or just [:gzip] for gzip.

Although I'd suggest shipping with [:gzip] by default since open source nginx doesn't support Brotli out of the box.

brenogazzola commented 1 year ago

IMO this would be better as an infrastructure/tooling solution instead of a Rails thing. Sprockets tried to do everything itself and ended up with a lot of dead code as technology moved forward and it couldn't keep up.

I'm closing this issue, but feel free to continue commenting.

rmacklin commented 1 year ago

@brenogazzola That code example could be a useful addition to the docs - I've opened https://github.com/rails/propshaft/pull/161 (crediting you as coauthor)

nickjj commented 1 year ago

@brenogazzola No problem. Please potentially revisit that in the future too. :D If it ends up not in Rails or Propshaft but in Kamal instead, then everyone not using Kamal will need to copy around the same code snippet you've provided. Or were you thinking of a different tool?

The docs are a nice touch to get folks onboarded with Propshaft, good idea!

benlangfeld commented 4 months ago

@brenogazzola Given your example, how do you suggest cleaning old versions of these files should happen? Propshaft deals with cleaning up old versions of uncompressed assets; by not including compression in Propshaft, it's also necessary to duplicate the cleaning routines, right?