volution / kawipiko

kawipiko -- blazingly fast static HTTP server -- focused on low latency and high concurrency, by leveraging Go, `fasthttp` and the CDB embedded database

Feature Request: Supporting brotli and gzip statically #10

Closed: lemondevxyz closed this issue 1 month ago

lemondevxyz commented 1 month ago

Hello,

I've been following this project closely for quite a while. I like the philosophy of this project and I want to add to it.

Currently, I am using Caddy to serve the files for my many websites. A common approach I take is to provide uncompressed files alongside statically precompressed gzip and brotli variants.

My feature request is to serve gzip- or brotli-compressed files based on whether they exist in the directory. For example, if a directory contains index.html, index.html.gz, and index.html.br, the server would serve the compressed variant whenever the client accepts that encoding.
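To make the lookup order concrete, here is a minimal sketch using plain net/http and the filesystem (illustrative only; kawipiko itself serves from a CDB archive via fasthttp, so the real implementation would look different):

```go
package main

import (
	"mime"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

// serveStatic serves r.URL.Path from dir, preferring a precompressed
// variant (.br first, then .gz) when that file exists on disk and the
// client's Accept-Encoding header names the corresponding encoding.
func serveStatic(w http.ResponseWriter, r *http.Request, dir string) {
	path := filepath.Join(dir, filepath.Clean(r.URL.Path))
	accepted := r.Header.Get("Accept-Encoding")
	variants := []struct{ suffix, encoding string }{
		{".br", "br"},
		{".gz", "gzip"},
	}
	for _, variant := range variants {
		// Deliberately naive substring check; see the Accept-Encoding
		// q-value discussion later in this thread.
		if !strings.Contains(accepted, variant.encoding) {
			continue
		}
		if _, err := os.Stat(path + variant.suffix); err != nil {
			continue
		}
		// Keep the Content-Type of the original file, not of the
		// compressed artifact (ServeFile only sniffs it when unset).
		if ctype := mime.TypeByExtension(filepath.Ext(path)); ctype != "" {
			w.Header().Set("Content-Type", ctype)
		}
		w.Header().Set("Content-Encoding", variant.encoding)
		http.ServeFile(w, r, path+variant.suffix)
		return
	}
	http.ServeFile(w, r, path) // fall back to the uncompressed file
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		serveStatic(w, r, "./site")
	})
	http.ListenAndServe(":8080", nil)
}
```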

I am open to implementing this feature myself.

The server would probably need a flag to enable this behavior, as it could surprise some users.

cipriancraciun commented 1 month ago

Before commenting on the problem itself (please see below, as it's important), I'll comment on how I think this could be implemented:

However, all of this should be implemented with utmost care so as not to allocate memory, especially the "append suffix" part (see the issue on CDB64 about the reason).
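As a sketch of what an allocation-free "append suffix" could look like, assuming keys have a bounded length and a reusable scratch buffer is available (hypothetical helper, not kawipiko's actual code):

```go
package main

import "fmt"

// suffixKey writes key plus suffix into the caller-provided scratch
// buffer and returns the combined key as a slice of that buffer, so
// no per-request allocation happens as long as the buffer is reused.
func suffixKey(scratch []byte, key []byte, suffix string) []byte {
	n := copy(scratch, key)
	n += copy(scratch[n:], suffix)
	return scratch[:n]
}

func main() {
	var scratch [512]byte // reusable scratch space, e.g. one per worker
	fmt.Printf("%s\n", suffixKey(scratch[:], []byte("/index.html"), ".gz"))
}
```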


On the other hand, what is the exact use-case you are trying to address?

Because, according to caniuse.com, "gzip compression is effectively supported by all browsers" (see https://caniuse.com/sr_content-encoding-gzip), and Brotli compression is supported by 99% of users (all except IE), see https://caniuse.com/brotli.

Thus, why the extra complexity?

Are you actually serving clients that don't support compression?

(Also, perhaps always serving Brotli-compressed content would be an extra measure against poorly written scraping bots.) :)

cipriancraciun commented 1 month ago

@lemondevxyz see #13 discussion, as it might impact your decision to patch Kawipiko (as it's written today in Go).

cipriancraciun commented 1 month ago

@lemondevxyz also see discussion #14 which might be an alternative.

lemondevxyz commented 1 month ago

According to a recent paper published through the R archive network, brotli comes out as the fastest for compression and decompression, and achieves the highest compression ratio (i.e. brotli produces the smallest files) when compared against lzma, deflate, bzip2, lzham, and zopfli.

Ideally, I'd serve brotli, gzip, and uncompressed variants to match whatever each user supports. Some users, the technical ones specifically, disable certain features for security reasons, even when those features are faster, better, and smaller.

For example, just last year there was a bug in WebP, a next-generation image format by Google, where attackers could execute arbitrary code via a buffer overflow.

Similar bugs could arise in gzip or brotli decoders, which is why one needs to serve uncompressed content to users who disable both for security reasons.


Might I add that this feature is rather simple: compare the Accept-Encoding header against a set of predefined encodings, of which there are two for the time being (they can be defined as constants, so no byte allocation), and then use the correct file, either by opening a separate CDB file per encoding (i.e. three CDB files in total) or by appending the encoding's extension to the file key.

cipriancraciun commented 1 month ago

> brotli comes out as [...] the highest compression ratio.

In absolute terms, perhaps. In real terms (i.e. on a real-world dataset), at least the way Kawipiko uses it (i.e. compressing each individual response body independently), the difference between Brotli (even at the highest compression level, which is insanely expensive in execution time) and Gzip yields only a couple of percent better compression.

Thus, in real-world scenarios, I would say just use Gzip compression (perhaps with Zopfli, which is supported by the Kawipiko archiver).


> Ideally, I'd serve brotli, gzip, and uncompressed variants to match whatever each user supports.

Well, based on the most deployed browsers out there, virtually every single one supports Gzip, and almost all support Brotli. Thus one really has to ask: which client wouldn't actually support, say, Gzip compression?


> Some users, the technical ones specifically, disable certain features for security reasons, even when those features are faster, better, and smaller.

Well, in Firefox I didn't find a way to disable Brotli decompression, and most likely there isn't a way to disable Gzip decompression either.

But say one does disable Brotli or Gzip. That is a bad idea for privacy: all of a sudden, the combination of the User-Agent header (which most likely implies the browser should support compression) with a missing Accept-Encoding header (or one carrying just the value identity) allows every server to pinpoint your browser.

Thus, again, I would say nobody security-conscious would actually disable Gzip or Brotli.


> compare the Accept-Encoding header against a set of predefined encodings, of which there are two for the time being (they can be defined as constants, so no byte allocation)

It's not that simple; according to MDN (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding), the user-agent (i.e. the browser) can send something like `Accept-Encoding: deflate, gzip;q=1.0, *;q=0.5`.

Thus, it's not just a simple byte-slice match (or byte-slice search): one needs to parse the multiple algorithms, sort them according to their preference weights, and iterate over that list.

Or we could choose not to support weights, but then why bother if we don't actually support the full feature?
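For illustration, here is roughly what even a minimal weight-aware parse involves (a hypothetical helper, not kawipiko code; note how freely it allocates, which is exactly the problem on a hot path):

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// encodingPreference is one entry of an Accept-Encoding header,
// e.g. "gzip;q=1.0" parsed into {"gzip", 1.0}.
type encodingPreference struct {
	name   string
	weight float64
}

// parseAcceptEncoding parses a header such as
// "deflate, gzip;q=1.0, *;q=0.5" and returns the encodings sorted by
// descending weight. Splitting, trimming, and sorting all produce
// garbage on every request.
func parseAcceptEncoding(header string) []encodingPreference {
	var preferences []encodingPreference
	for _, part := range strings.Split(header, ",") {
		name, params, _ := strings.Cut(strings.TrimSpace(part), ";")
		weight := 1.0 // a missing q-value means a weight of 1.0
		if q, found := strings.CutPrefix(strings.TrimSpace(params), "q="); found {
			if parsed, err := strconv.ParseFloat(q, 64); err == nil {
				weight = parsed
			}
		}
		preferences = append(preferences, encodingPreference{name, weight})
	}
	sort.SliceStable(preferences, func(i, j int) bool {
		return preferences[i].weight > preferences[j].weight
	})
	return preferences
}

func main() {
	fmt.Println(parseAcceptEncoding("deflate, gzip;q=1.0, *;q=0.5"))
}
```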


> and then use the correct file, either by opening a separate CDB file per encoding (i.e. three CDB files in total) or by appending the encoding's extension to the file key

Using 3 CDB files is out of the question, as it complicates the server code (and the archiver code) well beyond what it supports today. (Perhaps this is a feature to keep in mind for the Rust rewrite.)

lemondevxyz commented 1 month ago

You raise some good points with regard to brotli and gzip support. I was totally unaware of the weight feature in Accept-Encoding, which complicates things.

I also realized that having 3 CDB files would roughly triple the size of a website's archives. Furthermore, compressing them just-in-time, or ahead of time at startup, would defeat the point.

I guess this issue is out of scope since memory must be allocated to support this feature. GZIP it is!

cipriancraciun commented 1 month ago

> I guess this issue is out of scope since memory must be allocated to support this feature.

I think it can be implemented without heap memory allocation (the kind that involves the GC), by just using "large enough" stack-allocated arrays. But the code would be quite tedious to write, and a lot of time would be spent on profiling to be sure nothing escapes to the heap (Go is notorious for taking values that seem to be stack-allocated and quietly allocating them on the heap).
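For anyone attempting this, the compiler's escape analysis output is the tool to reach for; a toy example (hypothetical helpers) showing the kind of surprise involved:

```go
package main

import "fmt"

// Build with `go build -gcflags=-m` to see the compiler's escape
// analysis: a "moved to heap: buffer" line flags a hidden heap
// allocation, while its absence means the array stayed on the stack.

// stackOnly keeps its buffer on the stack: the array never outlives
// the call, so the compiler leaves it where it is.
func stackOnly(key, suffix string) int {
	var buffer [512]byte
	n := copy(buffer[:], key)
	n += copy(buffer[n:], suffix)
	return n
}

// escapes returns a slice of its local array, so the whole array is
// moved to the heap, despite looking identical at a glance.
func escapes(key, suffix string) []byte {
	var buffer [512]byte
	n := copy(buffer[:], key)
	n += copy(buffer[n:], suffix)
	return buffer[:n]
}

func main() {
	fmt.Println(stackOnly("/index.html", ".gz"))
	fmt.Printf("%s\n", escapes("/index.html", ".br"))
}
```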

Thus, taking into account all that I've said about browser support for Brotli and Gzip, implementing this would just waste a lot of development effort for very little gain.

As a conclusion (for others that might ask for this feature, and so that I can point them to this thread): lacking support for negotiating between compression mechanisms (none, Gzip, Brotli), and instead forcing one ahead of time, might seem a downside; however, once one weighs real-world browser support against the development effort and runtime costs, the answer becomes a bit clearer.