lifthrasiir / roadroller

Roadroller: Flattens Your JavaScript Demo
https://lifthrasiir.github.io/roadroller/

Directly produce an optimal ZIP/gzip file #13

Open lifthrasiir opened 3 years ago

lifthrasiir commented 3 years ago

Roadroller depends strongly on DEFLATE's Huffman coding because JS string literals are not efficient in terms of information entropy (~6.96 bits per byte). The first line specifically exploits this by using the minimum number of literal bytes, but a stock zlib doesn't fully recognize this difference in symbol distribution and normally combines the two lines into one block. Zopfli-like tools do recognize it, but users have to run those tools themselves to benefit.
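To make the block-splitting point concrete, here is a rough Python sketch using only the standard `zlib` module. It compresses two made-up byte strings (stand-ins for the two output lines, not real Roadroller output) once as a single stream and once with a forced block boundary between them. The name `deflate_sizes` and the sample data are assumptions for illustration. Forcing the boundary with `Z_FULL_FLUSH` is only a crude approximation of what a Zopfli-like tool does, and it adds overhead of its own, so the numbers should be read as indicative at best.

```python
import random
import zlib

def deflate_sizes(first, second, level=9):
    """Return (one_stream_size, forced_split_size) for raw DEFLATE output."""
    # Case 1: zlib chooses block boundaries on its own.
    c = zlib.compressobj(level, zlib.DEFLATED, -15)
    one_stream = c.compress(first + second) + c.flush()

    # Case 2: Z_FULL_FLUSH closes the current block after `first`, so
    # `second` starts a new block with its own Huffman tables. The flush also
    # emits an empty stored block and resets the LZ77 window, overhead that a
    # Zopfli-like optimizer would not pay.
    c = zlib.compressobj(level, zlib.DEFLATED, -15)
    forced = c.compress(first) + c.flush(zlib.Z_FULL_FLUSH)
    forced += c.compress(second) + c.flush()

    # Both streams must decode to the same bytes.
    assert zlib.decompress(one_stream, -15) == first + second
    assert zlib.decompress(forced, -15) == first + second
    return len(one_stream), len(forced)

rng = random.Random(0)
# Stand-in for a payload line drawn from a 64-symbol alphabet (assumption).
alphabet = bytes(range(0x30, 0x70))
payload_line = bytes(rng.choice(alphabet) for _ in range(4000))
# Stand-in for a decoder line of ordinary, repetitive JS source text.
decoder_line = b"for(i=0;i<n;i++)a[i]=b[i]^c[i%k];" * 20

sizes = deflate_sizes(payload_line, decoder_line)
print("one stream: %d bytes, forced split: %d bytes" % sizes)
```

Whether the forced split wins depends on the data and on the flush overhead, which is part of why an optimizing encoder is needed to reach the best size reliably.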

Maybe we can solve this UX problem by directly producing an optimal ZIP/gzip file from Roadroller itself. This is not a small task because:

While Roadroller somehow has a working implementation of zlib (-5), the optimal size can only be reached with Zopfli or a similar tool, so Roadroller would have to depend on one.

lifthrasiir commented 2 years ago

As per #29, these would be implemented as the following additional output formats:

I originally used -F6gz and so on, but on reflection it should be -F8gz etc., because their coding rate should be around 8 bits/byte minus fixed overhead, and users would read the first digit as a relative efficiency, not as a part of the internal implementation strategy.

For -F8zip the file name should be supplied. This can be done either with a separate argument (--zip-file-name index.html) or with a combined argument (-F8zip:index.html).
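As a minimal sketch of the combined-argument form, the Python below splits a value like `8zip:index.html` and shows why the name is needed: a ZIP entry always carries a file name. Both `parse_format` and `wrap_zip` are hypothetical helpers written for this illustration, not part of Roadroller's CLI, and the standard `zipfile` module is used here only for convenience; it does not produce optimally compressed output.

```python
import io
import zipfile

def parse_format(arg):
    """Split a hypothetical combined value such as '8zip:index.html'."""
    fmt, sep, name = arg.partition(":")
    if fmt == "8zip" and not name:
        raise ValueError("8zip needs a file name, e.g. 8zip:index.html")
    # Formats without a ':' part simply have no file name.
    return fmt, (name if sep else None)

def wrap_zip(payload, name):
    """Store `payload` as a single DEFLATE-compressed entry called `name`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()

fmt, name = parse_format("8zip:index.html")
archive = wrap_zip(b"<script>/* packed demo */</script>", name)
print(zipfile.ZipFile(io.BytesIO(archive)).namelist())
```

A separate argument such as `--zip-file-name index.html` would carry the same information; the combined form just keeps the format and its one required parameter together.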

To my knowledge there is no tool that directly recompresses a truncated PNG file, so -F8zpng might be a stretch since it can't be recompressed by external tools.

lifthrasiir commented 2 years ago

I've also briefly considered -F8zwebp which uses WebP Lossless instead of PNG, but it wasn't significantly better than (optimally recompressed) PNG because both use bytewise LZ77 + Huffman coding as a backend. WebP Lossless is better at exploiting spatial locality than PNG but that is useless for our purpose anyway.