jstrieb / urlpages

Create and view web pages stored entirely in the URL
http://jstrieb.github.io/urlpages
MIT License
1.38k stars 128 forks source link

Optimize compression by utilizing the Huffman algorithm #10

Closed swissChili closed 5 years ago

swissChili commented 5 years ago

Huffman is an algorithm for compressing textual data. It consumes a string and creates another string, which makes it perfect for this application. An example I found describes it as such:

Huffman encoding is based on the principle that letters that appear more frequently should have a smaller code than the ones that are used less often. So, in the English language, vowels would be used more than the letter 'z', and would get shorter codes.

Please note that while using the aforementioned example, you will likely see an extreme increase in size when compressing small strings. This is because it generates not only

JavaScript implementations of this include wilkerlucio's

I tested this a bit locally and got good results. For instance, a string that was 32 bytes compressed with base64, turned out to be only 24 after being compressed with Huffman.

This would work well for this program as the data being compressed is code, which, being english based should be compressed nicely by huffman.

jstrieb commented 5 years ago

Thanks for the tip! Currently researching the best way to integrate Brotli for URL compression since it seems to be designed for exactly this kind of data. Turns out Brotli uses Huffman Coding! I'll make sure to include this info in the TODO section when I update the README