[Question] How the algorithm works

clowerweb commented 10 years ago

I'm mainly using this script to compress data that has been converted from binary bits into text (so mostly long strings of 1s and 0s that represent 1 bit per pixel bitmap graphics data extracted from Atari, Intellivision and Colecovision ROMs). I was using Huffman compression previously, and then experimented with LZW and LZ77. I actually found your algorithm outputs a much smaller result than any of the others (the closest was this Huffman algorithm: http://rumkin.com/tools/compression/compress_huff.php, but the output from yours is actually just over 1kB smaller, not counting any decompression code). Why is your tool outputting so much smaller output than any other compression method I've found? There is one that's 60 bytes smaller for my use case, and that's the one @aivopaas made which is similar to yours and from what I understand was in the same js1k competition.

Form my purposes, I don't actually care about the compressor being 1kB. I think it's a really amazing compression algorithm and would like to see it optimized for 1.) Speed (compress/decompress hundreds or thousands of kilobytes of data as quickly as possible), 2.) working with more than ~127 or whatever replacements, and 3.) outputting the absolute smallest compressed string possible.

aivopaas commented 9 years ago

Oh, nice to see that my algorithm is still kicking other's in the sweet spot.

The reason this kind of algorithms work so well compared to the general algorithms is that they are specifically designed for the type of data that they compress - relatively small strings of guaranteed replacements and lots of unused bytes to hijack.

About the improvements.. 1) My latest version that's available at http://iteral.com/jscrush/ is somewhat optimized for speed, there's no good way to make it any faster other than writing it in some other language / machine code. I have tested it with the minified source of jQuery and it actually finished in reasonable time and did a pretty good job (the first version was way too slow for a few KBs already). 2) My version supports more than 127 replacements, it uses the first 1000 unicode chars and could easily use more. Just keep in mind that some of the replacements will begin to use more than 1 byte and the gains get smaller and smaller. 3) Getting the absolute smallest string for large data is not a trivial task and getting that done in finite time (less than your lifetime) would be worth a lot less than the couple more bytes that you would save.

clowerweb commented 9 years ago

@aivopaas Thank you for the explanation and updated code! Looks like you were still able to pull it off in 1kB, which is impressive. I'll have a look now and see if I can get it to work (the old version that only supported very limited replacements would break of course after it reached the maximum replacements, which meant larger scripts or strings couldn't be compressed with it unless they only had a very limited number of repeating patterns). I'm glad to hear that this restriction has been more or less removed. Thanks!

timdown / js1k_2013

[Question] How the algorithm works #3