rotemdan / lzutf8.js

A high-performance JavaScript string compression library
MIT License

vs lz-string #10

Closed ronag closed 7 years ago

ronag commented 7 years ago

Just did a comparison between lzutf8 and lz-string using:

https://rotemdan.github.io/lzutf8/demo/
http://pieroxy.net/blog/pages/lz-string/demo.html

Using the input

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

I get:

|           | time   | size      |
|-----------|--------|-----------|
| raw       | -      | 574 bytes |
| lzutf8    | 2.8 ms | 466 bytes |
| lz-string | 1 ms   | 196 bytes |
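
For what it's worth, the comparison could also be reproduced outside the demo pages. Here is a minimal Node sketch, assuming the `lzutf8` and `lz-string` npm packages; it uses lz-string's `compressToUint8Array` so both outputs are counted in bytes:

```js
// Sketch: compare compressed sizes of both libraries in Node.
// Assumes `npm install lzutf8 lz-string`.
const LZUTF8 = require('lzutf8');
const LZString = require('lz-string');

const input = 'Lorem Ipsum is simply dummy text of the printing ...'; // the full paragraph above

// lzutf8 outputs a byte array by default ("ByteArray" encoding).
const lzutf8Out = LZUTF8.compress(input);

// Use lz-string's Uint8Array output so both sizes are measured in bytes.
const lzStringOut = LZString.compressToUint8Array(input);

console.log('raw      :', Buffer.byteLength(input, 'utf8'), 'bytes');
console.log('lzutf8   :', lzutf8Out.length, 'bytes');
console.log('lz-string:', lzStringOut.length, 'bytes');
```
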
rotemdan commented 7 years ago

Hi,

lz-string is based on LZW, which includes a dictionary compression stage (LZ78), and possibly follows that with an entropy coder like Huffman or arithmetic coding (I'm not sure). lz-utf8, however, only applies a dictionary coder equivalent to LZ77. One characteristic of LZ77 (compared to LZ78 + entropy coding) is that its compression ratio is lower but its decompression is extremely fast. Overall, the library and the choice of algorithm are optimized for speed, not size (and for producing a binary format that can naturally accept plain UTF-8 strings).
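
To make the distinction concrete, here is a toy sketch of the LZ77 idea. This is illustrative only, not lzutf8's actual implementation or binary format: the coder emits literals and (distance, length) back-references into text already seen, and the decoder just copies, which is why LZ77 decompression is so fast.

```js
// Toy LZ77 coder: emit either a literal character or a (distance, length)
// back-reference into text that was already seen. Illustrative only;
// this is NOT lzutf8's actual implementation or binary format.
function lz77Tokens(input, windowSize = 32768, minMatch = 4) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let bestLen = 0;
    let bestDist = 0;
    // Naive O(n^2) search for the longest match in the sliding window.
    for (let start = Math.max(0, pos - windowSize); start < pos; start++) {
      let len = 0;
      while (pos + len < input.length && input[start + len] === input[pos + len]) {
        len++;
      }
      if (len > bestLen) {
        bestLen = len;
        bestDist = pos - start;
      }
    }
    if (bestLen >= minMatch) {
      tokens.push({ distance: bestDist, length: bestLen });
      pos += bestLen;
    } else {
      tokens.push({ literal: input[pos] });
      pos++;
    }
  }
  return tokens;
}

// Decoding is just sequential copying: no dictionary rebuilding,
// no entropy decoding.
function lz77Decode(tokens) {
  let out = '';
  for (const token of tokens) {
    if (token.literal !== undefined) {
      out += token.literal;
    } else {
      // Copy character by character so overlapping matches work.
      for (let i = 0; i < token.length; i++) {
        out += out[out.length - token.distance];
      }
    }
  }
  return out;
}
```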

For very short inputs, lz-utf8 may not be as effective as lz-string with regard to size. Try comparing with longer strings.

The timing difference, which varies between browsers, is a bit misleading. The reason this short example appears to run faster in lz-string is, I believe, initialization time (creation of data structures, lookup tables, etc.): the string is so short that the compression time itself may be negligible compared to that one-time setup. It may also be that the resolution of the timer I'm using in the demo is higher.
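
If someone wants to factor that initialization cost out, one option (a sketch, assuming an environment where `performance.now()` is available) is to warm each library up once before timing:

```js
// Sketch: time compression with a warm-up call first, so one-time
// initialization (data structures, lookup tables, JIT) isn't counted.
function meanCompressionTime(compressFn, input, iterations = 100) {
  compressFn(input); // warm-up: pay the setup cost before measuring
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    compressFn(input);
  }
  return (performance.now() - start) / iterations; // mean ms per call
}

// Usage (hypothetical):
// meanCompressionTime(s => LZUTF8.compress(s), input);
// meanCompressionTime(s => LZString.compress(s), input);
```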

Anyway, with longer strings, like the first 1 MB of the English Bible, I get:

(benchmark results screenshot)

So here lz-utf8 is 4.6 times faster during compression and 8 times faster during decompression, though the compression efficiency is significantly lower.

Results would also vary between inputs, of course.

rotemdan commented 7 years ago

In any case, since the binary format is frozen, the low compression efficiency cannot be improved. In general, it was a conscious and intentional decision to strongly optimize for speed. Any change to the choice of algorithm would, in practice, require creating a completely different format and library. I don't really have any plans for that at the moment.