timloo / memcached

Automatically exported from code.google.com/p/memcached

{Feature Request}: Allow a different type of compression than deflate for large values. #317

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
By default memcached uses zlib's deflate for its compression when values are 
above a certain size. While this is "OK", I would like to see the server 
moved into the 21st century. As far as algorithms to be allowed/added to 
the server, I have three in mind that I'd like to see, and any of them would be 
perfectly fine as far as I am concerned.

Why would you need to change the compression? Because zlib is very slow and 
outdated. Sure, it will save some space in RAM, but the time the CPU spends 
compressing is wasteful. So I'd like to see one of the following algorithms 
added (not as the default, so that people have time to adjust).

The algorithms in question are lzo, lz4, and quicklz. I'll state below why I 
think each is good, along with its pros and cons. All three compress to roughly 
the same ratio, but their speed varies greatly.
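
For anyone who wants to sanity-check the speed claims locally, here's a rough 
sketch comparing deflate and lz4 on a cache-style payload. It assumes the 
third-party python-lz4 package ("pip install lz4"); quicklz and lzo have their 
own bindings that aren't shown here.

```python
# Rough local comparison of zlib (deflate) vs lz4 on a sample payload.
import time
import zlib

import lz4.frame


def bench(name, compress, decompress, data, rounds=50):
    """Time repeated compress/decompress calls and report the size ratio."""
    start = time.perf_counter()
    for _ in range(rounds):
        packed = compress(data)
    c_time = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(rounds):
        decompress(packed)
    d_time = time.perf_counter() - start

    ratio = len(packed) / len(data)
    print(f"{name:8s} ratio={ratio:.3f} compress={c_time:.3f}s decompress={d_time:.3f}s")


if __name__ == "__main__":
    # Something vaguely cache-like: repetitive text, well above the size
    # threshold at which a typical client starts compressing.
    payload = (b"user:12345|session|" + b"x" * 80) * 4096

    bench("zlib", lambda d: zlib.compress(d, 6), zlib.decompress, payload)
    bench("lz4", lz4.frame.compress, lz4.frame.decompress, payload)
```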

First off is quicklz; it is by far the closest to zlib's compression ratio, but 
is also the slowest of the bunch. Its speed is roughly 1.5x slower than the 
fastest (lz4). It is an older algorithm that has been around a while, so the 
main bugs have likely been ironed out. It is also offered as a compression 
option in TokuDB, whose developers deal with a ton of data and seem to consider 
it stable enough for their high-dollar/very serious use cases.

Next up is lzo, which sits in the middle of the road in both compression ratio 
and speed. It is slightly slower than lz4 at compression, and a good deal 
slower at decompression. It is still not stable enough for the kernel, but I 
think the algorithm is very advanced and worthwhile for use inside of 
memcached. It is being used by "zram", "compcache", and also "zswap" in the 
kernel.

The final one is lz4. The algorithm is very new (less than 3 years old), but 
its speed is by far the best. Its compression ratio isn't as good as the other 
two's, but the speed should more than make up for it. The algorithm hasn't yet 
gained widespread use, and I'm not aware of any big-name applications using it.

To summarize, I'd love to see one of the three algorithms added to memcached as 
an option for clients to use. Deflate works, but it's slow. Deflate could still 
stay the default from now until the end of the world... I just think it's time 
for memcached to gain another compression algorithm, so there's an option for 
people who don't want to use deflate, or can't use it due to performance 
issues.

Original issue reported on code.google.com by 133794...@gmail.com on 23 Apr 2013 at 4:28

GoogleCodeExporter commented 9 years ago
I cannot edit it, so I'll just post this here. Hadoop uses/supports lz4, so 
there is a big-name, well-tested system using the algorithm. I just realized 
that and wanted to note that it does have a huge user base that has been 
testing it.

Original comment by 133794...@gmail.com on 23 Apr 2013 at 4:38

GoogleCodeExporter commented 9 years ago
You do realize that the compression algorithm is applied by the client, and is 
completely arbitrary? The "bit flag" is just used by a client to determine 
whether a value is compressed or not. It doesn't signify "zlib" and isn't 
technically standard across all clients. I even think some clients already 
support multiple compression algos via configuration (I'd be surprised if 
libmemcached didn't).

If you want another algo, make the client use another one.
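
To make that concrete, here is a minimal sketch of the kind of 
serializer/deserializer pair a client could use to compress with lz4 instead of 
zlib and record its choice in the flags stored alongside the value. The flag 
values and the python-lz4 dependency are illustrative assumptions, not anything 
the memcached protocol itself defines.

```python
# Sketch: the "compressed" flag is purely a client-side convention, so a client
# can pick any algorithm as long as it reads back its own flag bits.
import zlib

import lz4.frame

FLAG_ZLIB = 1 << 0   # hypothetical client-chosen flag bits
FLAG_LZ4 = 1 << 1
MIN_COMPRESS_LEN = 1024


def encode(value: bytes, use_lz4: bool = True) -> tuple[bytes, int]:
    """Return (stored_bytes, flags) the way a client serializer hook would."""
    if len(value) < MIN_COMPRESS_LEN:
        return value, 0
    if use_lz4:
        return lz4.frame.compress(value), FLAG_LZ4
    return zlib.compress(value), FLAG_ZLIB


def decode(stored: bytes, flags: int) -> bytes:
    """Reverse encode() based on the flag bits the same client wrote."""
    if flags & FLAG_LZ4:
        return lz4.frame.decompress(stored)
    if flags & FLAG_ZLIB:
        return zlib.decompress(stored)
    return stored


if __name__ == "__main__":
    original = b"payload " * 500
    stored, flags = encode(original)
    assert decode(stored, flags) == original
    print(f"stored {len(stored)} bytes, flags={flags:#04x}")
```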

Original comment by dorma...@rydia.net on 23 Apr 2013 at 10:29

GoogleCodeExporter commented 9 years ago
> The algorithm hasn't yet gained widespread use, and I'm not aware of any 
big-name applications using it.

You already mentioned Hadoop, but you could also name ZFS, BTRFS, Illumos, 
FreeBSD, and now even Linux for kernel booting & Grub. Don't forget Lucene 
Search & Cassandra storage for the Java version.

Original comment by HugoChev...@gmail.com on 24 Apr 2013 at 9:43

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
@dorma I didn't realize it was a client feature; I'll go file a feature request 
with libmemcached then. Sorry about that. I thought the _server_ was actually 
doing the compression. 

@hugochev, I realized that after I went to the site, and figured posting 
again would only make me seem dumber for forgetting them...

Anyway, you can close this feature request; I'll go hunt down the php 
memcache client guys. 

Original comment by 133794...@gmail.com on 24 Apr 2013 at 12:31

GoogleCodeExporter commented 9 years ago
closing

Original comment by dorma...@rydia.net on 28 Jul 2013 at 2:24