madler / zlib

A massively spiffy yet delicately unobtrusive compression library.
http://zlib.net/
Other
5.68k stars 2.45k forks source link

Optimized longest_match function (code submission) #231

Open gildor2 opened 7 years ago

gildor2 commented 7 years ago

Hi Mark,

I'd like to submit my code for faster deflate compression. Actually I wrote code in 2004, using x86 assembly, but now I decided to "resurrect" my project. So I spent 2-3 recent weeks doing careful port of my assembly code to C, optimizing it and fixing all possible issues.

The project resides here: https://github.com/gildor2/fast_zlib Precompiled libraries for Windows are here: https://github.com/gildor2/fast_zlib/releases Test results: https://github.com/gildor2/fast_zlib/wiki

In my tests, modifed version of zlib works 2.5-10x times faster. Than worse compression ratio and speed with original zlib, than more performance boost achieved. Assembly and C versions works with nearly identical performance, and 64-bit C version works faster than 32-bit one. I've tested code with Win32, Win64 and Linux32.

For testing I wrote a special application which feeds zlib with data from specified file system directory. I used 3 test "suites" which are 256Mb each. Assembly and C code produces exactly the same output. For verification I've added "PARANOID_CHECK" define in C code which validates every found match for correctness. If I'll force zlib to use maximal compression (using good_length=258, max_chain=32767) - original and modified zlib versions produces exactly the same output (but with different speed, of course).

The code doesn't contain a path for !defined(UNALIGNED_OK) just because I don't have any possibility to test it. However I think that it shouldn't be hard to add necessary parts because the main loop which preforms comparison of "scan" and "match" is very similar to original zlib.

The source code exists in Sources directory, there are 2 files: match32.asm - 32-bit assembly code for NASM assembler (compatible with Visual C and gcc, I didn't test other compilers) match.h - C implementation, a replacement for longest_match function.

For building zlib with C code I used a trick. I've made a deflate_stub.c file with following content:

#define ASMV
#include "deflate.c"

#undef local
#define local

#include "match.h"

void match_init()
{
}

and then used it instead of deflate.c.

Hope you'll find my submission useful. Feel free to ask any questions.

Thanks, Konstantin

silviucpp commented 7 years ago

@gildor2 maybe you can send your contribution also to zlib-ng https://github.com/Dead2/zlib-ng as well. That repo is more opened to performance enhancements

gildor2 commented 7 years ago

Thanks, but I think that won't be much better than for example opening my own zlib fork. I've checked zlib-ng for "serious optimizations" mentioned in their title, and found only one, which is available only for gcc compiler.

silviucpp commented 7 years ago

@gildor2 zlib-ng is much faster than the baseline. I did an erlang library some time ago and also I measured the performances https://github.com/silviucpp/ezlib and if we take in account that erlang overhead is constant for all, and also we ignore the measurements for erlang builtin driver it's visible that zlib-ng and cloudflare or intel's forks are much faster because of their MIPS optimizations.

gildor2 commented 7 years ago

I can't test zlib-ng by myself. It utilizes cmake for build, but I don't have it installed (and don't want to install). I didn't find a precompiled binaries either. So I can't compare performance.

Anyway, if zlib-ng would like to reuse my code - that's good. But I won't submit a pull request there. At least is is quite hard, because they have several completely different compression function sets - slightly modified original zlib, Intel's code, and something else. Only author of that fork can do integration correctly.

I'd be happy if my changes would be integrated into "baseline zlib". This is more useful, I think - I never heard that somebody uses zlib fork in Linux distribution or in some commercial application.

silviucpp commented 7 years ago

@gildor2 from what I know google did some pull requests on that fork and also I think they will switch chromium on it (I'm not 100 % sure but read the following):

https://docs.google.com/document/d/1XMa2z5LmYn3TLjDVRz-C2VcHBoeWwzCmPQyPrnJsGW8/edit#

https://bugs.chromium.org/p/chromium/issues/detail?id=687631

If you believe that your changes will make a difference maybe make sense to submit a pull requests. I'm sure you will get feedback on zlib-ng

gildor2 commented 7 years ago

Very interesting information, thanks. I'll read these documents.

silviucpp commented 7 years ago

@gildor2 regarding cmake you don't have to compile it Just install it from the repositories your OS is providing. Not sure what os you are using but on osx I used brew install cmake, on ubuntu apt-get install cmake, most probably cent os has yum install cmake

Silviu

gildor2 commented 7 years ago

I've tested zlib-ng (compiled with Visual Studio 2013). Added test results to the table.

silviucpp commented 7 years ago

@gildor2 you can send them a pull request with your idea and I'm sure will be evaluated. I see a lot of activity in the fork between the 2 maintainers and some folks from google.

gildor2 commented 7 years ago

I can't make a pull request, I'm not familiar with their code. In my opinion, these guys should integrate code by themselves.

The reason why I didn't make a pull request HERE, but opened an issue, is that it could be integrated into deflate.c, made as separate header (like I did), placed into "contrib". Or either rejected or ignored. I hope it won't be ignored. In this case, my changes will get into zlib-ng as well.

I think the reason why changes accumulated by zlib-ng didn't go into stock zlib, is that these changes are very platform- or compiler-specific. My code should work with any platform/compiler, I think.

ioquatix commented 7 years ago

Just to add my 2c, we migrated from libz to libz-ng and found a 29% improvement in total throughput of our app. Further improvements to zlib-ng would be so awesome since it's something that we get stuck in quite a bit.

gildor2 commented 7 years ago

In my tests, zlib-ng didn't bring any speed benefit. Even more, it worked slower than original zlib - https://github.com/gildor2/fast_zlib/wiki#results. Probably this is the reason why these optimizations weren't integrated into stock zlib. By the way, wouldn't it be nice if stock zlib will work much faster, without any compiler or platform specific optimizations?

ioquatix commented 7 years ago

Well, we have objective evidence that it's faster. I can provide more details.

Yes, it would be great if stock libz would be faster, but I think the problem is it's encumbered by a huge legacy of old code.

gildor2 commented 7 years ago

Probably it is faster on some particular OS with particular compiler - I don't know. I wasn't able to build zlib-ng with Visual Studio without doing some coding. This is bad for the project which tries to replace industry standard compression library.

ioquatix commented 7 years ago

I appreciate that you may have some frustrations, @gildor2, but everyone here, and at the zlib-ng project have only the best of intentions.

It was faster on Linux with both clang and gcc compilers, the difference especially pronounced at -O3.

gildor2 commented 6 years ago

Hi Mark,

I'm sorry for asking again, This issue ("code submission") is now older than one year, but I didn't hear anything back from you yet,

Are you interested in integration of my patch into stock zlib? I can redesign/cleanup my code if needed (like - removing any debug stuff etc). I can submit a pull request if this would be desired. I saw that you've integrated PR's like "5% speedup of decompression" etc, but I see no point of ignoring up to 10x speedup for compression speed. The code is generic, doesn't contain any SIMD or any other platform-dependent stuff, it was made as pure C stuff. It is designed as a replacement of longest_match function, nothing else has to be changed.

Thanks, Konstantin

nmoinvaz commented 3 years ago

For anybody still interested, I have incorporated these changes in https://github.com/zlib-ng/zlib-ng/pull/828.

madler commented 3 years ago

Such a fundamental change to the core of the zlib code would require exhaustive inspection and extensive testing before I could thrust it upon the world. I have to be very conservative about changes I make, since zlib is used and relied upon so widely. I have many other smaller changes that are overdue to zlib, so this one will not rise to the top of the list for a while. At that time, I will ask for a commit relative to the develop branch to start testing.

gildor2 commented 2 years ago

Thank you for response. Just let me know when the time will be ok for making a PR or another kind of commit.

uis246 commented 2 years ago

Any news?

uis246 commented 2 years ago

@madler ?