Open Adenilson opened 7 years ago
I recently implemented a NEON-ized version of the Adler-32 checksum (https://codereview.chromium.org/2676493007/, about 3x faster on ARMv8), and I would like to upstream these patches rather than fork the zlib used in Chromium even further.
Please see pull request at: https://github.com/madler/zlib/pull/251
Next candidate would be CRC32 (it can be made 7x to 10x faster by using the CRC32 instruction available in ARMv8).
https://bugs.chromium.org/p/chromium/issues/detail?id=709716
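For readers unfamiliar with the idea, here is a minimal sketch (not the Chromium patch) of how the ARMv8 CRC32 instruction can stand in for the inner loop of a bitwise CRC-32. The function name `crc32_sketch` is hypothetical. It assumes the compiler defines `__ARM_FEATURE_CRC32` and provides `__crc32b` in `<arm_acle.h>` when the extension is enabled, and it falls back to plain C elsewhere. A tuned version would likely consume 4 or 8 bytes per instruction rather than one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>   /* __crc32b: one CRC-32 step per byte in hardware */
#endif

/* Hypothetical name; takes the same style of arguments as zlib's crc32(). */
uint32_t crc32_sketch(uint32_t crc, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    crc = ~crc;                      /* CRC-32 works on the inverted value */
    for (size_t i = 0; i < len; i++) {
#if defined(__ARM_FEATURE_CRC32)
        crc = __crc32b(crc, buf[i]); /* hardware path */
#else
        /* Portable path: reflected polynomial 0xEDB88320, one bit at a time. */
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
#endif
    }
    return ~crc;
}
```

Calling `crc32_sketch(0, (const unsigned char *)"123456789", 9)` should give the standard CRC-32 check value `0xCBF43926` on either path.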
If it can be made faster, just do it. Anyone using zlib would benefit from the performance increase.
zlib is both efficient and fast (not to mention insanely portable) and has provided great services for the world for the last 2 decades. It is used everywhere: Linux kernel, Chromium, Firefox, libpng, iOS, Android, etc.
We should all be grateful that its authors made it available for free.
One way to improve performance is to sacrifice portability (e.g. CPU-specific code). That is a considerable cost, so it is better to keep such code contained in well-separated functions/modules.
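As a rough illustration of that containment (a sketch only, not how zlib or Chromium structure it), the CPU-specific path can live in its own function behind a single portable entry point. All names here (`byte_sum`, `byte_sum_portable`, `byte_sum_neon`, `HAVE_NEON_SUM`) are hypothetical, the selection is done at compile time for simplicity, and a plain byte sum stands in for a real checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON_SUM 1
#else
#define HAVE_NEON_SUM 0
#endif

/* Portable baseline: always compiled, works on every target. */
static uint32_t byte_sum_portable(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

#if HAVE_NEON_SUM
/* CPU-specific path, isolated in its own function. */
static uint32_t byte_sum_neon(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    while (len >= 16) {
        sum += vaddlvq_u8(vld1q_u8(buf)); /* add 16 bytes in one step */
        buf += 16;
        len -= 16;
    }
    return sum + byte_sum_portable(buf, len); /* leftover tail bytes */
}
#endif

/* Single entry point: callers never see the CPU-specific code. */
uint32_t byte_sum(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
#if HAVE_NEON_SUM
    return byte_sum_neon(buf, len);
#else
    return byte_sum_portable(buf, len);
#endif
}
```

The portable function doubles as the tail handler and as the reference to test the optimized path against, so the non-portable part stays small.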
@timofonic zlib-ng has accepted the ARM specific optimizations, IIRC it is in a development branch.
Libpng has both intrinsics and hand-written ASM code for ARM (in the pre-filters).
Would zlib be open to contributions of a few core/hot functions targeting ARM?
One good candidate we identified is Adler-32: a SIMD version is about 3x faster on ARMv8.
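For context, here is a minimal scalar sketch of Adler-32 (not the NEON patch). The name `adler32_sketch` and the two macros are hypothetical. The checksum is two running sums, and the expensive modulo can be deferred for up to 5552 bytes without overflowing 32 bits. That long, modulo-free inner loop is the part a SIMD version would be expected to speed up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u /* largest prime below 2^16 */
#define ADLER_NMAX 5552u  /* bytes that can be summed before reducing */

/* Hypothetical name; pass 1 as the initial value, as with zlib's adler32(). */
uint32_t adler32_sketch(uint32_t adler, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    uint32_t a = adler & 0xffffu;         /* low half: sum of bytes */
    uint32_t b = (adler >> 16) & 0xffffu; /* high half: sum of the a values */

    while (len > 0) {
        size_t chunk = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= chunk;
        while (chunk--) {                 /* hot loop, no modulo inside */
            a += *buf++;
            b += a;
        }
        a %= ADLER_BASE;                  /* reduce once per chunk */
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

As a sanity check, `adler32_sketch(1, (const unsigned char *)"Wikipedia", 9)` should return `0x11E60398`.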