Closed: manticore-projects closed this 11 months ago
The Adler and CRC32 classes could be taken from zlib-ng, which features AVX/SSE and NEON:
Support for CPU intrinsics when available
Adler32 implementation using SSSE3, AVX2, AVX512, AVX512-VNNI, Neon, VMX & VSX
CRC32-B implementation using PCLMULQDQ, VPCLMULQDQ, ACLE, & IBM Z
Hash table implementation using CRC32-C intrinsics on x86 and ARM
Slide hash implementations using SSE2, AVX2, ARMv6, Neon, VMX & VSX
Compare256 implementations using SSE2, AVX2, Neon, POWER9 & RVV
Inflate chunk copying using SSE2, SSSE3, AVX, Neon & VSX
Adding support for other architectures has been on my todo list for a while, but I haven't managed to find the time to do it yet...
I was thinking of re-using a similar approach to what I did for https://github.com/libjxl/libjxl/blob/main/lib/jxl/enc_fast_lossless.cc.
I am of course willing to review PRs though :)
First small achievement: simulating AARCH64 on an x86 host. Please follow this discussion if you are interested; I will keep posting any progress there.
Greetings.
The code does not compile on AARCH64, since the SSE/AVX intrinsics are x86-only and would have to be replaced by NEON equivalents. Example:
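As a rough sketch of the general problem (this is not the snippet from the original comment, and not FPNGe code), the x86 intrinsic headers simply do not exist on AArch64, so each intrinsic path has to be guarded by the preprocessor and paired with a NEON or scalar alternative. The helper name `sum_bytes16` and the `DEMO_*` macros are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>

// Pick one backend at compile time. The DEMO_* macros are hypothetical names.
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics; this header is unavailable on AArch64
#define DEMO_USE_SSE2 1
#elif defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>  // NEON intrinsics
#define DEMO_USE_NEON 1
#endif

// Hypothetical helper: sum of the 16 bytes starting at p.
uint32_t sum_bytes16(const uint8_t* p) {
#if defined(DEMO_USE_SSE2)
  const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
  // Sum of absolute differences against zero yields two partial byte sums,
  // one in the low bits of each 64-bit half of the register.
  const __m128i s = _mm_sad_epu8(v, _mm_setzero_si128());
  return static_cast<uint32_t>(_mm_cvtsi128_si32(s)) +
         static_cast<uint32_t>(_mm_cvtsi128_si32(_mm_srli_si128(s, 8)));
#elif defined(DEMO_USE_NEON)
  // Widening horizontal add across all 16 lanes (AArch64-only intrinsic).
  return vaddlvq_u8(vld1q_u8(p));
#else
  // Portable fallback for any other target.
  uint32_t sum = 0;
  for (int i = 0; i < 16; ++i) sum += p[i];
  return sum;
#endif
}
```

All three branches return the same value, so the scalar branch can double as a reference when testing a newly ported NEON path in an emulated AArch64 pipeline.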
I have set up a working GitHub pipeline for compiling and testing FPNGe on AARCH64. However, I am not a C++ programmer. Would you be able and willing to help me when I have questions about the porting?