intel / isa-l_crypto

Other
267 stars 80 forks source link

Add xxHash algorithm family #110

Closed hzhuang1 closed 3 months ago

hzhuang1 commented 1 year ago

Add XXH32 and XXH64 of algorithm family. Implement them with aarch64 SVE instructions.

gbtucker commented 1 year ago

This is interesting but xxhash is very parallelable in the single buffer version also. Are you comparing with the NEON single buffer version? You may have to add XXH_VECTOR == XXH_NEON when building xxhash.

hzhuang1 commented 1 year ago

This is interesting but xxhash is very parallelable in the single buffer version also. Are you comparing with the NEON single buffer version? You may have to add XXH_VECTOR == XXH_NEON when building xxhash.

Screen Shot 2022-11-07 at 10 43 59 Screen Shot 2022-11-07 at 10 44 15

In cellphone SoC, maybe NEON could gain better performance than Scalar on single buffer version. In server SoC, I couldn't get the same result. I attached two pictures on the same test platform. I even got the similar result on another vendor's platform.

I think that you're talking about XXH3 algorithm. There's even performance downgrade on XXH3 with NEON. I'm in the process of upstreaming it in xxHash repository (https://github.com/Cyan4973/xxHash/issues/751). With the optimization on SVE, I could gain 4x improvement for the best case on the same test platform.

XXH32 and XXH64 algorithms are less parallelable. Since there's no multiple buffer framework in xxHash, I submit the related patches to ISA-L_crypto instead. And I only need to compare SVE and Scalar since NEON can't bring any benefit in server SoC.

hzhuang1 commented 1 year ago

Any comment on this pull request?

hzhuang1 commented 1 year ago

Could you help to share any comment to me? Thanks

gbtucker commented 1 year ago

So the SVE multi-buffer version is 3x-4x the single buffer version?

hzhuang1 commented 1 year ago

So the SVE multi-buffer version is 3x-4x the single buffer version?

Excuse me to response late.

Not exactly. On different machine, the improvement is also different. I think that it's caused by different micro-architecture. The first one is ARM-v8.2, and the latest one is ARM-v8.4. It evolves quickly.

On the SVE-512 test machine, it improved the performance in the range of 1.x-3.x. It's caused by different block size.

pablodelara commented 5 months ago

HI @hzhuang1. Apologies for this long wait time, we have been transitioning in our company and we are maintaining actively this library again. Are you still interested in pushing this? There are currently some code formatting issues (you can run tools/check_format.sh). Could you take a look at them? Thanks!

hzhuang1 commented 5 months ago

Thanks. I've updated the pull request.

pablodelara commented 4 months ago

Hi @hzhuang1, apologies for asking about format again, but given that we have had issues with the indent tool, we have transitioned to clang-format, so this PR will require a change. Can you run ./tools/format.sh (you will need clang-format-18)?

pablodelara commented 4 months ago

Hi @hzhuang1. Apologies for this late response. After thinking about this, I don't think this is the right place for this algorithm, given this is a non-cryptographic hash. I also saw that this is already in xxHash repo, so no need to include it here. Thanks!