Open Nugine opened 1 week ago
Attention: Patch coverage is 52.80899%
with 42 lines
in your changes missing coverage. Please review.
Project coverage is 70.70%. Comparing base (
2df56d8
) to head (a6136ca
). Report is 10 commits behind head on unstable.
Files with missing lines | Patch % | Lines |
---|---|---|
src/hyperloglog.c | 52.80% | 42 Missing :warning: |
How was this change tested, particularly to confirm that the non-AVX2 and AVX2 implementations produce the same results?
How was this change tested, particularly to confirm that the non-AVX2 and AVX2 implementations produce the same results?
The algorithms are verified by comparing the results between scalar code and simd code with random input. https://github.com/Nugine/redis-hyperloglog/blob/main/cpp/bench.cpp
This is so cool! :sunglasses:
How was this change tested, particularly to confirm that the non-AVX2 and AVX2 implementations produce the same results?
The algorithms are verified by comparing the results between scalar code and simd code with random input. https://github.com/Nugine/redis-hyperloglog/blob/main/cpp/bench.cpp
I think we need to test this in our repo in some way. The binary representation can't change, because things like replicas will not understand it, so we should verify the binary representation.
A hyperloglog key is actually a string so we can use the GET command to get the binary representation. In a TCL test case, we can use GET and compare the reply to the binary data we have stored earlier. Alternatively we can use DUMP. Can you add it?
@lipzhu You're the performance expert. Do you want to review this?
WIP: unit tests for verifying the results produced by non-AVX2 and AVX2 implementations.
I have other things to do these days. So it may take a while.
WIP: unit tests
This PR optimizes the performance of HyperLogLog commands (PFCOUNT, PFMERGE) by adding AVX2 fast paths.
Two AVX2 functions are added for conversion between raw representation and dense representation. They are 15 ~ 30 times faster than scalar implementaion. Note that sparse representation is not accelerated.
AVX2 fast paths are enabled when the CPU supports AVX2 (checked at runtime) and the hyperloglog configuration is default (HLL_REGISTERS == 16384 && HLL_BITS == 6).
When merging 3 dense hll structures, the benchmark shows a 12x speedup compared to the scalar version.
Experiment repo: https://github.com/Nugine/redis-hyperloglog Benchmark script: https://github.com/Nugine/redis-hyperloglog/blob/main/scripts/memtier.sh Algorithm: https://github.com/Nugine/redis-hyperloglog/blob/main/cpp/bench.cpp