Closed weijietong closed 2 years ago
To be clear, at my test, the bloom filter hash value test totally elapsed 5.5s , the vptest
instruction code takes 4.94s, the vptest
occupies the largest performance.
How large was your filter? What processor were you using?
I runed the tpch-300 Q7, the vtune result is captured by running the Q7 circularly.The result shows that the chocke point is the vpptest
. The largest filter size is 1048576(calculated by 1ull << (_log_num_buckets + LOG_BUCKET_BYTE_SIZE)
). My processor is intel skylake. The cpu cache size is 33792 KB.
Yes, I think pre-fetching might help, if you're doing a lot of lookups in bulk. I can take that as a feature request.
What DBMS are you using?
alicloud's hologres
I've tested out prefetching and I can't seem to get a performance benefit on my machine from filters as small as 1MiB, though it's close for larger filters.
@jbapple https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_testc_si256&ig_expand=7216 the doc shows that vptest
has a high latency at the skylake architecture cpu.
what about replace it with two operations: a bitwise-and operation and then a popcount operation
I implemented the prefetch logic,got a minor performance improvement.
I implemented the prefetch logic,got a minor performance improvement.
Great! Can you show me how you did it, since our results differed? I'm wondering if I did something slow.
I am pleasure to share. But the prefetch is implemented outside this lib and is related to my own codes. Basic logic is to hold 32 number of hash values which are prefetched first(a prefetch is to prefetch a bucket), then to do the actual test of the held hash values. If you share what you wrote ,I can point it out.
@jbapple I test the blocked bloom filter (which is https://github.com/FastFilter/fastfilter_cpp/blob/master/src/bloom/simd-block.h version) in my codes. The vtune result shows that
Find
avx2 implementation's _mm256_testc_si256 opcode's performance is bad.So, do we have other tricks to improve it ? Maybe a prefetch will help ?