prestodb / airlift

Airlift framework for building REST services
Apache License 2.0
1 stars 42 forks source link

Update HyperLogLog library to original Airlift #68

Closed pranjalssh closed 9 months ago

pranjalssh commented 9 months ago

There has been considerable performance improvements to original airlift library. Namely:

https://github.com/airlift/airlift/commit/a9328817b79f573d99b7a2d8d65c4b94d84f7a0c https://github.com/airlift/airlift/commit/23b97cbdd61b0a1aeb2a9516ee616ee08e7736cc

I copied all HyperLogLog files from airlift/airlift to prestodb/airlift - and made some API changes necessary for our internal PrivateLpcaSketch. API changes were "read-only" - so we don't have to worry about correctness issues in PrivateLpcaSketch:

DenseHll.eachBucket(BucketListener listener)
HyperLogLog.getMaxBucketValue()
HyperLogLog.getNumberOfBuckets()

Benchmark before and after showing 70X improvement for benchmarkMergeWithSparse which matches our expectations:

Benchmark                                   Mode  Cnt   Score   Error  Units
BenchmarkDenseHll.benchmarkInsert           avgt   50   1.441 ± 0.114  us/op
BenchmarkDenseHll.benchmarkMergeWithDense   avgt   50  21.630 ± 0.541  us/op
BenchmarkDenseHll.benchmarkMergeWithSparse  avgt   50  22.805 ± 0.227  us/op

Benchmark                                   Mode  Cnt   Score   Error  Units
BenchmarkDenseHll.benchmarkInsert           avgt   50   1.409 ± 0.035  us/op
BenchmarkDenseHll.benchmarkMergeWithDense   avgt   50  11.615 ± 1.016  us/op
BenchmarkDenseHll.benchmarkMergeWithSparse  avgt   50   0.335 ± 0.009  us/op

Commit 1: Update benchmarks and some tests(so we can compare) Commit 2: Ports over HyperLogLog changes

pranjalssh commented 9 months ago

I've run unit tests so far. Note that Presto will not use this airlift code until we release this and update the version in Presto's pom. Once we merge and release this we can run correctness suites before landing in presto

pranjalssh commented 9 months ago

Tried several queries to verify correctness and saw performance gains as well. Going to push now and figure out how to release later

agrawaldevesh commented 9 months ago

Just wow !! Would love to learn more about the performance improvement we saw on a real presto query with this change.