Closed pranjalssh closed 9 months ago
I've run unit tests so far. Note that Presto will not use this airlift code until we release this and update the version in Presto's pom. Once we merge and release this we can run correctness suites before landing in presto
Tried several queries to verify correctness and saw performance gains as well. Going to push now and figure out how to release later
Just wow !! Would love to learn more about the performance improvement we saw on a real presto query with this change.
There has been considerable performance improvements to original airlift library. Namely:
https://github.com/airlift/airlift/commit/a9328817b79f573d99b7a2d8d65c4b94d84f7a0c https://github.com/airlift/airlift/commit/23b97cbdd61b0a1aeb2a9516ee616ee08e7736cc
I copied all HyperLogLog files from airlift/airlift to prestodb/airlift - and made some API changes necessary for our internal PrivateLpcaSketch. API changes were "read-only" - so we don't have to worry about correctness issues in PrivateLpcaSketch:
Benchmark before and after showing 70X improvement for
benchmarkMergeWithSparse
which matches our expectations:Commit 1: Update benchmarks and some tests(so we can compare) Commit 2: Ports over HyperLogLog changes