pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

perf: Faster bitpacking for Parquet writer #16278

Closed thalassemia closed 1 month ago

thalassemia commented 1 month ago

I replaced the Parquet bit-packing algorithm with a modified version of the scalar algorithm from https://github.com/quickwit-oss/bitpacking (unsure where/how to give credit). I ensured the new algorithm works by adding unit tests that encode/decode random data.
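For readers unfamiliar with bit-packing, the following is a minimal sketch of the idea, not the code from this PR: the PR uses a modified version of the scalar algorithm from the quickwit-oss/bitpacking crate, whereas this sketch uses a plain loop for clarity. The function names `pack` and `unpack` are hypothetical. It assumes values are stored back to back, least-significant bit first, which matches the bit-packed layout in Parquet's RLE/bit-packing hybrid encoding.

```rust
/// Pack each value into `num_bits` bits, least-significant bit first.
/// Hypothetical helper for illustration; not the Polars or quickwit API.
pub fn pack(values: &[u32], num_bits: u32) -> Vec<u8> {
    assert!(num_bits <= 32, "num_bits must be at most 32");
    let mask: u64 = (1u64 << num_bits) - 1;
    let total_bits = values.len() * num_bits as usize;
    let mut out = Vec::with_capacity((total_bits + 7) / 8);
    let mut buffer: u64 = 0; // pending bits not yet written out
    let mut filled: u32 = 0; // number of valid bits in `buffer`
    for &v in values {
        // Append the low `num_bits` bits of `v` above the pending bits.
        buffer |= (v as u64 & mask) << filled;
        filled += num_bits;
        // Flush every complete byte.
        while filled >= 8 {
            out.push(buffer as u8);
            buffer >>= 8;
            filled -= 8;
        }
    }
    // Flush the final partial byte, if any (upper bits are zero).
    if filled > 0 {
        out.push(buffer as u8);
    }
    out
}

/// Inverse of `pack`: read `len` values of `num_bits` bits each.
pub fn unpack(packed: &[u8], num_bits: u32, len: usize) -> Vec<u32> {
    assert!(num_bits <= 32, "num_bits must be at most 32");
    let mask: u64 = (1u64 << num_bits) - 1;
    let mut out = Vec::with_capacity(len);
    let mut buffer: u64 = 0;
    let mut filled: u32 = 0;
    let mut bytes = packed.iter();
    for _ in 0..len {
        // Refill until one full value is available.
        while filled < num_bits {
            let byte = *bytes.next().expect("packed input too short");
            buffer |= (byte as u64) << filled;
            filled += 8;
        }
        out.push((buffer & mask) as u32);
        buffer >>= num_bits;
        filled -= num_bits;
    }
    out
}
```

With this sketch, `pack(&[0, 1, 2, 3, 4, 5, 6, 7], 3)` yields the three bytes `0x88, 0xC6, 0xFA`, and `unpack` on those bytes with `num_bits = 3` and `len = 8` recovers the input. A production implementation would typically process fixed-size blocks with the bit width known at compile time so the compiler can unroll the loop, which is where most of the speedup in approaches like this one comes from.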

I added a microbenchmark in my first commit to help measure the performance gain.

| | Before | After |
|---|---|---|
| Avg. (ns/iter) | 93.75 | 4.25 |
| Std. (ns/iter) | 1.57 | 0.04 |

I removed the microbenchmark in a later commit because it requires nightly Rust and is probably not useful outside of this PR.
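For anyone who wants to reproduce a comparison like this without nightly Rust, here is a rough sketch of a stable-Rust timing helper. It is not the removed microbenchmark; the names `bench_ns_per_iter` and `demo` are hypothetical, and the numbers it produces are far less rigorous than those from a dedicated benchmark harness. It only assumes `std::time::Instant` and `std::hint::black_box`, both available on stable.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Time `f` over `samples` runs of `iters` calls each and return
/// (mean, population standard deviation) in nanoseconds per call.
/// Hypothetical helper for illustration only.
pub fn bench_ns_per_iter<F: FnMut()>(mut f: F, samples: usize, iters: u32) -> (f64, f64) {
    assert!(samples > 0 && iters > 0);
    let mut per_iter = Vec::with_capacity(samples);
    for _ in 0..samples {
        let start = Instant::now();
        for _ in 0..iters {
            f();
        }
        per_iter.push(start.elapsed().as_nanos() as f64 / iters as f64);
    }
    let mean = per_iter.iter().sum::<f64>() / samples as f64;
    let var = per_iter.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / samples as f64;
    (mean, var.sqrt())
}

/// Example workload: sum a small buffer. `black_box` keeps the compiler
/// from optimizing the measured work away.
pub fn demo() -> (f64, f64) {
    let input: Vec<u32> = (0..32).collect();
    bench_ns_per_iter(
        || {
            let total: u32 = black_box(&input).iter().sum();
            black_box(total);
        },
        20,
        10_000,
    )
}
```

Swapping the closure body for the old and new packing routines would give a before/after pair comparable in shape to the table above, though absolute values will differ by machine.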

Other miscellaneous fixes:

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 81.34%. Comparing base (11fe9d8) to head (b8ff88d). Report is 18 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main   #16278      +/-   ##
==========================================
+ Coverage   80.80%   81.34%   +0.53%
==========================================
  Files        1393     1403      +10
  Lines      179406   183257    +3851
  Branches     2921     2922       +1
==========================================
+ Hits       144971   149063    +4092
+ Misses      33932    33691     -241
  Partials      503      503
```


ritchie46 commented 1 month ago

Nice improvement! Thanks a lot @thalassemia. :raised_hands: Hope you find some on the reading side as well. ;)