pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

perf: Halve the size of Booleans in row encoding #19927

Closed coastalwhite closed 5 days ago

coastalwhite commented 6 days ago

This changes the encoding of pl.Boolean in the row encoding from needing 2 bytes (1 for validity, 1 for value) to 1 byte. Now, the encoding of Boolean values is as follows:

| Value | Encoding |
|-------|----------|
| None  | 0x00     |
| False | 0x20     |
| True  | 0x30     |

The null encoding is bitwise inverted when nulls_last=True, and the False / True encodings are inverted when descending=True.
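To make the ordering behaviour concrete, here is a minimal Python sketch of the single-byte scheme described above. It is not the code in `crates/polars-row`; the function name `encode_bool` is hypothetical, and it assumes that "inverted" for False / True means a bitwise NOT of the byte, which the description does not spell out.

```python
from typing import Optional

# Byte values taken from the table in the PR description.
NULL_BYTE = 0x00
FALSE_BYTE = 0x20
TRUE_BYTE = 0x30


def encode_bool(
    value: Optional[bool],
    *,
    descending: bool = False,
    nulls_last: bool = False,
) -> int:
    """Encode an optional boolean as one byte (hypothetical helper)."""
    if value is None:
        # The null byte is bitwise inverted for nulls_last=True: 0x00 -> 0xFF.
        return NULL_BYTE ^ 0xFF if nulls_last else NULL_BYTE
    byte = TRUE_BYTE if value else FALSE_BYTE
    # ASSUMPTION: "inverted" for descending=True is read as a bitwise NOT.
    return byte ^ 0xFF if descending else byte


values = [True, None, False]
ascending = sorted(values, key=encode_bool)
descending_nulls_last = sorted(
    values, key=lambda v: encode_bool(v, descending=True, nulls_last=True)
)
```

Under that assumption, comparing the encoded bytes yields `[None, False, True]` by default and `[True, False, None]` with `descending=True, nulls_last=True`, so a plain byte comparison gives the requested sort order without a separate validity byte.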

This is a continuation of #19874.

codecov[bot] commented 5 days ago

Codecov Report

Attention: Patch coverage is 58.53659% with 34 lines in your changes missing coverage. Please review.

Project coverage is 59.28%. Comparing base (414d883) to head (7615e2c). Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| crates/polars-row/src/fixed.rs | 40.38% | 31 Missing :warning: |
| crates/polars-row/src/row.rs | 80.00% | 2 Missing :warning: |
| crates/polars-row/src/variable.rs | 50.00% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main   #19927   +/-   ##
=======================================
  Coverage   59.28%   59.28%
=======================================
  Files        1555     1555
  Lines      216180   216213   +33
  Branches     2456     2456
=======================================
+ Hits       128155   128177   +22
- Misses      87467    87478   +11
  Partials      558      558
```
