jorgecarleitao / parquet2

Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

Add BYTE_STREAM_SPLIT encoder/decoder (fixes #208) #221

Status: Closed (AudriusButkevicius closed 1 year ago)

AudriusButkevicius commented 1 year ago

Apologies, this is my first encounter with the Rust type system; it took a while to even get the decoder to compile and stay generic.

I am sure there could be less phantom data, fewer random `.as_ref()`s on separate lines, or even fewer `.try_into()` calls, etc., but someone with more experience with the Rust type system needs to guide my hand.

Fixes #208

adamreeve commented 1 year ago

Hi @jorgecarleitao, can you please take another look at this? Or could you take a look @sundy-li?

We have a lot of Parquet files using byte stream split encoding, as this significantly improves compression for floating point data, and it would be great to be able to read them with Polars.
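For readers unfamiliar with the encoding, here is a minimal sketch of what BYTE_STREAM_SPLIT does for `f32` values. It is not the code from this PR and is not part of the parquet2 API. The function names `byte_stream_split_encode` and `byte_stream_split_decode` are hypothetical, and the sketch is fixed to `f32` rather than generic. The idea is that byte `j` of every value goes into stream `j`, and the streams are concatenated. Grouping sign/exponent bytes together tends to give a general-purpose compressor more repetition to exploit.

```rust
// Hypothetical helper names, for illustration only (not the parquet2 API).
// Width of one f32 value in bytes, i.e. the number of byte streams.
const K: usize = 4;

/// Encode: write byte `j` of value `i` to position `j * n + i`,
/// which is equivalent to building K streams and concatenating them.
fn byte_stream_split_encode(values: &[f32]) -> Vec<u8> {
    let n = values.len();
    let mut out = vec![0u8; n * K];
    for (i, v) in values.iter().enumerate() {
        for (j, b) in v.to_le_bytes().iter().enumerate() {
            out[j * n + i] = *b;
        }
    }
    out
}

/// Decode: gather byte `j` of value `i` back from position `j * n + i`.
/// Returns `None` if the buffer length is not a multiple of the value width.
fn byte_stream_split_decode(buf: &[u8]) -> Option<Vec<f32>> {
    if buf.len() % K != 0 {
        return None;
    }
    let n = buf.len() / K;
    let mut values = Vec::with_capacity(n);
    for i in 0..n {
        let mut bytes = [0u8; K];
        for j in 0..K {
            bytes[j] = buf[j * n + i];
        }
        values.push(f32::from_le_bytes(bytes));
    }
    Some(values)
}
```

Encoding `[1.0f32, 2.0]` with this sketch produces the bytes `[0x00, 0x00, 0x00, 0x00, 0x80, 0x00, 0x3F, 0x40]`: the two low-order bytes of both values come first, and the high-order bytes that carry the sign and exponent end up next to each other at the end.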

adamreeve commented 1 year ago

Hi @ritchie46, do you have maintainer permissions here now and is this something you can take a look at? Or is the current plan to move to using the parquet crate in Polars instead (https://github.com/pola-rs/polars/issues/6735)?