jorgecarleitao / arrow2

Transmute-free Rust library to work with the Arrow format
Apache License 2.0

Error writing to Parquet: "Invalid argument error: The datatype Boolean cannot be encoded by Rle" #1383

Open mattbonnell opened 1 year ago

mattbonnell commented 1 year ago

I'm not able to use the Rle encoding on Boolean fields, as I get the following error at runtime:

    ExternalFormat("File out of specification: External format error: File out of specification: Invalid argument error: The datatype Boolean cannot be encoded by Rle")

This is surprising, as the docstring for the Rle encoding says:

    /// Group packed run length encoding. Usable for definition/repetition levels
    /// encoding and Booleans (on one bit: 0 is false; 1 is true.)

Is this expected?
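For reference, here is a minimal sketch of what that docstring describes. It is not arrow2's or parquet2's code. It assumes Parquet's RLE/bit-packing hybrid layout at bit width 1 and emits only RLE runs: each run header is a ULEB128 varint of `run_len << 1` (the low bit 0 marks an RLE run), followed by the repeated value in one byte. The names `write_uleb128` and `encode_bool_rle_runs` are hypothetical.

```rust
/// Append `value` to `out` as an unsigned LEB128 varint,
/// the varint format used for hybrid run headers.
fn write_uleb128(out: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Hypothetical sketch: encode booleans at bit width 1 using only RLE runs.
/// Each run is `header = run_len << 1` followed by the value in one byte.
fn encode_bool_rle_runs(values: &[bool]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < values.len() {
        let v = values[i];
        // Count how many consecutive values equal `v`.
        let mut run_len = 1;
        while i + run_len < values.len() && values[i + run_len] == v {
            run_len += 1;
        }
        write_uleb128(&mut out, (run_len as u64) << 1); // low bit 0 marks an RLE run
        out.push(v as u8); // ceil(1 / 8) = 1 byte holds the repeated value
        i += run_len;
    }
    out
}

fn main() {
    let encoded = encode_bool_rle_runs(&[true, true, true, false, false]);
    println!("{:?}", encoded);
}
```

Encoding `[true, true, true, false, false]` yields `[6, 1, 4, 0]`: two runs, each a header byte followed by a value byte.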

jorgecarleitao commented 1 year ago

Thanks for the issue!

No - this is definitely a bug (or rather, not implemented yet). We are missing support for this case.

mattbonnell commented 1 year ago

Thanks for getting back to me. Got it. I notice that encode_bool is already implemented under hybrid_rle in parquet2: https://github.com/jorgecarleitao/parquet2/blob/864ddc823a25a9c60ba487b2feb8479ce932800c/src/encoding/hybrid_rle/encoder.rs#L74-L93. What pieces are currently missing to enable this?
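As a rough sketch of the pieces a writer would need, assuming the Parquet spec's layout for RLE-encoded boolean data pages (bit width 1, with the encoded data prefixed by its byte length as a 4-byte little-endian integer), one option is to emit a single bit-packed run and prepend that length. This is not the signature or implementation of parquet2's `encode_bool`; `encode_bool_page` and `write_uleb128` are hypothetical names.

```rust
/// Append `value` to `out` as an unsigned LEB128 varint (hybrid run header).
fn write_uleb128(out: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80);
    }
}

/// Hypothetical sketch: encode booleans as one bit-packed run at bit width 1,
/// then prepend the 4-byte little-endian length of the hybrid data.
fn encode_bool_page(values: &[bool]) -> Vec<u8> {
    let mut hybrid = Vec::new();
    if !values.is_empty() {
        // Bit-packed runs count groups of 8 values; the last group is zero-padded.
        let groups = (values.len() + 7) / 8;
        write_uleb128(&mut hybrid, ((groups as u64) << 1) | 1); // low bit 1 marks a bit-packed run
        for chunk in values.chunks(8) {
            let mut byte = 0u8;
            for (bit, &v) in chunk.iter().enumerate() {
                if v {
                    byte |= 1 << bit; // values are packed LSB-first
                }
            }
            hybrid.push(byte);
        }
    }
    let mut page = Vec::with_capacity(4 + hybrid.len());
    page.extend_from_slice(&(hybrid.len() as u32).to_le_bytes());
    page.extend_from_slice(&hybrid);
    page
}

fn main() {
    println!("{:?}", encode_bool_page(&[true, false, true]));
}
```

For `[true, false, true]` this gives `[2, 0, 0, 0, 3, 5]`: the length 2, the bit-packed header `(1 << 1) | 1 = 3`, and the packed byte `0b101`. Beyond the encoder itself, the remaining work would presumably be on the arrow2 side: accepting `Rle` for `Boolean` columns and routing them to an encoder like this.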