sunchao / parquet-rs

Apache Parquet implementation in Rust
Apache License 2.0

Support reading & writing Arrow in encoders/decoders. #191

Open sunchao opened 5 years ago

sunchao commented 5 years ago

To read into the Arrow format, we need to add a get_spaced method (borrowed from the C++ version) to the decoder, which leaves gaps for null values in the result value buffer. The same applies to encoders.
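As a rough illustration of what "leaving spaces for null values" means, here is a minimal sketch. It is not the parquet-rs API or the C++ implementation: the function name `spread_with_nulls` is hypothetical, and the sketch assumes the validity bitmap is LSB-first within each byte (as in Arrow) and that null slots are filled with `T::default()`.

```rust
/// Hypothetical helper (an assumption for illustration, not part of parquet-rs).
/// Copies the densely decoded non-null `values` into `out`, leaving a
/// default-valued slot wherever the corresponding bit in `valid_bits` is 0.
/// Bits are read LSB-first within each byte.
/// Returns the number of non-null values that were placed.
fn spread_with_nulls<T: Copy + Default>(
    values: &[T],
    valid_bits: &[u8],
    out: &mut [T],
) -> usize {
    let mut next = 0; // index of the next unread non-null value
    for (i, slot) in out.iter_mut().enumerate() {
        let is_valid = (valid_bits[i / 8] >> (i % 8)) & 1 == 1;
        if is_valid {
            *slot = values[next];
            next += 1;
        } else {
            // Null position: keep a placeholder so indices line up with the bitmap.
            *slot = T::default();
        }
    }
    next
}
```

With three decoded values `[10, 20, 30]`, a bitmap of `0b0000_1101`, and a five-element output buffer, the values land in slots 0, 2, and 3, while slots 1 and 4 stay at the default.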

Subtasks:

sadikovi commented 5 years ago

We have a similar thing in the record reader.

sunchao commented 5 years ago

> We have a similar thing in the record reader.

Hmm... you mean record/reader.rs? I couldn't find anything related there. This is at the encoding level, though, so we'll need to add a new method to Encoder and Decoder.

sadikovi commented 5 years ago

How will you add it to the encoder or decoder? They don't have information about null values; they only encode or decode non-null values.

If I am not mistaken - https://github.com/sunchao/parquet-rs/blob/master/src/record/triplet.rs#L310

Let me know if this is not what you had in mind, I will delete my comments.

sunchao commented 5 years ago

The interface will be similar to here. The valid_bits will be computed from the def/rep levels and passed to the call. See here for an example.
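For the simplest case, deriving valid_bits from definition levels could look like the sketch below. It makes several assumptions: the column is flat and optional (so repetition levels are ignored), a slot is non-null exactly when its definition level equals the maximum, and the bitmap is LSB-first as in Arrow. The name `valid_bits_from_def_levels` is hypothetical and not an existing parquet-rs function.

```rust
/// Hypothetical helper (an assumption for illustration, not part of parquet-rs).
/// Builds an LSB-first validity bitmap from definition levels for a flat,
/// optional column: a slot is non-null when its level equals `max_def_level`.
/// Returns the bitmap together with the number of null slots.
fn valid_bits_from_def_levels(def_levels: &[i16], max_def_level: i16) -> (Vec<u8>, usize) {
    // One bit per slot, rounded up to whole bytes.
    let mut bits = vec![0u8; (def_levels.len() + 7) / 8];
    let mut null_count = 0;
    for (i, &level) in def_levels.iter().enumerate() {
        if level == max_def_level {
            bits[i / 8] |= 1 << (i % 8);
        } else {
            null_count += 1;
        }
    }
    (bits, null_count)
}
```

The caller would then pass the resulting bitmap (and possibly the null count) to the spaced decode call. Nested or repeated columns would need the repetition levels as well, which this sketch leaves out.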