jorgecarleitao / parquet2

Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

Copying CompressedPages to new file #224

Open little-arhat opened 1 year ago

little-arhat commented 1 year ago

Hello!

Thanks for this crate!

I'm writing a concat tool to merge multiple parquet files into one. When reading CompressedPages with PageReader, I get back CompressedPages with .selected_rows = None.

When I try to write those pages, write_page expects Some(selected_rows), and my program ultimately fails with "All data pages must declare the number of rows on it" -- https://github.com/jorgecarleitao/parquet2/blob/7a5fc27039b192f255908154a0aba2e75f6ed5a1/src/write/row_group.rs#L69.

Is this a hard requirement of the parquet format? Can I pass 0 instead, or do I have to decompress/deserialize and then serialize/compress the pages to copy them to a new file?
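To make the failure concrete, here is a toy model of the check as I understand it. Everything in it (`ToyPage`, `Interval`, `declared_rows`, `row_group_rows`) is a stand-in made up for illustration, not parquet2's actual types or functions, and the assumption that the row count is taken from the end of the last selected interval is only my reading of the writer.

```rust
// Toy stand-ins for illustration only; NOT parquet2's real types or API.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Interval {
    pub start: usize,
    pub length: usize,
}

#[derive(Debug, Clone)]
pub struct ToyPage {
    pub is_data_page: bool,
    // Mirrors the `.selected_rows` field mentioned above: None after reading.
    pub selected_rows: Option<Vec<Interval>>,
}

/// Rows a page declares: the end of its last interval (an assumption),
/// or None when `selected_rows` is missing or empty.
pub fn declared_rows(page: &ToyPage) -> Option<usize> {
    page.selected_rows
        .as_ref()
        .and_then(|rows| rows.last())
        .map(|last| last.start + last.length)
}

/// Sums the declared rows of all data pages, failing with the same
/// message as the issue when any data page declares nothing.
pub fn row_group_rows(pages: &[ToyPage]) -> Result<usize, String> {
    let mut total = 0;
    for page in pages.iter().filter(|p| p.is_data_page) {
        total += declared_rows(page).ok_or_else(|| {
            "All data pages must declare the number of rows on it".to_string()
        })?;
    }
    Ok(total)
}
```

In this model, a page copied as-is (`selected_rows: None`) fails the check. A zero-length interval passes, but the row group then records 0 rows, so passing 0 would only be safe if nothing downstream relies on that count. Whether the real writer behaves the same way is exactly what I'm unsure about.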

little-arhat commented 1 year ago

Any tips?