sunchao / parquet-rs

Apache Parquet implementation in Rust
Apache License 2.0
149 stars 20 forks source link

Add reader properties #161

Closed sadikovi closed 5 years ago

sadikovi commented 5 years ago

This PR adds ReaderProperties struct, similar to WriterProperties used for writing. Currently this struct only contains batch_size, which is used to be hard-coded.

Following is done:

This is a fairly big change and has implications of incompatibility of API with prior versions.

sadikovi commented 5 years ago

@sunchao Could you review this PR and let me know if we should keep the change or postpone it for now? This is one of the (optional?) items on write-support issue. Thanks!

coveralls commented 5 years ago

Pull Request Test Coverage Report for Build 620


Files with Coverage Reduction New Missed Lines %
file/writer.rs 7 95.93%
file/reader.rs 14 97.08%
schema/printer.rs 20 70.52%
file/properties.rs 23 92.56%
record/reader.rs 83 88.61%
<!-- Total: 147 -->
Totals Coverage Status
Change from base Build 619: 0.04%
Covered Lines: 12460
Relevant Lines: 13035

💛 - Coveralls
sunchao commented 5 years ago

Thanks @sadikovi ! Just curious: do you have in mind what other reader properties we will add in future? I checked parquet-cpp/parquet-mr and didn't find many properties that people need to tune on the reader side.

sadikovi commented 5 years ago

Actually, not. The only setting that we have and that one is used in record reader (which could be deprecated in the future) is batch size.

What I can do is simply bumping up the batch size to 1024/4096 instead of introducing the whole reader properties and changing API.

Let me know what you think.

sunchao commented 5 years ago

What I can do is simply bumping up the batch size to 1024/4096 instead of introducing the whole reader properties and changing API.

Yes I'd prefer this approach. Thanks!