sunchao / parquet-rs

Apache Parquet implementation in Rust
Apache License 2.0
149 stars 20 forks source link

Support for DataPage V2 #16

Closed sadikovi closed 7 years ago

sadikovi commented 7 years ago

This PR adds support for DataPage V2.

I tested manually on a table that was written with version 2 writer. Refactored DataPage code into set_current_page_encoding, since it is the same for both V1 and V2. Also added method to set data range directly in RLE decoder, because DataPage V2 has repetition and definition levels separate from actual page data, which can be compressed.

It looks like DataPage V2 encodes int32 values with Delta Bit Packing. Currently decoder supports only int64, but I patched it manually - it the same for int32. Will create separate pull request for that.

sadikovi commented 7 years ago

@sunchao I did not add unittests for this change in respect to reading DataPage V2. Can you advice on how I can write unittests to tests reading DataPage V2?

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.4%) to 85.109% when pulling 8079effddd4beec3ffbed496a4ee5101b27a5b59 on sadikovi:datapagev2 into c2c185791050ff7ce6ba1779062854fff4e42ab5 on sunchao:master.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.4%) to 85.109% when pulling 8079effddd4beec3ffbed496a4ee5101b27a5b59 on sadikovi:datapagev2 into c2c185791050ff7ce6ba1779062854fff4e42ab5 on sunchao:master.

sunchao commented 7 years ago

@sadikovi Thanks for the PR (as always)!

@sunchao I did not add unittests for this change in respect to reading DataPage V2. Can you advice on how I can write unittests to tests reading DataPage V2?

I would suggest to add two unit tests: one to test the read_new_page added to the column reader, and another for get_next_page in file/reader.

For the former you may need to modify the existing DataPageBuilder and LevelEncoder (right now it always emit a 4-byte i32 at the beginning of buffer); for the latter, you can add a new Parquet test file under /data and then just do some simple sanity test like the test_file_reader in the file.

sadikovi commented 7 years ago

@sunchao Thanks for the review. I will address your comments and add unittests.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.3%) to 85.969% when pulling 67e03cbf1a920911b9e6a0149d0ab10cccdbab68 on sadikovi:datapagev2 into a533a76a4660464a0e64d4c80c57eba1754267ed on sunchao:master.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+1.3%) to 86.918% when pulling b6ef7577ea4dbd9a977b4c1331e468047f6a263e on sadikovi:datapagev2 into a533a76a4660464a0e64d4c80c57eba1754267ed on sunchao:master.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+1.3%) to 86.918% when pulling b6ef7577ea4dbd9a977b4c1331e468047f6a263e on sadikovi:datapagev2 into a533a76a4660464a0e64d4c80c57eba1754267ed on sunchao:master.

sadikovi commented 7 years ago

@sunchao could you review this pull request again? I added tests file/reader.rs and column/reader.rs.

Thanks!

sadikovi commented 7 years ago

@sunchao I addressed your comments - replaced all test functions in column/reader.rs with macro calls. Can you review again? Thanks!

coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.9%) to 86.6% when pulling f4247f09887370a27b311d2a9453be659a893ffb on sadikovi:datapagev2 into a533a76a4660464a0e64d4c80c57eba1754267ed on sunchao:master.

sunchao commented 7 years ago

Merged. Thanks!

sadikovi commented 7 years ago

Thank you for the review and merge!