sadikovi closed this 7 years ago.
@sunchao I did not add unit tests for this change with respect to reading DataPage V2. Can you advise on how I can write unit tests for reading DataPage V2?
@sadikovi Thanks for the PR (as always)!
I would suggest adding two unit tests: one for read_new_page, added to the column reader, and another for get_next_page in file/reader.
For the former, you may need to modify the existing DataPageBuilder and LevelEncoder (right now it always emits a 4-byte i32 at the beginning of the buffer); for the latter, you can add a new Parquet test file under /data and then do a simple sanity test like test_file_reader in that file.
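To illustrate the LevelEncoder point: in a V1 data page, RLE-encoded levels are preceded by a 4-byte little-endian length, while in a V2 page the levels carry no prefix because their byte lengths are stored in the page header. Below is a minimal sketch, assuming the levels have already been RLE-encoded into a byte buffer; `PageVersion` and `frame_levels` are hypothetical names for illustration, not parquet-rs APIs.

```rust
/// Hypothetical page-version flag (not a parquet-rs type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PageVersion {
    V1,
    V2,
}

/// Frames already RLE-encoded level bytes for a data page.
/// V1: prepend the byte length as a 4-byte little-endian i32.
/// V2: emit the bytes unchanged; the length lives in the page header.
pub fn frame_levels(encoded: &[u8], version: PageVersion) -> Vec<u8> {
    match version {
        PageVersion::V1 => {
            let mut out = Vec::with_capacity(4 + encoded.len());
            out.extend_from_slice(&(encoded.len() as i32).to_le_bytes());
            out.extend_from_slice(encoded);
            out
        }
        PageVersion::V2 => encoded.to_vec(),
    }
}
```

For example, `[0x06, 0x01]` is an RLE run of three 1s at bit width 1; framed for V1 it becomes `[2, 0, 0, 0, 0x06, 0x01]`, and for V2 it stays `[0x06, 0x01]`. A test page builder with a V2 mode along these lines could then produce buffers whose level sections are sliced using the lengths recorded in the header.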
@sunchao Thanks for the review. I will address your comments and add unit tests.
@sunchao Could you review this pull request again? I added tests to file/reader.rs and column/reader.rs.
Thanks!
@sunchao I addressed your comments and replaced all test functions in column/reader.rs with macro calls. Can you review again? Thanks!
Merged. Thanks!
Thank you for the review and merge!
This PR adds support for DataPage V2.
I tested it manually on a table that was written with a version 2 writer. I refactored the DataPage code into set_current_page_encoding, since it is the same for both V1 and V2. I also added a method to set the data range directly in the RLE decoder, because DataPage V2 stores repetition and definition levels separately from the actual page data, which can be compressed.

It looks like DataPage V2 encodes int32 values with Delta Bit Packing. Currently the decoder supports only int64, but I patched it manually, since the encoding is the same for int32. I will create a separate pull request for that.