Closed sadikovi closed 6 years ago
I also found that there is a bit of inconsistency between parquet-cpp and parquet-mr regarding FIXED_LEN_BYTE_ARRAY
having length as 0. Is it correct to have length 0? Currently we allow this, and I personally think it is okay, it is just indicates that array is empty, but would love to hear other opinions!
@sunchao Could you review this PR when you have time? Thanks!
Files with Coverage Reduction | New Missed Lines | % | ||
---|---|---|---|---|
encodings/encoding.rs | 1 | 94.64% | ||
schema/types.rs | 6 | 96.68% | ||
schema/parser.rs | 10 | 96.04% | ||
<!-- | Total: | 17 | --> |
Totals | |
---|---|
Change from base Build 533: | 0.03% |
Covered Lines: | 10544 |
Relevant Lines: | 11067 |
Is it correct to have length 0?
Yes I think it is correct too. Thanks for pointing out!
Merged. Thanks @sadikovi .
This PR adds
to_thrift
conversion, so we can serialise schema intoparquet::SchemaElement
. I added tests to verify that applyingto_thrift
->from_thrift
returns exactly the same schema.Several things have been found and fixed in the process:
Return type in
from_thrift
I changed parameter type for
from_thrift
from&mut [SchemaElement]
to&[SchemaElement]
, since there is no need to modify the slice.Inconsistencies when setting undefined precision, scale, length
Inconsistencies with respect to precision, scale and length having 0 or -1 to indicate unset/undefined value. In some parts of the code we were passing 0, in others we were setting -1. I propose that we treat -1 for precision, scale, and length as undefined value, and 0 as a valid value (of course, it depends, for example precision must always be positive). This means that when we use builder and set physical type as
FIXED_LEN_BYTE_ARRAY
we have to provide length; when we use builder and set logical type asDECIMAL
we have to provide precision and scale, even if it is 0. Note that parser still handles the case when decimal definition does not have scale value, because it is considered to be optional for schema string.