Open jaychia opened 1 year ago
I don't understand. Why would decimal be encoded in variable length binary?
Patch coverage has no change and project coverage change: -0.05% :warning:
Comparison is base (87ab844) 83.02% compared to head (ab04856) 82.98%.
Hi @ritchie46, apologies for the late reply!
Going by the Parquet spec, decimals can actually be encoded as `int32`, `int64`, `fixed_len_byte_array`, and `binary`.
See: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
> `binary`: precision is not limited, but is required. The minimum number of bytes to store the unscaled value should be used.
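To make the "minimum number of bytes" wording concrete, here is a rough sketch of how a writer might produce that encoding. The Parquet spec stores the unscaled value as big-endian two's complement. The function name `minimal_decimal_bytes` is hypothetical and not part of arrow2 or parquet2, and the sketch assumes the unscaled value fits in an `i128`, even though `binary` itself does not limit precision.

```rust
/// Hypothetical helper (not an arrow2 API): encodes an unscaled decimal value
/// as the shortest big-endian two's-complement byte string.
/// Assumes the unscaled value fits in an i128.
fn minimal_decimal_bytes(unscaled: i128) -> Vec<u8> {
    let full = unscaled.to_be_bytes();
    // Bytes that carry only the sign: 0xFF for negatives, 0x00 otherwise.
    let fill: u8 = if unscaled < 0 { 0xFF } else { 0x00 };
    let mut start = 0;
    // A leading byte is redundant when it is pure sign fill and the next
    // byte's top bit already encodes the same sign. Always keep one byte.
    while start < 15 && full[start] == fill && (full[start + 1] & 0x80) == (fill & 0x80) {
        start += 1;
    }
    full[start..].to_vec()
}
```

Under these assumptions, 12345 encodes to two bytes (`0x30 0x39`), while 128 needs a leading `0x00` so that it is not read back as -128.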
This also needs to be implemented for nested types: https://github.com/jorgecarleitao/arrow2/blob/main/src/io/parquet/read/deserialize/nested.rs
Arrow2 already supports Parquet FixedLenByteArray -> Decimal conversion.
This PR adds support for Parquet (variable-length) ByteArray -> Decimal conversion, reusing most of the logic from the FixedLenByteArray conversion.
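For readers unfamiliar with the conversion, the core step can be sketched roughly as follows. This is not the code from the PR. It is a minimal illustration that assumes the value is stored as big-endian two's complement (as the Parquet spec requires) and that the target is a 128-bit decimal. The name `decimal_bytes_to_i128` is hypothetical, and treating an empty slice as zero is an assumption of this sketch.

```rust
/// Hypothetical helper (not the PR's implementation): sign-extends a
/// big-endian two's-complement byte slice of any length up to 16 into an i128.
/// Returns None when the value cannot fit in 128 bits.
fn decimal_bytes_to_i128(bytes: &[u8]) -> Option<i128> {
    if bytes.len() > 16 {
        return None;
    }
    // The top bit of the first byte is the sign bit.
    // Assumption: an empty slice is treated as zero.
    let negative = bytes.first().map_or(false, |b| b & 0x80 != 0);
    // Pre-fill with the sign so that shorter inputs are sign-extended.
    let mut buf = if negative { [0xFFu8; 16] } else { [0u8; 16] };
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    Some(i128::from_be_bytes(buf))
}
```

The same routine handles the fixed-length case, since a FixedLenByteArray value is just a slice whose length happens to be constant across the column. That is presumably why most of the existing logic can be shared.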