I'd like INT96 (though deprecated), BYTE_ARRAY, and FIXED_LEN_BYTE_ARRAY to all use []byte as their primitive type (https://github.com/xitongsys/parquet-go#type). They are raw bytes and are not necessarily valid UTF-8 strings.
My use case is printing data from a Parquet file in JSON format without a predefined schema. The easiest way is to read the schema from the file, read the data, and then transform values based on the schema where needed (e.g. the Decimal type), all in the JSON world. However, this does not work: a BYTE_ARRAY or FIXED_LEN_BYTE_ARRAY holding a decimal is most likely not a valid UTF-8 string, so I get lots of U+FFFD (the replacement character for invalid UTF-8) in the JSON, and the conversion cannot be reversed.