Open mihaitodor opened 2 years ago
This issue seems to also cause Pandas to fail with "OSError: Not yet implemented: DecodeArrow for DeltaLengthByteArrayDecoder.": https://github.com/segmentio/parquet-go/issues/325
I've added a UTF8 option for column values: https://github.com/benthosdev/benthos/commit/07ed81b150778a362e25e52428c59a05ca21369b as a quick workaround. Technically I think we ought to be exposing logical types with a separate field, but we can cross that bridge later.
In some cases, users will need to specify the logical type in the `schema` field. Details here: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md

For example, when using `type: BYTE_ARRAY` to encode a string value, they might want to set the logical type to `STRING` so decoders will be able to interpret it correctly. For example, given this config:

will produce a parquet binary which, when decoded with parquet-tools, will contain a base64-encoded value:

However, if we change this line of code to `n = parquet.String()`, then parquet-tools will output `test = deadbeef`.
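To illustrate why the logical type matters to a decoder, here is a minimal, self-contained sketch. It does not use parquet-go or reproduce parquet-tools' actual code: the `column` struct and `render` function are hypothetical stand-ins, and the input value `deadbeef` is assumed from the output quoted above. The idea is only that a reader with no `STRING` annotation has no reason to treat a `BYTE_ARRAY` as text, so it falls back to a binary-safe representation such as base64.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// column is a hypothetical, simplified stand-in for a Parquet column
// descriptor: a physical type plus an optional logical type annotation.
type column struct {
	name         string
	physicalType string // e.g. "BYTE_ARRAY"
	logicalType  string // e.g. "STRING"; empty when no annotation is set
}

// render mimics (as an assumption, not the real tool's logic) how a reader
// might display a value: annotated strings are printed as text, while
// unannotated byte arrays are printed as base64.
func render(c column, value []byte) string {
	if c.physicalType == "BYTE_ARRAY" && c.logicalType == "STRING" {
		return fmt.Sprintf("%s = %s", c.name, value)
	}
	return fmt.Sprintf("%s = %s", c.name, base64.StdEncoding.EncodeToString(value))
}

func main() {
	raw := []byte("deadbeef") // assumed input value

	// No logical type: the bytes are shown base64-encoded.
	fmt.Println(render(column{name: "test", physicalType: "BYTE_ARRAY"}, raw))
	// prints: test = ZGVhZGJlZWY=

	// STRING logical type: the bytes are shown as text.
	fmt.Println(render(column{name: "test", physicalType: "BYTE_ARRAY", logicalType: "STRING"}, raw))
	// prints: test = deadbeef
}
```

The bytes on disk are identical in both cases; only the annotation changes how a reader chooses to interpret them, which is why exposing the logical type in the schema config would help.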