Closed Mort4lis closed 2 years ago
First of all, INT96 is deprecated, so consider using something else if you can.
The problem is that INT96 is stored internally as a string even though it is not a valid UTF-8 string, so when Marshal tries to serialize it as a UTF-8 string, it fails and substitutes the Unicode replacement character.
This is related to https://github.com/xitongsys/parquet-go/issues/434 and https://github.com/xitongsys/parquet-go/issues/321; both are problems caused by the internal representation of []byte as string.
Thank you for the reply, man! Yes, indeed, I store the Julian date as a byte representation in an INT96 column, and these bytes are not Unicode code points.
Hi everyone! I have a problem with writing/reading a parquet file.
Let's take a look at an example: I create a JSON writer and a schema with one column (INT96) and try to write one row with the current date. Before writing, I convert time.Time to a string by calling types.TimeToINT96. But after reading the output parquet file, I get a wrong result.
If I replace the jsonWriter with the usual ParquetWriter, it works correctly, but I need to write JSON. I will be glad for any help!
Code: