xitongsys / parquet-go

pure golang library for reading/writing parquet files
Apache License 2.0

Timestamp conversion failing? #596

Open jvilhuber opened 3 weeks ago

jvilhuber commented 3 weeks ago

I'm very likely doing something wrong and could use a clue. I have a struct that I use to translate JSON records into Parquet files. I haven't figured out how to use the same 'time.Time' field for both JSON and Parquet, so I have these two fields:

    Timestamp         time.Time `json:"@timestamp"`
    ParquetTimestamp  int64     `parquet:"name=timestamp, type=INT64, convertedType=TIMESTAMP_MILLIS"`

I also tried:

    ParquetTimestamp  int64     `parquet:"name=timestamp, type=INT64, logicaltype=TIMESTAMP, logicaltype.isadjustedtoutc=true, logicaltype.unit=MILLIS"`

The JSON file contains entries like:

    "@timestamp": "2024-08-19T12:14:17+00:00",

JSON decoding works fine, and I have a valid, non-zero timestamp in Go (verified with the debugger). In the code, I then do this:

    logEntry.ParquetTimestamp = logEntry.Timestamp.UnixMilli()

I then write out the Parquet file using the normal parquet-go writer mechanisms. I've verified that the value assigned to logEntry.ParquetTimestamp is non-zero at this point.

However, it appears that I always end up with a 0 timestamp in the Parquet files. Dumping a file with parquet cat, I see:

    {"timestamp": 0...

Can anyone tell me what I'm doing wrong?
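The thread doesn't show the code around the assignment, so the following is only an assumption about one common way this symptom arises in Go: if the assignment is made to the range variable of a for ... range loop over a slice of structs, the debugger shows the non-zero value on the loop's copy, while the slice that is later handed to the writer still holds 0. The names Entry, fillByValue, and fillByIndex below are hypothetical.

```go
package main

import "fmt"

// Entry is a hypothetical stand-in for the issue's log entry struct.
type Entry struct {
	Millis int64
}

// fillByValue shows the pitfall: e is a copy of each element, so the new
// value is visible on e (and in a debugger) but never reaches the slice.
func fillByValue(entries []Entry, v int64) (seen int64) {
	for _, e := range entries {
		e.Millis = v
		seen = e.Millis // what a debugger would show at this point
	}
	return seen
}

// fillByIndex writes through the slice index, so the change sticks.
func fillByIndex(entries []Entry, v int64) {
	for i := range entries {
		entries[i].Millis = v
	}
}

func main() {
	entries := make([]Entry, 2)
	seen := fillByValue(entries, 1724069657000)
	fmt.Println(seen, entries[0].Millis) // 1724069657000 0
	fillByIndex(entries, 1724069657000)
	fmt.Println(entries[0].Millis) // 1724069657000
}
```

Iterating over a slice of pointers, or assigning through the index as in fillByIndex, avoids the copy.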

jvilhuber commented 3 weeks ago

If I inspect the file with parquet schema, I see:

    $ parquet schema logfile.parquet
      "type" : "record",
      "name" : "parquet_go_root",
      "fields" : [ {
        "name" : "timestamp",
        "type" : {
          "type" : "long",
          "logicalType" : "local-timestamp-millis"
      }, {...