xitongsys / parquet-go

pure golang library for reading/writing parquet file
Apache License 2.0

Timestamp conversion failing? #596

Open jvilhuber opened 2 months ago

jvilhuber commented 2 months ago

I'm very likely doing something wrong and could use a clue. I have a struct that I use to translate JSON into Parquet files. I have not figured out how to use the same 'time.Time' field for both JSON and Parquet, so I have these two fields:

    Timestamp         time.Time `json:"@timestamp"`
    ParquetTimestamp  int64     `parquet:"name=timestamp, type=INT64, convertedType=TIMESTAMP_MILLIS"`

I also tried

    ParquetTimestamp  int64     `parquet:"name=timestamp, type=INT64, logicaltype=TIMESTAMP, logicaltype.isadjustedtoutc=true, logicaltype.unit=MILLIS"`

The JSON file contains something like

    "@timestamp": "2024-08-19T12:14:17+00:00",

JSON decoding works fine, and I have a valid, non-zero timestamp in Go (verified with the debugger). In the code, I then do this:

    logEntry.ParquetTimestamp = logEntry.Timestamp.UnixMilli()

Then I write out the Parquet file using the normal mechanisms. I've verified that the value assigned to logEntry.ParquetTimestamp is not 0 at this point.

However, it appears that I always wind up with a 0 timestamp in the Parquet files. Using parquet cat to dump the file, I see

    {"timestamp": 0...

Can anyone tell me what I'm doing wrong?

jvilhuber commented 2 months ago

If I investigate with parquet schema, I see

$ parquet schema logfile.parquet
{
  "type" : "record",
  "name" : "parquet_go_root",
  "fields" : [ {
    "name" : "timestamp",
    "type" : {
      "type" : "long",
      "logicalType" : "local-timestamp-millis"
    }
  }, {...

hangxie commented 1 month ago

Works for me

package main

import (
    "fmt"
    "os"
    "time"

    "github.com/xitongsys/parquet-go/writer"
)

type Example struct {
    ParquetTimestamp int64 `parquet:"name=timestamp, type=INT64, convertedType=TIMESTAMP_MILLIS"`
}

func main() {
    w, err := os.Create("timestamp.parquet")
    if err != nil {
        fmt.Println("Can't create local file", err)
        return
    }
    // Close the file on every return path, including the error returns below.
    defer w.Close()

    pw, err := writer.NewParquetWriterFromWriter(w, new(Example), 4)
    if err != nil {
        fmt.Println("Can't create parquet writer", err)
        return
    }

    // Store the current time as milliseconds since the Unix epoch.
    ex := Example{
        ParquetTimestamp: time.Now().UnixMilli(),
    }
    if err = pw.Write(ex); err != nil {
        fmt.Println("Write error", err)
        return
    }
    // WriteStop flushes buffered rows and writes the file footer.
    if err = pw.WriteStop(); err != nil {
        fmt.Println("WriteStop error", err)
        return
    }
    fmt.Println("Write Finished")
}

Then

$ go run timestamp.go
Write Finished
$ parquet cat timestamp.parquet
{"timestamp": 1727915144666}
$ parquet schema timestamp.parquet
{
  "type" : "record",
  "name" : "parquet_go_root",
  "fields" : [ {
    "name" : "timestamp",
    "type" : {
      "type" : "long",
      "logicalType" : "local-timestamp-millis"
    }
  } ]
}