xitongsys / parquet-go

pure golang library for reading/writing parquet file
Apache License 2.0

INT96 is converted to datetime #592

Open shiyuhang0 opened 2 months ago

shiyuhang0 commented 2 months ago

Write with the CSV writer:

package main

import (
    "fmt"
    "log"

    "github.com/xitongsys/parquet-go-source/local"
    "github.com/xitongsys/parquet-go/writer"
)

func main() {
    // CSV schema: one INT96 column and one BYTE_ARRAY column.
    md := []string{
        "name=id, type=INT96",
        "name=name, type=BYTE_ARRAY",
    }

    fw, err := local.NewLocalFileWriter("csv.parquet")
    if err != nil {
        log.Println("Can't open file", err)
        return
    }
    defer fw.Close()

    pw, err := writer.NewCSVWriter(md, fw, 4)
    if err != nil {
        log.Println("Can't create csv writer", err)
        return
    }

    // One row; the id value is the maximum uint64.
    data := []string{"18446744073709551615", "b"}
    rec := make([]*string, len(data))
    for j := range data {
        rec[j] = &data[j]
    }
    if err = pw.WriteString(rec); err != nil {
        log.Println("WriteString error", err)
    }

    if err = pw.WriteStop(); err != nil {
        log.Println("WriteStop error", err)
    }
    // Dump the schema elements recorded in the file footer.
    for _, s := range pw.Footer.Schema {
        fmt.Printf("%v\n", *s)
    }
    log.Println("Write Finished")
}

Read with DuckDB:

id,name
"4714-11-24 (BC) 00:00:00",b

The id column comes back as something like a datetime. Is this expected?

hangxie commented 5 days ago

First, INT96 was deprecated more than 6 years ago (https://github.com/apache/parquet-format/blob/master/CHANGES.md#version-250), so you may want to consider moving to a different type.

I suspect some tools and libraries don't interpret INT96 consistently. I ran your code and opened the resulting file in several online Parquet viewers, which gave different results:

  1. https://www.parquet-viewer.com/ gives 1717-12-28 19:20:10.805067775
  2. https://dataconverter.io/view/parquet/ gives 4713-01-01T11:59:59.999Z
  3. https://parquetreader.com/result gives 1970-01-01

Various CLI tools also returned different results. If you expect 1717-12-28 19:20:10.805067775, you may want to try the tool I built:

$ parquet-tools cat csv.parquet
[{"Id":"1717-12-28T19:20:10.805067776Z","Name":"b"}]