xitongsys / parquet-go

pure golang library for reading/writing parquet file
Apache License 2.0

Unable to add logicalType to my json schema #556

Open vishwaratna opened 1 year ago

vishwaratna commented 1 year ago

Hi There,

I have developed a small application that converts JSON to Parquet files, and I am looking for guidance on how to use logicalType in its schema. Based on the examples and use cases provided in the library, I can use convertedType, the predecessor of logicalType, but I would like to follow the latest standard and use logicalType instead. Can you please advise on how to proceed?

Below is my sample code using logicalType. It does not work. Is there any way to define logicalType in my JSON schema?

package functions

import (
    "fmt"
    "log"

    "github.com/xitongsys/parquet-go-source/local"
    "github.com/xitongsys/parquet-go/parquet"
    "github.com/xitongsys/parquet-go/writer"
)

type ProcessData struct {
    EventTimestamp string `json:"event_timestamp"`
    ActionName     string `json:"action_name"`
    SystemName     string `json:"system_name"`
}

var jsonSchema string = `{
        "Tag": "name=parquet_go_root, repetitiontype=REQUIRED",
        "Fields": [
          {"Tag": "name=actionName, inname=ActionName, type=BYTE_ARRAY, logicaltype=STRING, repetitiontype=REQUIRED"},
          {"Tag": "name=systemName, inname=SystemName, type=BYTE_ARRAY, logicaltype=STRING, repetitiontype=REQUIRED"},
          {"Tag": "name=eventTimestamp, inname=EventTimestamp, type=BYTE_ARRAY, logicaltype=STRING, repetitiontype=REQUIRED"}
        ]
      }`

func ConvertToParquet() {
    fw, err := local.NewLocalFileWriter("./json_schema.parquet")
    if err != nil {
        log.Println("Can't create local file", err)
        return
    }
    // Close the file on every return path, including the early returns below.
    defer func() {
        if cerr := fw.Close(); cerr != nil {
            log.Println("Close error", cerr)
        }
    }()

    // Create the writer from the JSON schema string; 4 is the number of goroutines.
    pw, err := writer.NewParquetWriter(fw, jsonSchema, 4)
    if err != nil {
        log.Println("Can't create parquet writer", err)
        return
    }

    pw.RowGroupSize = 128 * 1024 * 1024 //128M
    pw.CompressionType = parquet.CompressionCodec_SNAPPY
    num := 10
    for i := 0; i < num; i++ {
        stu := ProcessData{
            //some data
        }
        if err = pw.Write(stu); err != nil {
            fmt.Println("line 92")
            log.Println("Write error", err)
        }
    }
    // WriteStop flushes the buffered rows and writes the footer.
    if err = pw.WriteStop(); err != nil {
        log.Println("WriteStop error", err)
        return
    }
    log.Println("Write Finished")
}
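
When debugging a schema like the one above, it can help to first rule out problems in the schema string itself before looking at the writer. The following is a minimal, standard-library-only sketch that parses the JSON schema and splits each leaf `Tag` into key/value pairs so the `type` and `logicaltype` entries can be inspected. The names `schemaNode`, `parseTag`, `leafTags`, and `sampleSchema` are hypothetical helpers introduced here for illustration; this is not parquet-go's own tag parser, and it says nothing about which `logicaltype` values a given library version accepts.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// schemaNode mirrors the two keys used by the JSON schema in the post above.
// Hypothetical type, for inspection only.
type schemaNode struct {
	Tag    string
	Fields []schemaNode
}

// parseTag splits a "key=value, key=value" tag into a map.
// Hypothetical helper; it is not the library's parser.
func parseTag(tag string) map[string]string {
	out := make(map[string]string)
	for _, part := range strings.Split(tag, ",") {
		kv := strings.SplitN(strings.TrimSpace(part), "=", 2)
		if len(kv) == 2 {
			out[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
		}
	}
	return out
}

// leafTags parses the schema and returns the parsed tag of every leaf field.
func leafTags(schema string) ([]map[string]string, error) {
	var root schemaNode
	if err := json.Unmarshal([]byte(schema), &root); err != nil {
		return nil, fmt.Errorf("schema is not valid JSON: %w", err)
	}
	var leaves []map[string]string
	var walk func(n schemaNode)
	walk = func(n schemaNode) {
		if len(n.Fields) == 0 {
			leaves = append(leaves, parseTag(n.Tag))
			return
		}
		for _, f := range n.Fields {
			walk(f)
		}
	}
	walk(root)
	return leaves, nil
}

// sampleSchema is a shortened copy of the schema from the post.
const sampleSchema = `{
  "Tag": "name=parquet_go_root, repetitiontype=REQUIRED",
  "Fields": [
    {"Tag": "name=actionName, inname=ActionName, type=BYTE_ARRAY, logicaltype=STRING, repetitiontype=REQUIRED"},
    {"Tag": "name=systemName, inname=SystemName, type=BYTE_ARRAY, logicaltype=STRING, repetitiontype=REQUIRED"}
  ]
}`

func main() {
	leaves, err := leafTags(sampleSchema)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, t := range leaves {
		fmt.Printf("%s: type=%s logicaltype=%q\n", t["name"], t["type"], t["logicaltype"])
	}
}
```

If every leaf shows the expected `type` and `logicaltype` values, the schema string is at least well-formed, and the remaining question is how the installed library version handles the `logicaltype` key.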