xitongsys / parquet-go

pure golang library for reading/writing parquet file
Apache License 2.0
1.27k stars 293 forks

How to convert parquet INT96 to TIMESTAMP? #409

Closed fllife closed 3 years ago

fllife commented 3 years ago
  1. Question: my parquet file schema has an INT96 column, and I need to convert it to a timestamp. But when I read the INT96 field from the file, the field string is garbled, like �̌r)T�%, so timestamp accuracy is lost after conversion, e.g. 1925-01-01 00:12:21.610.

  2. Code: my main code is as follows:

        res, err := pr.ReadByNumber(step)
        if err != nil {
            appInit.Logger.Errorf("Can't read: %s", err)
            return err
        }
        jsonBs, err := json.Marshal(res)
        if err != nil {
            appInit.Logger.Errorf("Can't to json: %s", err)
            return err
        }
        mp := make([]map[string]interface{}, step)
        json.Unmarshal(jsonBs, &mp)
    
        // Timestamp accuracy is lost when types.INT96ToTime(value.(string)) is invoked in GetValue.
        rows := make([]chHttpClient.Row, 0, len(mp))
        for i := range mp {
            row := make([]interface{}, 0, len(clmList))
            for j, item := range clmList {
                row = append(row, utils.GetValue(mp[i][item.Nm], clmList[j]))
            }
            rows = append(rows, row)
        }
  3. Can someone tell me how to read the data without it being garbled, or how to convert INT96 values from parquet? Thanks.
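
A note on the garbled string: one likely cause is the JSON round trip in the snippet above, though that is an assumption rather than something confirmed in this thread. An INT96 value is 12 bytes of raw binary held in a Go string, and `json.Marshal` coerces strings to valid UTF-8, replacing every invalid byte with U+FFFD, so the original bytes cannot be recovered after `json.Unmarshal`. A minimal sketch using only the standard library (the sample bytes and the `roundTrip` helper are made up for illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip marshals a string to JSON and unmarshals it again,
// mirroring what the snippet above does for whole rows.
// It is a hypothetical helper, not part of parquet-go.
func roundTrip(s string) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	var out string
	if err := json.Unmarshal(b, &out); err != nil {
		return "", err
	}
	return out, nil
}

func main() {
	// Made-up 12-byte binary value standing in for an INT96 field.
	raw := string([]byte{0x00, 0xe0, 0x8c, 0x72, 0x29, 0x54, 0x00, 0x00, 0x8e, 0x25, 0x2c, 0x00})

	got, err := roundTrip(raw)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}

	fmt.Printf("before: % x\n", raw)
	fmt.Printf("after:  % x\n", got)
	fmt.Println("bytes preserved:", got == raw)
}
```

If this is indeed what happens, converting the INT96 field before marshalling (or skipping the JSON step for that column) would keep the bytes intact.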

hangxie commented 3 years ago

https://www.google.com/search?q=parquet+int96+julian

EDIT shorter URL

hangxie commented 3 years ago

Also, this is a duplicate of #408.

xitongsys commented 3 years ago

types.INT96ToTime is compatible with Spark output. For details you can reference here.

You should check the encoding type in your INT96.
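
For reference, the layout commonly used by Spark, Hive, and Impala for INT96 timestamps is 12 bytes: the first 8 are the nanoseconds within the day and the last 4 are the Julian day number, both little-endian. The sketch below decodes that layout with the standard library only. `int96ToTime` is a hypothetical helper written for illustration, not the library's `types.INT96ToTime`, and it assumes the file follows this layout:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"time"
)

// julianDayOfUnixEpoch is the Julian day number of 1970-01-01.
const julianDayOfUnixEpoch = 2440588

// int96ToTime decodes a 12-byte INT96 timestamp, assuming the layout
// [8 bytes nanoseconds of day][4 bytes Julian day], both little-endian.
// Hypothetical helper for illustration only.
func int96ToTime(raw []byte) (time.Time, error) {
	if len(raw) != 12 {
		return time.Time{}, errors.New("INT96 value must be exactly 12 bytes")
	}
	nanosOfDay := int64(binary.LittleEndian.Uint64(raw[:8]))
	julianDay := int64(binary.LittleEndian.Uint32(raw[8:]))

	// Whole days since the Unix epoch, converted to seconds.
	secs := (julianDay - julianDayOfUnixEpoch) * 86400
	return time.Unix(secs, nanosOfDay).UTC(), nil
}

func main() {
	// Build a sample value: one day after the epoch plus one second.
	raw := make([]byte, 12)
	binary.LittleEndian.PutUint64(raw[:8], 1_000_000_000)
	binary.LittleEndian.PutUint32(raw[8:], julianDayOfUnixEpoch+1)

	ts, err := int96ToTime(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(ts.Format(time.RFC3339Nano)) // 1970-01-02T00:00:01Z
}
```

The helper takes the raw bytes, so a value read as a Go string would be passed as `[]byte(value)` before any step that might alter its bytes.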

RichardFlyBird commented 3 years ago

types.INT96ToTime is compatible with Spark output. For details you can reference here.

You should check the encoding type in your INT96.

Maybe the encoding type is wrong? How should I check the encoding type in my INT96?

hangxie commented 3 years ago

Maybe the encoding type is wrong? How should I check the encoding type in my INT96?

If you don't want to write code, you can use a parquet-tools implementation to dump the schema of the parquet file, like the one in this repo, or the Python one, or the Java one, or mine :).