segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

List-type columns should be able to write null in a Parquet file #513

Open emlynazuma opened 1 year ago

emlynazuma commented 1 year ago

Description:

We have a program that reads a JSON file, performs some operations, and then writes the result to a new Parquet file. We run into a problem when the data contains list-type columns.

In the JSON source, a list-type field may be null (e.g. {"list":null}). We want the Parquet file to preserve this: a field that is null in the JSON source should remain null in the output. However, we cannot achieve this with the parquet struct tags this library currently provides.

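package main

import (
    "encoding/json"
    "log"

    "github.com/segmentio/parquet-go"
)

// RowType1 has a single list-typed column.
type RowType1 struct {
    ListTag []int32 `json:"list_tag" parquet:"list_tag,list"`
}

func main() {
    var rows []RowType1
    // Four inputs: missing key, explicit null, empty list, populated list.
    strs := []string{
        `{}`,
        `{"list_tag":null}`,
        `{"list_tag":[]}`,
        `{"list_tag":[1,2]}`,
    }
    for _, s := range strs {
        var r RowType1
        if err := json.Unmarshal([]byte(s), &r); err != nil {
            log.Fatalln(err)
        }
        rows = append(rows, r)
    }
    if err := parquet.WriteFile("file.parquet", rows); err != nil {
        log.Fatalln(err)
    }
}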

// Output of `pqrs cat file.parquet`:
// {list_tag: []}      <- expected {list_tag: null}
// {list_tag: []}      <- expected {list_tag: null}
// {list_tag: []}
// {list_tag: [1, 2]}
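
For context, here is a minimal sketch of what encoding/json hands to the writer for each of the four inputs. It uses only the standard library and does not exercise parquet-go at all; the Row type and the classify helper are hypothetical names introduced for this illustration. It shows that the missing key and the explicit null both decode to a nil slice, while [] decodes to an empty, non-nil slice, so the distinction is still present in the Go values before they are written.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Row mirrors the struct from the report, minus the parquet tag,
// so the example depends only on the standard library.
type Row struct {
	ListTag []int32 `json:"list_tag"`
}

// classify (hypothetical helper) reports how encoding/json decoded the list field.
func classify(input string) (string, error) {
	var r Row
	if err := json.Unmarshal([]byte(input), &r); err != nil {
		return "", err
	}
	switch {
	case r.ListTag == nil:
		return "nil", nil
	case len(r.ListTag) == 0:
		return "empty", nil
	default:
		return fmt.Sprintf("%d values", len(r.ListTag)), nil
	}
}

func main() {
	inputs := []string{
		`{}`,
		`{"list_tag":null}`,
		`{"list_tag":[]}`,
		`{"list_tag":[1,2]}`,
	}
	for _, in := range inputs {
		kind, err := classify(in)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", in, kind)
	}
}
```

The first two inputs are classified as nil and the third as empty, so the Go values alone would let a writer tell a null list apart from an empty one. Whether the library can be made to honor that difference is the open question here.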

I have tried the following methods:

Expected Result:

Is there any way to achieve this?

kevinburkesegment commented 1 year ago

Apologies for making more work for you, but we've decided to move development of this project to a new organization at https://github.com/parquet-go/parquet-go to ensure its long-term success. We appreciate your contribution and would appreciate it if you could reopen this ticket there if it is still relevant.

tongwaiazuma commented 1 year ago

I reran the code against https://github.com/parquet-go/parquet-go and the same issue occurs. I will open a new ticket in the new repository.