segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

How to write with nested struct #377

Closed yorsita closed 2 years ago

yorsita commented 2 years ago

Hi Team, I'm new to this package, and sorry to bother you. I was trying to write a series of rows with a nested struct like this:

type People struct {
    Name string
    Age int
}

type Nested struct {
    P []People
    F string
    GF string
}

Here are the two issues I ran into:

  1. How do I write with a nested struct? I tried the following code, but it failed with "read error: EOF":

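    package main

    import (
        "bytes"
        "fmt"
        "log"

        "github.com/segmentio/parquet-go"
    )

    func main() {
        type People struct {
            Name string
            Age  int
        }

        type Nested struct {
            P  []People
            F  string
            GF string
        }

        // A single row containing one nested People entry.
        row1 := Nested{P: []People{{Name: "Bob", Age: 10}}}
        ods := []Nested{row1}

        // Write the rows to an in-memory buffer.
        buf := new(bytes.Buffer)
        w := parquet.NewGenericWriter[Nested](buf)
        if _, err := w.Write(ods); err != nil {
            log.Fatal("write error: ", err)
        }
        if err := w.Close(); err != nil {
            log.Fatal("close error: ", err)
        }

        // Read the rows back; this is the step that fails with "read error: EOF".
        file := bytes.NewReader(buf.Bytes())
        rows, err := parquet.Read[Nested](file, file.Size())
        if err != nil {
            log.Fatal("read error: ", err)
        }

        for _, row := range rows {
            fmt.Printf("%+v\n", row)
        }
    }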
  2. Can segmentio/parquet-go write only a subset of fields to Parquet, based on a new struct? For example, I have stored a slice of data with the struct

    type Nested struct {
        P  []People
        F  string
        GF string
    }

    and when I write to Parquet I only want to store

    type Nested struct {
        P []People
        F string
        // GF string    // ignore the GF field
    }

Sorry to bother you. Is there an example doc I could look at? Thank you so much.

achille-roussel commented 2 years ago

Hello @yorsita

> How do I write with a nested struct? I tried the following code, but it failed with "read error: EOF":

I submitted this patch to address the issue: https://github.com/segmentio/parquet-go/pull/378

> Can segmentio/parquet-go write only a subset of fields to Parquet, based on a new struct? For example, I have stored a slice of data with the struct

You should be able to use struct tags to achieve this:

type Nested struct {
    P []People
    F string
    GF string `parquet:"-"` // omit the field
}