segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

GenericWriter should write map keys to matching columns #510

Closed (stoewer closed 1 year ago)

stoewer commented 1 year ago

Given a parquet file with a schema generated from the following model:

type Inner struct {
    FieldB int
    FieldC string
}

type Model struct {
    FieldA string
    Nested Inner
}

The GenericWriter should be able to write data from an alternative model in which Nested is represented as a map, as long as the map keys match the column names of the original schema:

type AltModel struct {
    FieldA string
    Nested map[string]any
}

data := []AltModel{
    {
        FieldA: "a",
        Nested: map[string]any{"FieldB": 11, "FieldC": "c"},
    },
}

// f is any io.Writer, for example an *os.File opened for writing.
schema := parquet.SchemaOf(new(Model))
w := parquet.NewGenericWriter[AltModel](f, schema)
w.Write(data)

In its current implementation, GenericWriter panics on the code above. Here is a gist that reproduces the error.

This feature is useful when writing data to a schema in which parts of the schema (e.g. the struct Inner) are defined dynamically at runtime.
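As a rough illustration of the name-based matching described above, the sketch below copies map entries into the struct fields of the same name, using only the standard reflect package. The helper fillFromMap is hypothetical and not part of parquet-go. It also assumes the target struct type is known at compile time, so it does not cover the fully dynamic case this issue is about.

```go
package main

import (
	"fmt"
	"reflect"
)

// Inner mirrors the nested struct from the model above.
type Inner struct {
	FieldB int
	FieldC string
}

// fillFromMap is a hypothetical helper (not part of parquet-go). It copies
// each map entry into the exported struct field with the same name and
// returns an error for unknown keys or values of the wrong type.
func fillFromMap(dst any, m map[string]any) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Pointer || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a non-nil pointer to a struct, got %T", dst)
	}
	v = v.Elem()
	for key, val := range m {
		field := v.FieldByName(key)
		if !field.IsValid() || !field.CanSet() {
			return fmt.Errorf("no settable field %q in %s", key, v.Type())
		}
		rv := reflect.ValueOf(val)
		if !rv.IsValid() || !rv.Type().AssignableTo(field.Type()) {
			return fmt.Errorf("value for %q has type %T, want %s", key, val, field.Type())
		}
		field.Set(rv)
	}
	return nil
}

func main() {
	var inner Inner
	err := fillFromMap(&inner, map[string]any{"FieldB": 11, "FieldC": "c"})
	fmt.Println(inner, err) // prints: {11 c} <nil>
}
```

A value converted this way could be assigned to Model.Nested before writing with a GenericWriter[Model]. The sketch rejects mismatched types rather than converting them, so a float64 value for FieldB would return an error.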

kevinburkesegment commented 1 year ago

Apologies for making more work for you, but we've decided to move development of this project to a new organization at https://github.com/parquet-go/parquet-go to ensure its long-term success. We appreciate your contribution and would appreciate it if you could reopen this ticket there if it is still relevant.

stoewer commented 1 year ago

Thanks for letting me know, @kevinburkesegment. I copied the issue to parquet-go/parquet-go#8.