segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

Reuse reconstruction columns slice to reduce allocations #509

Closed zolstein closed 1 year ago

zolstein commented 1 year ago

Create an internal method on Schema to reconstruct a value, passing in a [][]Value to use as columns. Use this internal method, rather than Reconstruct, when reading rows in GenericReader.

Calling Reconstruct on every row read constructs a new [][]Value each time. In aggregate, these slices account for the majority of allocations while reading parquet files and induce unnecessarily large GC overhead.
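For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the idea. It does not use the library's code: `Value`, `record`, `rowReader`, `fillColumns`, and `first` are hypothetical stand-ins made up for illustration, and the sketch assumes the values of a row arrive sorted by column index. The point is only that the `[][]Value` scratch slice is allocated once per reader and refilled for each row, rather than allocated per row.

```go
package main

import "fmt"

// Value is a hypothetical stand-in for a parquet value: a column index plus
// an integer payload. It is not the library's Value type.
type Value struct {
	Column int
	Data   int64
}

// record plays the role of the Go struct that rows are reconstructed into.
type record struct {
	Field1, Field2, Field3 int64
}

// rowReader owns a columns scratch slice that is allocated once and reused
// for every row, instead of calling make([][]Value, n) per row.
type rowReader struct {
	columns [][]Value
}

func newRowReader(numColumns int) *rowReader {
	return &rowReader{columns: make([][]Value, numColumns)}
}

// fillColumns points each entry of columns at the run of row values that
// belongs to that column. It assumes row is sorted by column index and that
// every index is in range. No new slices are allocated: entries are
// sub-slices of row.
func fillColumns(columns [][]Value, row []Value) {
	for i := range columns {
		columns[i] = nil // clear state left over from the previous row
	}
	for start := 0; start < len(row); {
		col := row[start].Column
		end := start + 1
		for end < len(row) && row[end].Column == col {
			end++
		}
		columns[col] = row[start:end]
		start = end
	}
}

// first returns the payload of the first value in a column, or 0 if the
// column has no values for this row.
func first(column []Value) int64 {
	if len(column) == 0 {
		return 0
	}
	return column[0].Data
}

// reconstruct fills dst from row using the reader's reusable scratch slice.
func (r *rowReader) reconstruct(dst *record, row []Value) {
	fillColumns(r.columns, row)
	dst.Field1 = first(r.columns[0])
	dst.Field2 = first(r.columns[1])
	dst.Field3 = first(r.columns[2])
}

func main() {
	r := newRowReader(3)
	rows := [][]Value{
		{{Column: 0, Data: 1}, {Column: 1, Data: 2}, {Column: 2, Data: 3}},
		{{Column: 0, Data: 4}, {Column: 2, Data: 6}},
	}
	var rec record
	for _, row := range rows {
		r.reconstruct(&rec, row)
		fmt.Println(rec)
	}
}
```

The trade-off is that the scratch slice holds references into the most recent row until the next call, so a reader built this way should not be shared across goroutines without synchronization.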


Test code used to identify issues

test.parquet is a 768MB parquet file with 32M records.

```go
package main

import (
	"io"
	"log"
	"os"
	"runtime/pprof"

	"github.com/segmentio/parquet-go"
)

type TestStruct struct {
	Field1 int64
	Field2 int64
	Field3 int64
}

func main() {
	// Read the whole file in batches of 1024 rows, discarding the results.
	entries := make([]TestStruct, 1024)
	inFile, err := os.Open("test.parquet")
	if err != nil {
		log.Fatalf("failed to open parquet file: %v", err)
	}
	pr := parquet.NewGenericReader[TestStruct](inFile)
	for {
		_, err := pr.Read(entries)
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatalf("failed to read parquet entries: %v\n", err)
		}
	}

	// Dump the cumulative allocation profile for inspection with go tool pprof.
	f, err := os.Create("mem.pprof")
	if err != nil {
		log.Fatalf("failed to open file: %v", err)
	}
	defer f.Close()
	if err := pprof.Lookup("allocs").WriteTo(f, 0); err != nil {
		log.Fatalf("failed to write heap profile: %v", err)
	}
}
```

Profile output (before change)

![profile016](https://github.com/segmentio/parquet-go/assets/7101542/810e36d4-0887-42ca-831c-573ea55da4e7)

Profile output (after change)

![profile014](https://github.com/segmentio/parquet-go/assets/7101542/db23e50f-eea9-460a-8c15-304a11bd77e6)

kevinburkesegment commented 1 year ago

Apologies for making more work for you, but we've decided to move development on this project to a new organization at https://github.com/parquet-go/parquet-go to ensure its long-term success. We appreciate your contribution and would appreciate it if you could reopen this PR there if it is still relevant.