segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0
341 stars 102 forks

Row group statistics #485

Closed suremarc closed 1 year ago

suremarc commented 1 year ago

See #410. This PR adds row group statistics, which will enable row-group level predicate pushdown.

The tests mostly pass with my changes. The one exception is that the column encodings seem to come out in a nondeterministic order. I had to build parquet-tools from source, so I'm not sure whether that's just because I built the wrong version.

suremarc commented 1 year ago

Thanks for the contribution! These tests aren't hooked up in GitHub Actions b/c no parquet-writer is installed, so I had to run them locally. I saw your note about the non-deterministic encoding order. Since I can only verify these locally, if you make the suggested changes I can merge this in!

Thanks! I made the requested changes. LMK if anything still looks off.