segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

better estimation of the size of decode output buffers #434

Closed achille-roussel closed 1 year ago

achille-roussel commented 1 year ago

This PR adds a new method to parquet.Type named EstimateDecodeSize. It is similar in functionality to EstimateSize but provides more precise results when working with pages of variable-size values (e.g. BYTE_ARRAY).

Using this new method lets us better size the output buffer into which pages are decoded. This greatly reduces memory pressure by avoiding reallocations, since the buffer previously selected for decoding BYTE_ARRAY pages was almost always too small.
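
To illustrate the idea, here is a minimal, self-contained sketch. It is not the code from this PR. It assumes the Parquet PLAIN layout for BYTE_ARRAY (a 4-byte little-endian length before each value), and the helpers estimateDecodeSize, encodePlainByteArrays and decodePlainByteArrays are hypothetical names invented for this example. They are not part of the parquet-go API, and the real EstimateDecodeSize method may compute its estimate differently.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodePlainByteArrays writes each value as a 4-byte little-endian length
// followed by the value bytes (the PLAIN layout for BYTE_ARRAY).
// Hypothetical helper, used only to build an input page for the example.
func encodePlainByteArrays(values [][]byte) []byte {
	var page []byte
	var prefix [4]byte
	for _, v := range values {
		binary.LittleEndian.PutUint32(prefix[:], uint32(len(v)))
		page = append(page, prefix[:]...)
		page = append(page, v...)
	}
	return page
}

// estimateDecodeSize is a hypothetical stand-in for the estimation step:
// it subtracts the length prefixes from the encoded page size to obtain
// the number of value bytes the decoder will produce.
func estimateDecodeSize(numValues, encodedSize int) int {
	n := encodedSize - 4*numValues
	if n < 0 {
		return 0
	}
	return n
}

// decodePlainByteArrays appends the bytes of every value in page to dst.
// If dst has enough capacity, append never reallocates.
func decodePlainByteArrays(dst, page []byte) ([]byte, error) {
	for len(page) > 0 {
		if len(page) < 4 {
			return dst, errors.New("truncated length prefix")
		}
		n := binary.LittleEndian.Uint32(page)
		page = page[4:]
		if uint64(n) > uint64(len(page)) {
			return dst, errors.New("value length exceeds page size")
		}
		dst = append(dst, page[:n]...)
		page = page[n:]
	}
	return dst, nil
}

func main() {
	values := [][]byte{[]byte("hello"), []byte("parquet"), []byte("go")}
	page := encodePlainByteArrays(values)

	// Size the output buffer from the estimate instead of a fixed guess.
	size := estimateDecodeSize(len(values), len(page))
	buf := make([]byte, 0, size)

	out, err := decodePlainByteArrays(buf, page)
	if err != nil {
		panic(err)
	}
	// The capacity is unchanged, so decoding did not reallocate the buffer.
	fmt.Println(len(out), cap(out) == size)
}
```

With a buffer sized from the estimate, every append in the decode loop fits within the existing capacity. With a buffer that is too small, append would have to grow and copy the buffer, possibly several times per page, which is the kind of reallocation described above.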