segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

Reuse thrift pageheader buffers #484

mdisibio closed 1 year ago

mdisibio commented 1 year ago

This PR pools the format.PageHeader buffers used when decoding pages. For applications that scan many pages, this significantly reduces memory allocations. I'm not sure which benchmark in this repo best highlights the difference, but for some Tempo search patterns the improvement is very significant.

Example:

name                                     old time/op    new time/op    delta
BackendBlockTraceQL/mixedNameNoMatch-12     2.46s ± 2%     2.17s ± 1%  -11.84%  (p=0.000 n=9+8)

name                                     old speed      new speed      delta
BackendBlockTraceQL/mixedNameNoMatch-12  10.6MB/s ± 2%  12.0MB/s ± 1%  +13.43%  (p=0.000 n=9+8)

name                                     old alloc/op   new alloc/op   delta
BackendBlockTraceQL/mixedNameNoMatch-12     422MB ± 3%      25MB ± 8%  -94.01%  (p=0.000 n=10+8)

name                                     old allocs/op  new allocs/op  delta
BackendBlockTraceQL/mixedNameNoMatch-12     6.29M ± 0%     0.00M ± 1%  -99.97%  (p=0.000 n=9+8)
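
For readers unfamiliar with the technique, the pooling idea described above can be sketched with sync.Pool. This is only an illustration under assumptions, not the code in this PR: pageHeader is a hypothetical stand-in for format.PageHeader, and getHeader, putHeader, and scanPages are made-up helper names. The assignment inside scanPages stands in for the real Thrift decode.

```go
package main

import (
	"fmt"
	"sync"
)

// pageHeader is a hypothetical stand-in for a Thrift-decoded page header.
// It is NOT the library's format.PageHeader type.
type pageHeader struct {
	Type                 int32
	UncompressedPageSize int32
	CompressedPageSize   int32
	NumValues            int32
}

// headerPool hands out reusable header values so that a scan over many
// pages does not allocate a fresh header for every page.
var headerPool = sync.Pool{
	New: func() any { return new(pageHeader) },
}

// getHeader (hypothetical helper) returns a zeroed header from the pool.
func getHeader() *pageHeader {
	h := headerPool.Get().(*pageHeader)
	*h = pageHeader{} // reset so no state leaks from a previous page
	return h
}

// putHeader (hypothetical helper) returns a header to the pool.
// The caller must not use h after this call.
func putHeader(h *pageHeader) {
	headerPool.Put(h)
}

// scanPages simulates decoding n page headers and sums their value counts.
func scanPages(n int) int64 {
	var total int64
	for i := 0; i < n; i++ {
		h := getHeader()
		h.NumValues = int32(i % 100) // stand-in for the Thrift decode step
		total += int64(h.NumValues)
		putHeader(h)
	}
	return total
}

func main() {
	fmt.Println(scanPages(1000))
}
```

Resetting the header on Get rather than on Put means a caller can never observe fields left over from an earlier page, at the cost of one small struct copy per page.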