segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

Do not alloc page offset buffer when dictionary encoded #398

Closed (mdisibio closed this 1 year ago)

mdisibio commented 1 year ago

This PR avoids unnecessary allocation of page offset buffers when the column is dictionary-encoded. Fixes #372
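
As a rough illustration of the idea (not the actual parquet-go code), the sketch below allocates an offsets buffer only when a page is not dictionary-encoded. The `pageBuffers` type, its fields, and `newPageBuffers` are hypothetical names invented for this example; the real change lives in the library's internal page-reading code.

```go
package main

import "fmt"

// pageBuffers is a hypothetical stand-in for the per-page scratch buffers
// a column reader might hold. It is not a parquet-go type.
type pageBuffers struct {
	offsets []uint32 // per-value offsets, only useful for plain byte-array pages
	indexes []int32  // dictionary indexes, only useful for dictionary-encoded pages
}

// newPageBuffers allocates only the buffer that the page encoding needs.
// Dictionary-encoded pages resolve values through the dictionary, so the
// offsets buffer is left nil instead of being allocated and never used.
func newPageBuffers(numValues int, dictionaryEncoded bool) pageBuffers {
	if dictionaryEncoded {
		return pageBuffers{indexes: make([]int32, 0, numValues)}
	}
	// Assumption for this sketch: n values need n+1 offsets.
	return pageBuffers{offsets: make([]uint32, 0, numValues+1)}
}

func main() {
	dict := newPageBuffers(1000, true)
	plain := newPageBuffers(1000, false)
	fmt.Println("dictionary page offsets cap:", cap(dict.offsets))
	fmt.Println("plain page offsets cap:     ", cap(plain.offsets))
}
```

Skipping the allocation entirely, rather than allocating and discarding, is what would show up as lower alloc/op in a benchmark.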

I wasn't sure whether parquet-go has an appropriate benchmark for this, so here is a comparison using an upstream Grafana Tempo benchmark:

name                                      old time/op    new time/op    delta
BackendBlockSearchTraces/noMatch-12          204ms ± 6%     153ms ±11%  -25.21%  (p=0.000 n=9+10)
BackendBlockSearchTraces/partialMatch-12     2.57s ± 2%     2.50s ± 3%   -2.72%  (p=0.000 n=9+9)
BackendBlockSearchTraces/service.name-12    1.57ms ±12%    1.46ms ±36%     ~     (p=0.447 n=9+10)

name                                      old alloc/op   new alloc/op   delta
BackendBlockSearchTraces/noMatch-12          342MB ± 4%      48MB ±21%  -86.00%  (p=0.000 n=9+10)
BackendBlockSearchTraces/partialMatch-12     601MB ± 1%     307MB ± 4%  -48.94%  (p=0.000 n=9+9)
BackendBlockSearchTraces/service.name-12    3.67MB ± 1%    1.55MB ± 4%  -57.76%  (p=0.000 n=8+10)

name                                      old allocs/op  new allocs/op  delta
BackendBlockSearchTraces/noMatch-12           119k ± 1%       86k ± 6%  -27.54%  (p=0.000 n=8+10)
BackendBlockSearchTraces/partialMatch-12     18.0M ± 0%     17.9M ± 0%   -0.15%  (p=0.000 n=8+9)
BackendBlockSearchTraces/service.name-12     39.2k ± 0%     39.2k ± 0%   -0.15%  (p=0.000 n=10+10)
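
For a comparison that does not depend on Tempo, an allocation measurement along these lines could be a starting point. This is only a sketch: `readPage` and `measure` are hypothetical helpers that stand in for page decoding, and the buffer size is arbitrary. It uses `testing.Benchmark` from the standard library, which can run outside `go test`.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the allocation reachable so the compiler cannot elide it.
var sink []uint32

// readPage is a hypothetical stand-in for decoding one page. When
// withOffsets is true it allocates an offsets buffer, mimicking the
// behavior before this change for dictionary-encoded columns.
func readPage(numValues int, withOffsets bool) {
	if withOffsets {
		sink = make([]uint32, numValues+1)
		return
	}
	sink = nil
}

// measure runs a small benchmark and returns the bytes allocated per op.
func measure(withOffsets bool) int64 {
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			readPage(1024, withOffsets)
		}
	})
	return res.AllocedBytesPerOp()
}

func main() {
	fmt.Println("with offsets buffer:   ", measure(true), "B/op")
	fmt.Println("without offsets buffer:", measure(false), "B/op")
}
```

The absolute numbers from a toy like this mean little; the point is only that the avoided buffer shows up directly in bytes per operation, which is the same quantity as the alloc/op column above.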