Closed mdisibio closed 1 year ago
This PR avoids unnecessary allocation of page offset buffers when the column is dictionary-encoded. Fixes #372
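To illustrate the idea, here is a minimal sketch and not the actual parquet-go code. The names `encoding`, `pageBuffers`, and `newPageBuffers` are hypothetical, as is the buffer sizing. It assumes that dictionary-encoded pages hold indexes into the dictionary rather than raw variable-length values, so a per-value offset buffer is only worth allocating for other encodings:

```go
package main

import "fmt"

// encoding is a hypothetical stand-in for a column's page encoding.
type encoding int

const (
	plain encoding = iota
	dictionary
)

// pageBuffers is a hypothetical holder for the scratch buffers used while
// decoding one page of a variable-length (e.g. byte array) column.
type pageBuffers struct {
	values  []byte
	offsets []uint32 // start offsets of each value; unused for dictionary pages
}

// newPageBuffers reserves scratch space for numValues values.
// Assumption: a dictionary-encoded page stores indexes into the dictionary,
// so the offsets buffer would never be read and is left nil.
func newPageBuffers(enc encoding, numValues int) *pageBuffers {
	b := &pageBuffers{values: make([]byte, 0, 4*numValues)}
	if enc != dictionary {
		// numValues+1 offsets delimit numValues values.
		b.offsets = make([]uint32, 0, numValues+1)
	}
	return b
}

func main() {
	for _, enc := range []encoding{plain, dictionary} {
		b := newPageBuffers(enc, 1024)
		fmt.Printf("encoding=%d offsets capacity=%d\n", enc, cap(b.offsets))
	}
}
```

In this sketch the dictionary case keeps `offsets` nil, which is the kind of skipped allocation that would show up as a lower alloc/op figure.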
Wasn't sure if there is an appropriate benchmark to compare within parquet-go, but here is a comparison using an upstream Grafana Tempo benchmark:
name                                       old time/op    new time/op    delta
BackendBlockSearchTraces/noMatch-12         204ms ± 6%     153ms ±11%    -25.21%  (p=0.000 n=9+10)
BackendBlockSearchTraces/partialMatch-12    2.57s ± 2%     2.50s ± 3%     -2.72%  (p=0.000 n=9+9)
BackendBlockSearchTraces/service.name-12   1.57ms ±12%    1.46ms ±36%       ~     (p=0.447 n=9+10)

name                                       old alloc/op   new alloc/op   delta
BackendBlockSearchTraces/noMatch-12         342MB ± 4%      48MB ±21%    -86.00%  (p=0.000 n=9+10)
BackendBlockSearchTraces/partialMatch-12    601MB ± 1%     307MB ± 4%    -48.94%  (p=0.000 n=9+9)
BackendBlockSearchTraces/service.name-12   3.67MB ± 1%    1.55MB ± 4%    -57.76%  (p=0.000 n=8+10)

name                                       old allocs/op  new allocs/op  delta
BackendBlockSearchTraces/noMatch-12          119k ± 1%       86k ± 6%    -27.54%  (p=0.000 n=8+10)
BackendBlockSearchTraces/partialMatch-12    18.0M ± 0%     17.9M ± 0%     -0.15%  (p=0.000 n=8+9)
BackendBlockSearchTraces/service.name-12    39.2k ± 0%     39.2k ± 0%     -0.15%  (p=0.000 n=10+10)