polarsignals / frostdb

❄️ Coolest database around 🧊 Embeddable column database written in Go.
Apache License 2.0

Converter read all pages #878

Closed thorfour closed 2 months ago

thorfour commented 2 months ago

Discovered a good performance gain when querying Parquet files: reading all of the Parquet values into the buffer at once, instead of page by page, decreases query time.

thor@thors-MacBook-Pro ~/.../github.com/polarsignals/frostdb % benchstat before.txt after.txt
name                                        old time/op    new time/op    delta
_ParquetAggregation/paruqet_aggregation-10    16.0ms ± 0%     9.3ms ± 1%  -41.80%  (p=0.000 n=9+10)

name                                        old alloc/op   new alloc/op   delta
_ParquetAggregation/paruqet_aggregation-10    33.0MB ± 0%    37.9MB ± 0%  +14.99%  (p=0.000 n=9+10)

name                                        old allocs/op  new allocs/op  delta
_ParquetAggregation/paruqet_aggregation-10     1.42k ± 0%     1.40k ± 0%   -1.88%  (p=0.000 n=9+10)

Slightly more data is allocated (due to the larger buffer), but I think it's a worthwhile trade-off.
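For readers unfamiliar with the idea, here is a minimal sketch of the two read strategies. It is not the FrostDB converter code: pages are modelled as plain `[]int64` slices, and the names `sumPageByPage`, `readAllPages`, and `sum` are hypothetical. It only illustrates why the all-at-once variant needs a larger buffer, sized for every page rather than for a single one.

```go
package main

import "fmt"

// sumPageByPage refills one small buffer per page and processes each
// page separately. Pages are a hypothetical stand-in for Parquet pages.
func sumPageByPage(pages [][]int64) int64 {
	var total int64
	var buf []int64
	for _, p := range pages {
		buf = append(buf[:0], p...) // reuse the buffer for every page
		total += sum(buf)
	}
	return total
}

// readAllPages copies every page into one buffer sized for all values,
// so the consumer can process the whole column in a single pass.
func readAllPages(pages [][]int64) []int64 {
	n := 0
	for _, p := range pages {
		n += len(p)
	}
	buf := make([]int64, n) // larger allocation than the per-page buffer
	offset := 0
	for _, p := range pages {
		offset += copy(buf[offset:], p) // advance by the values written
	}
	return buf
}

// sum adds up all values in a slice.
func sum(values []int64) int64 {
	var total int64
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	pages := [][]int64{{1, 2, 3}, {4, 5}, {6}}
	fmt.Println(sumPageByPage(pages), sum(readAllPages(pages)))
}
```

Both strategies must produce the same result; the difference is only in how many values are buffered before processing.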

thorfour commented 2 months ago

Ooof, I found a bug in this change: if we had more than two pages, the offset `i` was incorrect and we started to overwrite pages. The difference once that's fixed is minimal. Going to close.
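The buggy code is not shown in this thread, so the following is only a hypothetical reconstruction of this class of bug, not the actual change. It assumes pages are plain `[]int64` slices and uses made-up names (`fillBuggy`, `fillFixed`). In this sketch the write offset is set to the size of the last page instead of being accumulated, which happens to work for two pages and overwrites earlier data from the third page on.

```go
package main

import "fmt"

// fillBuggy copies pages into buf but resets the offset to the size of
// the last page instead of accumulating it (hypothetical bug).
func fillBuggy(pages [][]int64, buf []int64) {
	offset := 0
	for _, p := range pages {
		n := copy(buf[offset:], p)
		offset = n // BUG: should be offset += n
	}
}

// fillFixed advances the offset by the number of values written so far.
func fillFixed(pages [][]int64, buf []int64) {
	offset := 0
	for _, p := range pages {
		offset += copy(buf[offset:], p)
	}
}

func main() {
	pages := [][]int64{{1, 2}, {3, 4}, {5, 6}}

	buggy := make([]int64, 6)
	fillBuggy(pages, buggy)
	fmt.Println(buggy) // [1 2 5 6 0 0]: the third page overwrote the second

	fixed := make([]int64, 6)
	fillFixed(pages, fixed)
	fmt.Println(fixed) // [1 2 3 4 5 6]
}
```

A benchmark run against the buggy variant can look faster than it really is, because the query then operates on corrupted and partly empty data.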