Found a good performance gain at query time for Parquet files: reading all of the Parquet values into the buffer at once, instead of page by page, decreases query times.
thor@thors-MacBook-Pro ~/.../github.com/polarsignals/frostdb % benchstat before.txt after.txt
name                                        old time/op    new time/op    delta
_ParquetAggregation/paruqet_aggregation-10    16.0ms ± 0%     9.3ms ± 1%  -41.80%  (p=0.000 n=9+10)

name                                        old alloc/op   new alloc/op   delta
_ParquetAggregation/paruqet_aggregation-10    33.0MB ± 0%    37.9MB ± 0%  +14.99%  (p=0.000 n=9+10)

name                                        old allocs/op  new allocs/op  delta
_ParquetAggregation/paruqet_aggregation-10     1.42k ± 0%     1.40k ± 0%   -1.88%  (p=0.000 n=9+10)
This allocates about 15% more memory per op (due to the larger buffer), but I think it's a worthwhile trade-off.
Ooof, I found a bug in this change: with more than two pages the offset i was incorrect and we started to overwrite pages. With that fixed, the difference is minimal. Going to close.
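For anyone curious what this class of bug looks like, here is a minimal sketch. It is not the actual frostdb code: the `page` type and the `readAllPages` / `readAllPagesBuggy` functions are hypothetical names made up for illustration, and the buggy variant is only one plausible way an offset can go wrong once there are more than two pages.

```go
package main

import "fmt"

// page is a hypothetical stand-in for the decoded values of one Parquet page.
type page []int64

// readAllPages copies every page into one preallocated buffer,
// advancing the write offset by the number of values copied.
func readAllPages(pages []page) []int64 {
	total := 0
	for _, p := range pages {
		total += len(p)
	}
	buf := make([]int64, total)
	offset := 0
	for _, p := range pages {
		offset += copy(buf[offset:], p)
	}
	return buf
}

// readAllPagesBuggy illustrates an assumed offset mistake: the offset is
// reset to the last page's length instead of being accumulated. With one
// or two pages the result is still correct; from the third page on,
// earlier pages get overwritten.
func readAllPagesBuggy(pages []page) []int64 {
	total := 0
	for _, p := range pages {
		total += len(p)
	}
	buf := make([]int64, total)
	offset := 0
	for _, p := range pages {
		copy(buf[offset:], p)
		offset = len(p) // BUG: should be offset += len(p)
	}
	return buf
}

func main() {
	pages := []page{{1, 2}, {3, 4}, {5, 6}}
	fmt.Println(readAllPages(pages))      // [1 2 3 4 5 6]
	fmt.Println(readAllPagesBuggy(pages)) // [1 2 5 6 0 0]
}
```

A benchmark that only covers one or two pages would not notice this, which is why the numbers can look better than they really are until the overwrite is fixed.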