Closed sfc-gh-psaha closed 4 weeks ago
Overall, 35-40% improvement in the benchmark observed.
Wow, this is huge! Could you add to the description what type of benchmarks you ran? Are these targeted benchmarks for these cases, or are we seeing this across all use cases?
I have the benchmark in this PR and have now updated the PR description to describe the benchmark and where the impact will mostly be seen. I think there are probably other areas to improve in the data validation and also in the ParquetRowBuffer code, but that's a fight for another day :)
Here are the changes in order of high impact to low:
verifyInputColumns
insertRows
I wrote a simple JMH benchmark to measure these changes - see InsertRowsBenchmarkTest. This benchmark inserts 1M rows in a tight loop and each row has a single sb16 column. Overall, 35-40% improvement in this benchmark observed. The benchmark results should carry over to real-world workloads that have sb16 columns, and the improvement in the memory tracking will transfer over to all workloads.

Before:
After:
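
For readers without the PR open, the shape of the benchmark described above (1M single-column rows inserted in a tight loop) can be sketched roughly as below. This is not InsertRowsBenchmarkTest and it does not use JMH or the SDK: the class name, the `validateRow` helper, the column name `C1`, and the reading of sb16 as a 16-byte signed integer are all assumptions made for illustration, and `System.nanoTime` is swapped in for the JMH harness, so any timings it prints are only indicative.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

public class InsertRowsBenchmarkSketch {
    // Assumed bounds for a 16-byte signed integer (our reading of "sb16").
    static final BigInteger MIN = BigInteger.ONE.shiftLeft(127).negate();
    static final BigInteger MAX = BigInteger.ONE.shiftLeft(127).subtract(BigInteger.ONE);

    // Hypothetical stand-in for per-row validation; not the SDK's code.
    public static boolean validateRow(Map<String, Object> row) {
        Object value = row.get("C1");
        if (!(value instanceof BigInteger)) {
            return false;
        }
        BigInteger b = (BigInteger) value;
        return b.compareTo(MIN) >= 0 && b.compareTo(MAX) <= 0;
    }

    // Pushes rowCount single-column rows through validation in a tight loop
    // and returns how many were accepted.
    public static long insertRows(int rowCount) {
        long accepted = 0;
        Map<String, Object> row = new HashMap<>();
        for (int i = 0; i < rowCount; i++) {
            row.put("C1", BigInteger.valueOf(i));
            if (validateRow(row)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        insertRows(100_000); // crude warm-up; JMH handles this properly
        long start = System.nanoTime();
        long accepted = insertRows(1_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("accepted=" + accepted + " elapsedMs=" + elapsedMs);
    }
}
```

Because each row does so little work in a loop like this, per-row overhead in validation dominates the measurement, which is presumably why the gain is most visible for sb16 columns.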