Various performance improvements in the `insertRows` path

sfc-gh-psaha commented 1 month ago

Here are the changes in order of high impact to low:

Use pre-computed powers of 10 instead of computing it fresh for every insertRows call.
Periodically compute free memory in the system instead of on-demand.
Use a for-loop instead of stream and collect in verifyInputColumns
Cache numAvailableProcessors in FlushService.
Cache parameters used in the hot path in insertRows.

I wrote a simple JMH benchmark to measure these changes - see InsertRowsBenchmarkTest. This benchmark inserts 1M rows in a tight loop and each row has a single sb16 column. Overall, 35-40% improvement in this benchmark observed. The benchmark results should carry over to real world workloads that have sb16 columns and the improvement in the memory tracking will transfer over to all workloads.

Before:

# Warmup Iteration   1: 1822464.171 us/op
# Warmup Iteration   2: 1835477.606 us/op
Iteration   1: 1592362.119 us/op
Iteration   2: 1675821.530 us/op
Iteration   3: 1698952.582 us/op
Iteration   4: 1728100.495 us/op
Iteration   5: 1527890.615 us/op
Iteration   6: 1530390.389 us/op
Iteration   7: 2039346.448 us/op
Iteration   8: 1846764.755 us/op
Iteration   9: 1514480.481 us/op
Iteration  10: 1510466.306 us/op

Result "net.snowflake.ingest.streaming.internal.InsertRowsBenchmarkTest.testInsertRow":
  1666457.572 ±(99.9%) 260468.722 us/op [Average]
  (min, avg, max) = (1510466.306, 1666457.572, 2039346.448), stdev = 172283.933
  CI (99.9%): [1405988.850, 1926926.294] (assumes normal distribution)

# Run complete. Total time: 00:01:07

Benchmark                              (numRows)  Mode  Cnt        Score        Error  Units
InsertRowsBenchmarkTest.testInsertRow    1000000  avgt   10  1666457.572 ± 260468.722  us/op

After:

# Warmup Iteration   1: 1139006.469 us/op
# Warmup Iteration   2: 1132933.357 us/op
Iteration   1: 987257.653 us/op
Iteration   2: 1070616.846 us/op
Iteration   3: 982547.952 us/op
Iteration   4: 1225079.558 us/op
Iteration   5: 1198002.424 us/op
Iteration   6: 976386.714 us/op
Iteration   7: 964823.936 us/op
Iteration   8: 1141437.334 us/op
Iteration   9: 1292684.431 us/op
Iteration  10: 940295.781 us/op

Result "net.snowflake.ingest.streaming.internal.InsertRowsBenchmarkTest.testInsertRow":
  1077913.263 ±(99.9%) 192324.330 us/op [Average]
  (min, avg, max) = (940295.781, 1077913.263, 1292684.431), stdev = 127210.636
  CI (99.9%): [885588.933, 1270237.593] (assumes normal distribution)

# Run complete. Total time: 00:00:51

Benchmark                              (numRows)  Mode  Cnt        Score        Error  Units
InsertRowsBenchmarkTest.testInsertRow    1000000  avgt   10  1077913.263 ± 192324.330  us/op

sfc-gh-tzhang commented 4 weeks ago

Overall, 35-40% improvement in the benchmark observed.

Wow, this is huge! Could you add in the describe what types of benchmark did you do? Is it targeted benchmarks for these cases or we're seeing this across all use cases?

sfc-gh-psaha commented 4 weeks ago

Overall, 35-40% improvement in the benchmark observed.

Wow, this is huge! Could you add in the describe what types of benchmark did you do? Is it targeted benchmarks for these cases or we're seeing this across all use cases?

I have the benchmark in this PR and now updated the PR description to describe the benchmark and where the impact will be seen mostly. I think there are probably other areas to improve in the data validation and also in the ParquetRowBuffer code but that's another fight for another day :)

snowflakedb / snowflake-ingest-java

Various performance improvements in the `insertRows` path #782