vmware / splinterdb

High Performance Embedded Key-Value Store
https://splinterdb.org
Apache License 2.0
673 stars 56 forks source link

(#604) Cache page-size in local variables to reduce multiple lookups. #607

Closed gapisback closed 6 months ago

gapisback commented 7 months ago

In debug-build test runs, profiling shows interfaces like clockcache_page_size(), clockcache_config_page_size() bubbling up to top of 'perf top' output. This commit attempts to reduce multiple calls to lookup functions to retrieve the page-size, by caching the page-size once per function where it's used multiple times.

The affected interfaces are: btree_page_size(), clockcache_page_size(), cache_page_size(), cache_config_page_size(), trunk_page_size() and few similar ones.

These changes add up to saving few seconds of test-execution (out of few mins of run-time) in debug-build mode, esp for BTree-related tests.


Empirical performance test results: Running slow tests using test.sh on Nimbus-VM:

With /main:

cc-sdb-vm:[7] $ tail -100 /home/agurajada/tmp/test.nightly.dbg.main.out | grep BTree
BTree test, key size=8 bytes                                                         :  710s [  0h 11m 50s ]
BTree test, with default key size                                                    :  607s [  0h 10m  7s ]
BTree test, key size=100 bytes                                                       :  336s [  0h  5m 36s ]
BTree Perf test                                                                      :  198s [  0h  3m 18s ]
BTree test, key size=8 bytes, using shared memory                                    :  702s [  0h 11m 42s ]
BTree test, with default key size, using shared memory                               :  590s [  0h  9m 50s ]
BTree test, key size=100 bytes, using shared memory                                  :  319s [  0h  5m 19s ]
BTree Perf test, using shared memory                                                 :  192s [  0h  3m 12s ]

All Tests                                                                            : 6905s [  1h 55m  5s ]

With this fix:

cc-sdb-vm:[8] $ tail -100 /home/agurajada/tmp/test.clockcache.dbg.fix.3.out | grep BTree
BTree test, key size=8 bytes                                                         :  698s [  0h 11m 38s ]
BTree test, with default key size                                                    :  553s [  0h  9m 13s ]
BTree test, key size=100 bytes                                                       :  330s [  0h  5m 30s ]
BTree Perf test                                                                      :  197s [  0h  3m 17s ]
BTree test, key size=8 bytes, using shared memory                                    :  640s [  0h 10m 40s ]
BTree test, with default key size, using shared memory                               :  487s [  0h  8m  7s ]
BTree test, key size=100 bytes, using shared memory                                  :  291s [  0h  4m 51s ]
BTree Perf test, using shared memory                                                 :  185s [  0h  3m  5s ]

All Tests                                                                            : 6537s [  1h 48m 57s ]

There is no earth-shattering gains to be had, but we knock-off about several 10s of seconds of execution times in a test-run taking about 10 mins (for example).

Overall test-execution run-time drops by about 6-7 mins, gain of about 5% overall.

netlify[bot] commented 7 months ago

Deploy Preview for splinterdb canceled.

Name Link
Latest commit 0f557fafe0aa4a01688accbf69d0c0cb4890e781
Latest deploy log https://app.netlify.com/sites/splinterdb/deploys/65b043c6093a94000862fc4e
gapisback commented 7 months ago

@rtjohnso - The CI-jobs have completed. There are small improvements in run-times of BTree-related tests with these changes, as shown below. We have other CI-jobs run for PR #606 without this fix:

  1. Debug-GCC: Without fix compared to with-this-fix
  2. Debug-clang: Without fix compared to with-this-fix

The 'gains' are similar to what were seen with manual execution of test.sh on Nimbus-VMs.