vmware / splinterdb

High Performance Embedded Key-Value Store
https://splinterdb.org
Apache License 2.0

Arithmetic error in computation of page-addr in routing_filter_prefetch() causes debug-assert, or can lead to unending looping hang. #561

Closed gapisback closed 1 year ago

gapisback commented 1 year ago

This issue was discovered while developing a test case to validate the fix for issue #458, which has already been integrated into /main.

If you run the new (to-be-added) test case unit/splinterdb_stress_test test_issue_458_mini_destroy_unused_debug_assert, it hits the following assertion. The test case is single-threaded and inserts 100M small key/value pairs.

Inserted 10 million KV-pairs, this batch: 6 s, 166666 rows/s, cumulative: 58 s, 172413 rows/s ...
Inserted 11 million KV-pairs, this batch: 6 s, 166666 rows/s, cumulative: 64 s, 171875 rows/s ...
OS-pid=479158, OS-tid=479158, Thread-ID=0, Assertion failed at src/clockcache.c:2092:clockcache_get_internal(): "((addr % page_size) == 0)". addr=1034551297, page_size=4096
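
To make the failing condition concrete, here is a minimal sketch of the alignment arithmetic behind the assertion message. The helper names (`page_offset`, `page_base`, `is_page_aligned`) are hypothetical and written only for illustration; they are not SplinterDB functions, and this is not the code in `clockcache_get_internal()` or `routing_filter_prefetch()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of SplinterDB. */

/* Byte offset of addr within the page that contains it. */
static inline uint64_t
page_offset(uint64_t addr, uint64_t page_size)
{
   return addr % page_size;
}

/* Address of the first byte of the page that contains addr. */
static inline uint64_t
page_base(uint64_t addr, uint64_t page_size)
{
   return addr - page_offset(addr, page_size);
}

/* The condition stated in the assertion: addr lies on a page boundary. */
static inline bool
is_page_aligned(uint64_t addr, uint64_t page_size)
{
   return page_offset(addr, page_size) == 0;
}
```

With the reported values, `page_offset(1034551297, 4096)` is 1 and `page_base(1034551297, 4096)` is 1034551296, so the address passed to the cache is exactly one byte past a page boundary rather than on it.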

With a release binary, you will instead get a hang through this code path:

clockcache_get_internal() -> clockcache_get_read() -> clockcache_try_get_read()