vmware / splinterdb

High Performance Embedded Key-Value Store
https://splinterdb.org
Apache License 2.0
680 stars 57 forks

Assertion failure routing_filter_prefetch() -> cache_get(), reporting invalid input page addr. #545

Closed gapisback closed 1 year ago

gapisback commented 1 year ago

In /main, [ @SHA 4b6e0b1c ], while trying to re-run and validate the repro created for issue #458 (branch agurajada/458-mini_destroy_unused-Assert; new test case large_inserts_stress_test.c), we run into the following assertion failure using the debug binary:

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737349731584) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737349731584)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737349731584) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737349731584, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7d10476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7cf67f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000055555555b495 in platform_assert_false (filename=0x7ffff7f9e586 "src/clockcache.c",
    linenumber=2092, functionname=0x7ffff7f9f960 <__FUNCTION__.17> "clockcache_get_internal",
    expr=0x7ffff7f9ed83 "((addr % page_size) == 0)",
    message=0x7ffff7f9ed6a "addr=%lu, page_size=%lu\n") at src/platform_linux/platform.c:377
#6  0x00007ffff7f599e9 in clockcache_get_internal (cc=0x5555555622c0, addr=1034551297, blocking=1,
    type=PAGE_TYPE_FILTER, page=0x7fffffffd460) at src/clockcache.c:2092
#7  0x00007ffff7f59f82 in clockcache_get (cc=0x5555555622c0, addr=1034551297, blocking=1,
    type=PAGE_TYPE_FILTER) at src/clockcache.c:2271
#8  0x00007ffff7f562f8 in clockcache_get_virtual (c=0x5555555622c0, addr=1034551297, blocking=1,
    type=PAGE_TYPE_FILTER) at src/clockcache.c:295
#9  0x00007ffff7f68c73 in cache_get (cc=0x5555555622c0, addr=1034551297, blocking=1,
    type=PAGE_TYPE_FILTER) at src/cache.h:283
#10 0x00007ffff7f6b1f3 in routing_filter_prefetch (cc=0x5555555622c0, cfg=0x7fffe6feb0d8,
    filter=0x7fffffffd80c, num_indices=1024) at src/routing_filter.c:646
#11 0x00007ffff7f6b6b3 in routing_filter_estimate_unique_fp (cc=0x5555555622c0, cfg=0x7fffe6feb0d8,
    hid=0x0, filter=0x7fffffffd7f0, num_filters=3) at src/routing_filter.c:715
#12 0x00007ffff7f865a4 in trunk_pivot_estimate_unique_keys (spl=0x7fffe6feb040, node=0x7fffffffe000,
    pdata=0x7ffff5d524ae) at src/trunk.c:5183
#13 0x00007ffff7f86962 in trunk_split_leaf (spl=0x7fffe6feb040, parent=0x7fffffffe140,
    leaf=0x7fffffffe000, child_idx=0) at src/trunk.c:5281
#14 0x00007ffff7f828ca in trunk_flush (spl=0x7fffe6feb040, parent=0x7fffffffe140,
    pdata=0x7fffe70fd4ae, is_space_rec=0) at src/trunk.c:4235
#15 0x00007ffff7f82f0c in trunk_flush_fullest (spl=0x7fffe6feb040, node=0x7fffffffe140)
    at src/trunk.c:4324
#16 0x00007ffff7f7efa4 in trunk_memtable_incorporate (spl=0x7fffe6feb040, generation=12, tid=0)
    at src/trunk.c:3380
#17 0x00007ffff7f7f276 in trunk_memtable_flush_internal (spl=0x7fffe6feb040, generation=12)
    at src/trunk.c:3432
#18 0x00007ffff7f7f2cd in trunk_memtable_flush_internal_virtual (arg=0x7fffe703c2f0,
    scratch=0x7ffff7adb040) at src/trunk.c:3443
#19 0x00007ffff7f727ac in task_group_run_task (group=0x55555556ba80, assigned_task=0x555555d3c580)
    at src/task.c:481

The extended assertion reports:

Running 1 CTests, suite name 'large_inserts_bugs_stress', test case 'all'.
TEST 1/1 large_inserts_bugs_stress:test_issue_458_mini_destroy_unused_debug_assert OS-pid=4159620, OS-tid=4159620, Thread-ID=0, Assertion failed at src/clockcache.c:2092:clockcache_get_internal(): "((addr % page_size) == 0)". addr=1034551297, page_size=4096

The input page-address addr=1034551297 is clearly invalid: it is not page-aligned (1034551297 % 4096 == 1).

The relevant code in /main is this:

 630 void
 631 routing_filter_prefetch(cache          *cc,
 632                         routing_config *cfg,
 633                         routing_filter *filter,
 634                         uint64          num_indices)
 635 {
 636    uint64 last_extent_addr = 0;
 637    uint64 addrs_per_page =
 638       cache_config_page_size(cfg->cache_cfg) / sizeof(uint64);
 639    uint64 num_index_pages = (num_indices - 1) / addrs_per_page + 1;
 640    uint64 index_no        = 0;
 641
 642    for (uint64 index_page_no = 0; index_page_no < num_index_pages;
 643         index_page_no++) {
 644       uint64       index_addr = filter->addr + index_page_no;

On L644, the addr + index_page_no computation seems incorrect: index_page_no should be multiplied by the page size before being added to the base address.

gapisback commented 1 year ago

NOTE: With an attempted fix for the reported issue in routing_filter_prefetch(), re-running the new test case runs into this new BTree-pack failure:

(gdb) run
Starting program: /home/agurajada/Code/splinterdb/build/debug/bin/unit/large_inserts_stress_test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Running 1 CTests, suite name 'large_inserts_stress', test case 'all'.
TEST 1/1 large_inserts_stress:test_issue_458_mini_destroy_unused_debug_assert btree_pack exceeded output size limit: req->num_tuples=6291456, req->max_tuples=6291456
btree_pack exceeded output size limit: req->num_tuples=6291456, req->max_tuples=6291456

This error is triggered by this code in btree_pack():

2985    while (SUCCESS(iterator_at_end(req->itor, &at_end)) && !at_end) {
2986       iterator_get_curr(req->itor, &tuple_key, &data);
2987       if (!btree_pack_can_fit_tuple(req, tuple_key, data)) {
2988          platform_error_log("%s(): req->num_tuples=%lu exceeded output size "
2989                             "limit: req->max_tuples=%lu\n",
2990                             __func__,
2991                             req->num_tuples,
2992                             req->max_tuples);

Curiously, the checker function btree_pack_can_fit_tuple() does this: return req->num_tuples < req->max_tuples;

Seems like an off-by-1 error in the condition check.

gapisback commented 1 year ago

Attn: @ajhconway, @rtjohnso -- just drawing your attention to these long-winded 'test-stabilization' fix attempts I am meandering through in the background. (We should talk in person for me to make progress on this item.)


After fixing the original "((addr % page_size) == 0)" issue, and with an attempted fix for what seems to be an off-by-1 error in the btree_pack_can_fit_tuple() check [see previous comment], re-running this single test case runs into this error from btree_pack():

TEST 1/1 large_inserts_stress:test_issue_458_mini_destroy_unused_debug_assert btree_pack(): req->num_tuples=6291457 exceeded output size limit: req->max_tuples=6291456
btree_pack(): req->num_tuples=6291457 exceeded output size limit: req->max_tuples=6291456

Debugging this, I confirmed that we only run into it via this code-flow:

Breakpoint 2, btree_pack (req=0x7fffffffbc90) at src/btree.c:2988
2988             platform_error_log("%s(): req->num_tuples=%lu exceeded output size "
(gdb) bt
#0  btree_pack (req=0x7fffffffbc90) at src/btree.c:2988
#1  0x00007ffff7f8479d in trunk_compact_bundle (arg=0x555557531680, scratch_buf=0x7ffff7adb040) at src/trunk.c:4797
#2  0x00007ffff7f727f7 in task_group_run_task (group=0x55555556cd40, assigned_task=0x555555d3c600) at src/task.c:481
#3  0x00007ffff7f730eb in task_group_perform_one (group=0x55555556cd40, queue_scale_percent=0) at src/task.c:712
#4  0x00007ffff7f73172 in task_perform_one_if_needed (ts=0x55555556a580, queue_scale_percent=0) at src/task.c:731
#5  0x00007ffff7f8a384 in trunk_insert (spl=0x7fffe6feb040, tuple_key=..., data=...) at src/trunk.c:6177
#6  0x00007ffff7f70f9c in splinterdb_insert_message (kvs=0x555555561c80, user_key=..., msg=...)
    at src/splinterdb.c:470
#7  0x00007ffff7f71044 in splinterdb_insert (kvsb=0x555555561c80, user_key=..., value=...) at src/splinterdb.c:478
#8  0x000055555555991d in ctest_large_inserts_stress_test_issue_458_mini_destroy_unused_debug_assert_run (
    data=0x555555560240 <ctest_large_inserts_stress_test_issue_458_mini_destroy_unused_debug_assert_data>)
    at tests/unit/large_inserts_stress_test.c:81
#9  0x0000555555557f8f in ctest_main (argc=1, argv=0x7fffffffe758) at tests/unit/main.c:320
#10 0x0000555555557a34 in main (argc=1, argv=0x7fffffffe758) at tests/unit/main.c:159

And it's a valid failure condition:

(gdb) p req->num_tuples
$9 = 6291457
(gdb) p req->max_tuples
$10 = 6291456

There is already code to handle this failure and clean up; i.e.:

4617 void
4618 trunk_compact_bundle(void *arg, void *scratch_buf)
[...]

4797    platform_status pack_status = btree_pack(&pack_req);
4798    if (!SUCCESS(pack_status)) {
4799       platform_default_log("btree_pack failed: %s\n",
4800                            platform_status_to_string(pack_status));
4801       trunk_compact_bundle_cleanup_iterators(
4802          spl, &merge_itor, num_branches, skip_itor_arr);
4803       btree_pack_req_deinit(&pack_req, spl->heap_id);
4804       platform_free(spl->heap_id, req);
4805       goto out;
4806    }

It seems that from L4799 onwards we do a clean back-out, so no stale BTree-pack state should be left residual.

This happens twice in this test case's run. Then we run into this hard assertion:

OS-pid=11463, OS-tid=11463, Thread-ID=0, Assertion failed at src/trunk.c:3543:trunk_inc_filter(): "filter->addr != 0".

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737349731584) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737349731584) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737349731584) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737349731584, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7d10476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7cf67f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000055555555b495 in platform_assert_false (filename=0x7ffff7fa477b "src/trunk.c", linenumber=3543,
    functionname=0x7ffff7fa8e90 <__FUNCTION__.66> "trunk_inc_filter", expr=0x7ffff7fa4d95 "filter->addr != 0",
    message=0x7ffff7fa3fca "") at src/platform_linux/platform.c:377
#6  0x00007ffff7f7f594 in trunk_inc_filter (spl=0x7fffe6feb040, filter=0x7ffff43622d2) at src/trunk.c:3543
#7  0x00007ffff7f82042 in trunk_flush_into_bundle (spl=0x7fffe6feb040, parent=0x7fffffffbc30, child=0x7fffffffbaa0,
    pdata=0x7ffff419458c, req=0x555557531a40) at src/trunk.c:4102
#8  0x00007ffff7f82786 in trunk_flush (spl=0x7fffe6feb040, parent=0x7fffffffbc30, pdata=0x7ffff419458c,
    is_space_rec=0) at src/trunk.c:4214
#9  0x00007ffff7f82d2d in trunk_flush_fullest (spl=0x7fffe6feb040, node=0x7fffffffbc30) at src/trunk.c:4295
#10 0x00007ffff7f83cf8 in trunk_compact_bundle (arg=0x555557531680, scratch_buf=0x7ffff7adb040) at src/trunk.c:4642
#11 0x00007ffff7f727f7 in task_group_run_task (group=0x55555556cd40, assigned_task=0x555555d3c600) at src/task.c:481
#12 0x00007ffff7f730eb in task_group_perform_one (group=0x55555556cd40, queue_scale_percent=0) at src/task.c:712
#13 0x00007ffff7f73172 in task_perform_one_if_needed (ts=0x55555556a580, queue_scale_percent=0) at src/task.c:731
#14 0x00007ffff7f8a384 in trunk_insert (spl=0x7fffe6feb040, tuple_key=..., data=...) at src/trunk.c:6177
#15 0x00007ffff7f70f9c in splinterdb_insert_message (kvs=0x555555561c80, user_key=..., msg=...)
    at src/splinterdb.c:470
#16 0x00007ffff7f71044 in splinterdb_insert (kvsb=0x555555561c80, user_key=..., value=...) at src/splinterdb.c:478
#17 0x000055555555991d in ctest_large_inserts_stress_test_issue_458_mini_destroy_unused_debug_assert_run (
    data=0x555555560240 <ctest_large_inserts_stress_test_issue_458_mini_destroy_unused_debug_assert_data>)
    at tests/unit/large_inserts_stress_test.c:81

The error from trunk_inc_filter(), "filter->addr != 0", is very weird.

This means in this code block:

4000 trunk_bundle *
4001 trunk_flush_into_bundle(trunk_handle             *spl,    // IN
[...]

4095          child_sb->end_branch = trunk_end_branch(spl, child);
4096          for (uint16 i = 0; i < filter_count; i++) {
4097             routing_filter *child_filter =
4098                trunk_subbundle_filter(spl, child, child_sb, i);
4099             routing_filter *parent_filter =
4100                trunk_subbundle_filter(spl, parent, parent_sb, i);
4101             *child_filter = *parent_filter;
4102             trunk_inc_filter(spl, child_filter);
4103          }

On L4102, we are accessing a child_filter that has not been set up properly.

Under the debugger we find:

(gdb) p child_filter
$12 = (routing_filter *) 0x7ffff43622d2
(gdb) p *child_filter
$13 = {addr = 0, meta_head = 0, num_fingerprints = 0, num_unique = 0, value_size = 0}

It is unclear why this is an uninitialized filter handle, or whether we are getting the wrong data structure in the code above.

Curiously, parent_filter also appears 'bogus':

(gdb) p parent_filter
$14 = (routing_filter *) 0x7ffff419437a
(gdb) p *parent_filter
$15 = {addr = 0, meta_head = 0, num_fingerprints = 0, num_unique = 0, value_size = 0}

This will need further debugging.

gapisback commented 1 year ago

The original issue reported here (the bug in routing_filter_prefetch()) has been fixed and integrated.

The sub-issues reported in this thread have been peeled off into their own standalone issue #570.