vmware / splinterdb

High Performance Embedded Key-Value Store
https://splinterdb.org
Apache License 2.0

Debug assert "filter->addr != 0" trips in trunk_flush_into_bundle() -> trunk_inc_filter(): test_issue_458_mini_destroy_unused_debug_assert test case #570

Open gapisback opened 1 year ago

gapisback commented 1 year ago

The test case splinterdb_stress_test.c:test_issue_458_mini_destroy_unused_debug_assert is currently commented out.

It was added as part of commit SHA f3c92ef6cc6 to fix issue #545 (under PR #561). The test case is a simple workload of a single client loading 100M short k/v pairs.

A repro is available on this branch: agurajada/570-filter-addr-ne-0-assert

When enabled, that test trips the following assertion. Reproduced on main @ SHA b2245ac:

#2  __GI___pthread_kill (threadid=140737320392256, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7cfb476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7ce17f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7f665ff in platform_assert_false (filename=0x7ffff7fa477b "src/trunk.c", linenumber=3543,
    functionname=0x7ffff7fa8ef0 <__FUNCTION__.66> "trunk_inc_filter", expr=0x7ffff7fa4d95 "filter->addr != 0",
    message=0x7ffff7fa3fca "") at src/platform_linux/platform.c:377
#6  0x00007ffff7f7f6f7 in trunk_inc_filter (spl=0x7fffb5f8c040, filter=0x7fffcf5902d2) at src/trunk.c:3543
#7  0x00007ffff7f821a5 in trunk_flush_into_bundle (spl=0x7fffb5f8c040, parent=0x7ffff5fce7c0,
    child=0x7ffff5fce620, pdata=0x7fffd36fa58c, req=0x7fffa434f3c0) at src/trunk.c:4102
#8  0x00007ffff7f828e9 in trunk_flush (spl=0x7fffb5f8c040, parent=0x7ffff5fce7c0, pdata=0x7fffd36fa58c,
    is_space_rec=0) at src/trunk.c:4214
#9  0x00007ffff7f82e90 in trunk_flush_fullest (spl=0x7fffb5f8c040, node=0x7ffff5fce7c0) at src/trunk.c:4295
#10 0x00007ffff7f83e5b in trunk_compact_bundle (arg=0x5555558f0c80, scratch_buf=0x7ffff5fd2040)
    at src/trunk.c:4642
#11 0x00007ffff7f72950 in task_group_run_task (group=0x555555575980, assigned_task=0x5555558eec40)
    at src/task.c:475
#12 0x00007ffff7f72ac9 in task_worker_thread (arg=0x555555575980) at src/task.c:514
#13 0x00007ffff7f7209d in task_invoke_with_hooks (func_and_args=0x5555555772c0) at src/task.c:221
#14 0x00007ffff7d4db43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#15 0x00007ffff7ddfa00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#6  0x00007ffff7f7f6f7 in trunk_inc_filter (spl=0x7fffb5f8c040, filter=0x7fffcf5902d2) at src/trunk.c:3543
3543       debug_assert(filter->addr != 0);
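
For readers skimming the trace: the assert guards the filter ref-count path. A minimal sketch of the failing function's shape, reconstructed from the backtrace (illustrative only; the authoritative code is src/trunk.c:3543):

/* Illustrative reconstruction, not the verbatim SplinterDB source. The
 * assert fires when a bundle being flushed carries a routing filter whose
 * disk address was never allocated (filter->addr == 0). */
static inline void
trunk_inc_filter(trunk_handle *spl, routing_filter *filter)
{
   debug_assert(filter->addr != 0); /* trips under trunk_flush_into_bundle() */
   /* ...otherwise, take a reference on the filter's backing pages... */
}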

A possibly related issue, which should be investigated as part of this item: while the workload is running, we see these messages:

Inserted 52 million KV-pairs, this batch: 5 s, 200000 rows/s, cumulative: 311 s, 167202 rows/s ...
Inserted 53 million KV-pairs, this batch: 6 s, 166666 rows/s, cumulative: 318 s, 166666 rows/s ...btree_pack(): req->num_tuples=6291456 exceeded output size limit, req->max_tuples=6291456
btree_pack failed: No space left on device
btree_pack(): req->num_tuples=6291456 exceeded output size limit, req->max_tuples=6291456
btree_pack failed: No space left on device

Inserted 54 million KV-pairs, this batch: 5 s, 200000 rows/s, cumulative: 323 s, 167182 rows/s ...

These messages also appear with a release binary, but there the test case seems to succeed. (Of course, the tripping assertion is a debug-only assert.)
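
For context, the check that emits the message presumably has roughly this shape (the field names come from the log text itself; the surrounding code is my assumption, not the verbatim btree source):

/* Assumed shape of the capacity check in btree_pack(), inferred from the
 * message text above; not the verbatim source. */
if (req->num_tuples >= req->max_tuples) {
   platform_error_log("btree_pack(): req->num_tuples=%lu exceeded output "
                      "size limit, req->max_tuples=%lu\n",
                      req->num_tuples,
                      req->max_tuples);
   return STATUS_NO_SPACE; /* reported by callers as "No space left on device" */
}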


Historical note: This specific assertion was previously reported and is mixed up in the history of issue #545 (a bug in routing_filter_prefetch()). That bug has been fixed separately, so I'm peeling this different failure off into its own issue.

chrisxu333 commented 5 months ago

Hi @gapisback, I also ran into this message while inserting 20M kv pairs (each 16 bytes large) in a single thread:

btree_pack(): req->num_tuples=6291456 exceeded output size limit, req->max_tuples=6291456
btree_pack failed: No space left on device

Any idea how to address this? Thanks :)

P.S. If I increase the kv-pair size to 128 bytes (8-byte key and 120-byte value), the issue is no longer reproducible.

gapisback commented 5 months ago

Thanks for reporting this @chrisxu333 -- I'm afraid that I don't have much more to add.

In your failing repro ("inserting 20M kv pairs (each 16 bytes large) in a single thread"), can you clarify the sizes of the key and the value?

There was a set of known instabilities around trunk bundle management, which were discussed internally with Splinter dev engineers at some point (~12 months ago). I have since moved on from that project and this repo, so I'm not able to provide any meaningful suggestions.

Cc'ing @rtjohnso, who is the gatekeeper for this repo now and may have been doing some work to stabilize some of these areas.

chrisxu333 commented 5 months ago

@gapisback Thanks for your kind reply and explanation. Regarding the key and value sizes in the failing scenario: I used an 8-byte key and an 8-byte value.
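
For reference, a minimal single-threaded repro sketch of that workload, written against the public API in include/splinterdb/splinterdb.h as I recall it (the filename, cache/disk sizes, and error handling are placeholders; please check signatures against your checkout):

#include <stdint.h>
#include <string.h>
#include "splinterdb/default_data_config.h"
#include "splinterdb/splinterdb.h"

#define KEY_SIZE    8
#define NUM_INSERTS (20UL * 1000 * 1000)

int
main(void)
{
   data_config data_cfg;
   default_data_config_init(KEY_SIZE, &data_cfg);

   splinterdb_config cfg = {
      .filename   = "issue-570-repro.db",     /* placeholder path */
      .cache_size = 1024UL * 1024 * 1024,     /* 1 GiB, placeholder */
      .disk_size  = 8UL * 1024 * 1024 * 1024, /* 8 GiB, placeholder */
      .data_cfg   = &data_cfg,
   };
   splinterdb *kvsb = NULL;
   if (splinterdb_create(&cfg, &kvsb) != 0) {
      return 1;
   }

   /* 20M inserts of an 8-byte key plus an 8-byte value, single thread */
   for (uint64_t i = 0; i < NUM_INSERTS; i++) {
      uint64_t val = i;
      if (splinterdb_insert(kvsb,
                            slice_create(sizeof(i), &i),
                            slice_create(sizeof(val), &val))
          != 0)
      {
         break; /* stop on first error */
      }
   }
   splinterdb_close(&kvsb);
   return 0;
}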

Moreover, I also opened a new issue about another likely deadlock bug involving O_DIRECT that I encountered (https://github.com/vmware/splinterdb/issues/620). It would be really helpful if you, or whoever is working on this, could take a look at your convenience :) Thank you!

rtjohnso commented 5 months ago

I believe I've seen this issue with small kv-pairs before. It is due to an estimate of the maximum number of items that might be in a trunk node (see https://github.com/vmware/splinterdb/blob/9359c9aa782dd56cfd7f0c2e20e6f92988fe2500/src/trunk.c#L9632), which assumes kv-pairs are at least 32 bytes.
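
Concretely, my arithmetic under that assumption: the cap in the messages above corresponds to

6291456 tuples x 32 bytes/tuple = 192 MiB of budgeted data
192 MiB / 16 bytes/tuple        = 12582912 actual tuples (2x the cap)

so with 16-byte kv-pairs, req->num_tuples reaches req->max_tuples well before the node's byte budget is exhausted.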

You could try changing the divisor from 32 to 16, or you could just pad out your kv-pairs to 32 bytes.
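
For the padding route, a sketch of what that looks like (names are illustrative; this assumes the splinterdb_insert()/slice_create() calls from the repro sketch above):

/* Workaround sketch: pad the value so that key + value >= 32 bytes.
 * An 8-byte key plus a 24-byte zero-padded value meets the estimate's
 * 32-byte minimum. */
static int
insert_padded(splinterdb *kvsb, uint64_t keynum, uint64_t payload)
{
   char valbuf[24] = {0};                     /* 8 + 24 = 32 bytes total */
   memcpy(valbuf, &payload, sizeof(payload)); /* real 8-byte payload up front */
   return splinterdb_insert(kvsb,
                            slice_create(sizeof(keynum), &keynum),
                            slice_create(sizeof(valbuf), valbuf));
}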

This is a long-term item to fix due to limitations in other parts of the code.