vmware / splinterdb

High Performance Embedded Key-Value Store
https://splinterdb.org
Apache License 2.0

platform_buffer_init() [ was: platform_buffer_create() ] fails silently #26

Open ajhconway opened 3 years ago

ajhconway commented 3 years ago

When the mmap inside platform_buffer_create fails, the check against MAP_FAILED doesn't trigger.

Checked on Ubuntu 20 LTS in a VM with clang 10, with huge pages enabled in the configuration but not supported by the system.

Updated: 10.Jan.2023: (agurajada)

Under PR #508, platform_buffer_create() is being renamed to platform_buffer_init().

The repro for this issue is as follows: on a Nimbus VM (where Linux huge pages are probably not supported or not enabled), running this test raises a signal, with the stack shown below:

sdb-fdb-build:[121] $ build/release/bin/driver_test splinter_test --set-hugetlb
build/release/bin/driver_test: splinterdb_build_version fbed78bc-dirty
Dispatch test splinter_test
Fingerprint size 29 too large, max value size is 5, setting to 27
filter-index-size: 256 is too small, setting to 512
Bus error (core dumped)

With a debug binary (built with the code from the above-mentioned PR), the stack looks like this:

Program received signal SIGBUS, Bus error.
__memset_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:351
351 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0  __memset_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:351
#1  0x00007ffff7f6d7b5 in rc_allocator_init (al=0x7fffffffe1b8, cfg=0x7fffffffe468, io=0x55555559f200, hid=0x0, mid=0x0) at src/rc_allocator.c:364
#2  0x000055555557685c in splinter_test (argc=2, argv=0x7fffffffe760) at tests/functional/splinter_test.c:2764
#3  0x000055555557c308 in test_dispatcher (argc=3, argv=0x7fffffffe758) at tests/functional/test_dispatcher.c:44
#4  0x00005555555665c2 in main (argc=3, argv=0x7fffffffe758) at tests/functional/driver_test.c:13

The problem seems to be that the call to:

   bh->addr = mmap(NULL, length, prot, flags, -1, 0);
   if (bh->addr == MAP_FAILED) {
      platform_error_log(
         "mmap (%lu bytes) failed with error: %s\n", length, strerror(errno));
      goto error;
   }

seems to succeed, but bh->addr is bogus / inaccessible.

On this test Nimbus VM, huge pages are not enabled:

$ grep HugePages_ /proc/meminfo
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0

The mmap() man page (https://man7.org/linux/man-pages/man2/mmap.2.html) gives no further information on the error return status for this case. While debugging, errno is still 0 after this errant mmap() call.

So programmatically, we don't have enough information to detect that the mmap() call has failed. One option may be to check whether the system has huge pages configured and enabled, but I could not find a programmatic way to probe for that using a system call. (Need to look into sysctl etc ...)
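
One way to probe without a dedicated system call is to read the same /proc/meminfo counters shown in the grep output above. The following is a minimal sketch, not SplinterDB code: the helper names meminfo_read_field() and hugepages_available() are hypothetical. It is only a heuristic, because the counters can change between the check and the mmap() call, and HugePages_Free does not account for pages that are already reserved or for overcommitted (surplus) pages.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a /proc/meminfo-style stream for a line of the
 * form "<key>: <number>" and store the number in *value_out.
 * Returns 0 on success, -1 if the key is missing or its value is malformed.
 */
int
meminfo_read_field(FILE *fp, const char *key, long *value_out)
{
   char   line[256];
   size_t key_len = strlen(key);

   while (fgets(line, sizeof(line), fp) != NULL) {
      /* Require an exact key match followed by ':' */
      if (strncmp(line, key, key_len) == 0 && line[key_len] == ':') {
         return (sscanf(line + key_len + 1, "%ld", value_out) == 1) ? 0 : -1;
      }
   }
   return -1;
}

/*
 * Hypothetical helper: returns 1 if at least one huge page is reported free,
 * 0 if none are free, and -1 if the information could not be read.
 */
int
hugepages_available(const char *meminfo_path)
{
   FILE *fp = fopen(meminfo_path, "r");
   if (fp == NULL) {
      return -1;
   }

   long free_pages = 0;
   int  rc         = meminfo_read_field(fp, "HugePages_Free", &free_pages);
   fclose(fp);

   if (rc != 0) {
      return -1;
   }
   return free_pages > 0;
}
```

A caller could run hugepages_available("/proc/meminfo") before requesting huge pages and log a warning or skip the huge-page flags when the result is not 1.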

gapisback commented 1 year ago

Debugging further, the problem is:

Breakpoint 2, rc_allocator_init (al=0x7fffffffe1c8, cfg=0x7fffffffe478, io=0x55555559f200, hid=0x0, mid=0x0) at src/rc_allocator.c:356
356    rc                 = platform_buffer_init(&al->bh, buffer_size);

363    al->ref_count = platform_buffer_getaddr(&al->bh);
(gdb)
364    memset(al->ref_count, 0, buffer_size);

On line 363, al->ref_count is set to al->bh.addr, but the memset() on line 364 runs into a signal because the mapped address is bogus.
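
One possible explanation, stated here as an assumption because the thread does not show how flags is built: if the flags combine MAP_HUGETLB with MAP_NORESERVE, the kernel does not reserve huge pages at mmap() time, so mmap() can return a valid-looking address and the shortage only surfaces as SIGBUS on first touch. The sketch below shows the alternative of requesting huge pages without MAP_NORESERVE, so that the reservation fails up front through MAP_FAILED, and then falling back to regular pages. The function buffer_map() is hypothetical and is not SplinterDB's platform_buffer_init().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS / MAP_HUGETLB on glibc */
#endif
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `length` bytes of anonymous memory.
 * If want_hugetlb is set, huge pages are tried first WITHOUT MAP_NORESERVE,
 * so the kernel has to reserve them at mmap() time and can report a shortage
 * via MAP_FAILED. On failure, fall back to regular pages.
 * *got_hugetlb reports which kind of mapping was obtained.
 * Returns NULL only if every attempt fails.
 */
void *
buffer_map(size_t length, int want_hugetlb, int *got_hugetlb)
{
   int   prot  = PROT_READ | PROT_WRITE;
   int   flags = MAP_PRIVATE | MAP_ANONYMOUS;
   void *addr;

   *got_hugetlb = 0;
#ifdef MAP_HUGETLB
   if (want_hugetlb) {
      addr = mmap(NULL, length, prot, flags | MAP_HUGETLB, -1, 0);
      if (addr != MAP_FAILED) {
         *got_hugetlb = 1;
         return addr;
      }
      fprintf(stderr,
              "hugetlb mmap (%zu bytes) failed: %s; using regular pages\n",
              length,
              strerror(errno));
   }
#else
   (void)want_hugetlb; /* platform has no MAP_HUGETLB */
#endif

   addr = mmap(NULL, length, prot, flags, -1, 0);
   return (addr == MAP_FAILED) ? NULL : addr;
}
```

With this shape, the caller can memset() the returned buffer whether or not huge pages were granted. Note that for huge-page mappings, the length passed to munmap() must be a multiple of the huge page size.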