yahoo / mdbm

MDBM a very fast memory-mapped key/value store.
BSD 3-Clause "New" or "Revised" License
992 stars 111 forks source link

mdbm_check core dumps with a fragment of mdbm #58

Closed skaji closed 8 years ago

skaji commented 8 years ago

If I pass a fragment of mdbm to mdbm_check, then it core dumps.

How to reproduce the issue:

#0. set LD_LIBRARY_PATH
$ export LD_LIBRARY_PATH=$PWD/src/lib/object

#1. create test.mdbm
$ src/tools/object/mdbm_create test.mdbm

#2. save its first 1000bytes to head.mdbm
$ head -c 1000 test.mdbm  > head.mdbm

#3. execute mdbm_check with head.mdbm
$ src/tools/object/mdbm_check head.mdbm
zsh: bus error (core dumped)  src/tools/object/mdbm_check head.mdbm

gdb

$ gdb src/tools/object/mdbm_check core
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from src/tools/object/mdbm_check...done.
[New LWP 7790]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `src/tools/object/mdbm_check head.mdbm'.
Program terminated with signal SIGBUS, Bus error.
#0  check_db_chunks (verbose=1, db=0x19f4060) at mdbm.c:758
758         if (MDBM_PAGE_PTR(db,pg+page->p_num_pages)->p_prev_num_pages != page->p_num_pages) {
Traceback (most recent call last):
  File "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py", line 63, in <module>
    from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named 'libstdcxx'
(gdb) set pagination off
(gdb) bt full
#0  check_db_chunks (verbose=1, db=0x19f4060) at mdbm.c:758
        pg = 0
        cur_free = 0
        nerr = 0
        h = 0x7fcfd76ca010
        nchunks = 1
#1  check_db (db=db@entry=0x19f4060, level=level@entry=3, maxlevel=maxlevel@entry=10, verbose=verbose@entry=1) at mdbm.c:1212
        nerr = 0
#2  0x00007fcfd706e83c in mdbm_check (db=db@entry=0x19f4060, level=level@entry=3, verbose=verbose@entry=1) at mdbm.c:7353
No locals.
#3  0x000000000040114b in main (argc=<optimized out>, argv=<optimized out>) at mdbm_check.c:191
        db = 0x19f4060
        ret = 0
        opt = <optimized out>
        opt_version = <optimized out>
        opt_nonversion = 0
        opt_verify_version = 0
        pno = -1
        fn = 0x7ffdc5e3aa9f "head.mdbm"
        oflags = 131072
        verbose = 1
        dbcheck = 3
        lock = 1
        winsize = 0

I know head.mdbm is invalid, but I think mdbm_check should not core dump.

timrc-git commented 8 years ago

Hi Shoichi-

You're right, it shouldn't core dump. And this has been discussed a bit in the past.

However, there is a trade-off between speed/complexity/maintainability and safety. MDBM was designed with raw speed in mind, so it uses lots of raw offsets and pointers. Fixing the particular one you're hitting here is trivial, but there will be lots of others if you truncate at a different place, or mess with the DB in other ways.

If we add lots of boundary checking, then we slow down lots of MDBM operations. Because this kind of corruption occurs very rarely in practice (assuming folks are careful with the code that uses MDBM, e.g.: proper locking, unit-tests, and valgrind), then this extra checking doesn't usually help much. It's primarily only an issue when machines shutdown uncleanly (power failure, kernel panic, etc.).

Another alternative is to make custom versions of lots of MDBM internal functions, just for mdbm_check, which perform the extra checking. However, we don't really want to keep 2 versions of many functions (and macros), as they tend to drift and cause long term maintenance problems. So, this probably involves conditional compilation, and maybe a "debug" version of the library. A debug version could be useful for other things.

Because this is a fairly extensive task with limited gains, someone else would have to champion it. If you're interested, let me know, and we can figure out the best approach.

Thanks! .timrc