liuis / leveldb

Automatically exported from code.google.com/p/leveldb
BSD 3-Clause "New" or "Revised" License

empty key assert failure during compaction #97

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
On a clean build of: dd0d562b4d4fbd07db6a44f9e221f8d368fee8e4

$ cat check.cc
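#include <cstdio>
#include <set>
#include <map>
#include <string>
#include "leveldb/db.h"
#include "leveldb/write_batch.h"
#include "leveldb/slice.h"
#include <unistd.h>

int main(int argc, char **argv) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s <db-path>\n", argv[0]);
    return 1;
  }
  leveldb::Options options;
  leveldb::DB *_db = NULL;
  leveldb::Status status = leveldb::DB::Open(options, argv[1], &_db);
  if (!status.ok()) {
    fprintf(stderr, "open failed: %s\n", status.ToString().c_str());
    return 1;
  }
  // Idle so the background compaction thread (see the backtrace below) can run.
  while (1) {
    sleep(1);
  }
}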

$ tar -xvf omap.tgz
$ g++ -g2 -pthread -Ileveldb/include check.cc leveldb/libleveldb.a
$ ./a.out omap
a.out: ./db/dbformat.h:96: leveldb::Slice leveldb::ExtractUserKey(const leveldb::Slice&): Assertion `internal_key.size() >= 8' failed.
Aborted

$ gdb --args ./a.out omap
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/samuelj/a.out...done.
(gdb) run
Starting program: /home/samuelj/a.out omap
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6fef700 (LWP 1769)]
a.out: ./db/dbformat.h:96: leveldb::Slice leveldb::ExtractUserKey(const leveldb::Slice&): Assertion `internal_key.size() >= 8' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff6fef700 (LWP 1769)]
0x00007ffff7320445 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff7320445 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7323bab in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff731910e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff73191b2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x000000000040a5be in leveldb::ExtractUserKey (internal_key=...) at ./db/dbformat.h:96
#5  0x000000000042f1a3 in leveldb::InternalKeyComparator::Compare (this=0x658600, akey=..., bkey=...) at db/dbformat.cc:55
#6  0x00000000004260f2 in leveldb::(anonymous namespace)::MergingIterator::FindSmallest (this=0x7ffff0001ad0) at table/merger.cc:162
#7  0x0000000000425d18 in leveldb::(anonymous namespace)::MergingIterator::Next (this=0x7ffff0001ad0) at table/merger.cc:82
#8  0x0000000000407479 in leveldb::DBImpl::DoCompactionWork (this=0x655390, compact=0x7ffff0000bb0) at db/db_impl.cc:951
#9  0x0000000000405f66 in leveldb::DBImpl::BackgroundCompaction (this=0x655390) at db/db_impl.cc:681
#10 0x00000000004058b4 in leveldb::DBImpl::BackgroundCall (this=0x655390) at db/db_impl.cc:611
#11 0x0000000000405830 in leveldb::DBImpl::BGWork (db=0x655390) at db/db_impl.cc:604
#12 0x00000000004358f2 in leveldb::(anonymous namespace)::PosixEnv::BGThread (this=0x655030) at util/env_posix.cc:572
#13 0x0000000000435603 in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper (arg=0x655030) at util/env_posix.cc:512
#14 0x00007ffff76aee9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007ffff73dc4bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x0000000000000000 in ?? ()

Original issue reported on code.google.com by rexludo...@gmail.com on 14 Jun 2012 at 9:51

GoogleCodeExporter commented 9 years ago
Thanks for the detailed bug report. I looked at the database you enclosed. It 
looks like one of the files in it (000063.sst) is corrupted. It is full of 
zeroes in the offset range [2012770 .. 2015231].
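
For anyone who wants to check their own files for this kind of damage, here is a minimal sketch of one way to locate long zero-filled ranges. It is not the tool used above: `ZeroRun`, `FindZeroRuns` and `ReadFile` are hypothetical names, it uses only the C++ standard library, and it reports half-open ranges [begin, end), whereas the range quoted above is inclusive. A long zero run is only a hint, since valid data can also contain zero bytes.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Half-open byte range [begin, end) that contains only zero bytes.
struct ZeroRun {
  std::size_t begin;
  std::size_t end;
};

// Returns every run of at least `min_len` consecutive zero bytes in `data`.
std::vector<ZeroRun> FindZeroRuns(const std::string& data, std::size_t min_len) {
  std::vector<ZeroRun> runs;
  std::size_t i = 0;
  while (i < data.size()) {
    if (data[i] != '\0') {
      ++i;
      continue;
    }
    const std::size_t start = i;
    while (i < data.size() && data[i] == '\0') ++i;  // extend the run
    if (i - start >= min_len) runs.push_back({start, i});
  }
  return runs;
}

// Reads a whole file into `out`; returns false if it cannot be opened.
// Fine for files of a few MB such as a single .sst.
bool ReadFile(const std::string& path, std::string* out) {
  std::ifstream in(path.c_str(), std::ios::binary);
  if (!in) return false;
  out->assign(std::istreambuf_iterator<char>(in),
              std::istreambuf_iterator<char>());
  return true;
}
```

Calling `ReadFile("000063.sst", &buf)` and then `FindZeroRuns(buf, 1024)` would list candidate ranges; the 1024-byte threshold is an arbitrary choice for illustration.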

Corruption really shouldn't trigger crashes inside leveldb, and I will think of 
how to fix leveldb so it skips over such corruptions with proper error 
reporting instead of crashing.

Do you have critical data in this database? If so, you can recover most of it 
by using the Table and TableBuilder APIs to rewrite the corrupted file as a 
new table file:
  open the file using the Table API
  create TableBuilder pointing to a new file
  create iterator over Table
  for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {
    add iter->key()/iter->value() to builder
  }
  close the builder
  replace corrupted file with new file

Thanks.

Original comment by san...@google.com on 25 Jun 2012 at 8:24

GoogleCodeExporter commented 9 years ago
Ah, thanks for the help!

Original comment by sam.j...@inktank.com on 25 Jun 2012 at 8:26

GoogleCodeExporter commented 9 years ago
Any news or new findings on this issue? I've had problems together with ceph 
and btrfs but it's obviously not because of ceph, rather it seems to have 
something to do with btrfs with compression enabled and leveldb on it. Here's 
the bug reported for the ceph distributed storage system: 
http://tracker.ceph.com/issues/2563

Thanks!

Original comment by j...@insane.se on 9 Feb 2013 at 5:16

GoogleCodeExporter commented 9 years ago
If leveldb is using sparse files, then this may have been the problem:
https://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=d468abec6b9fd7132d012d33573ecb8056c7c43f

Original comment by j.michae...@gmail.com on 9 Feb 2013 at 6:23

GoogleCodeExporter commented 9 years ago
Just as one more data point, we hit this problem too on an ext3 file system 
that had a disk with a few bad sectors (which has since been replaced). Using 
Sanjay's Table/TableBuilder pseudocode allows working around the corrupted 
*.sst with minimal data loss (which is great). Still, a fix in leveldb proper 
would be greatly appreciated. :)

Original comment by ahochh...@samegoal.com on 19 Feb 2013 at 6:46