njuhugn / leveldb

Automatically exported from code.google.com/p/leveldb
BSD 3-Clause "New" or "Revised" License

LevelDB gets stuck in leveldb::DBImpl::MakeRoomForWrite() #163

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
When running a Ceph storage cluster, the cluster monitor often gets stuck in 
leveldb::DBImpl::MakeRoomForWrite(). The LevelDB library was compiled from 
source; HEAD is commit 514c943a8e (Make DB::Open fail if sst files are 
missing).

The call stack is:
------
(gdb) bt
#0  0x000000314020b5e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1df79325b3 in leveldb::port::CondVar::Wait (this=<optimized out>) at port/port_posix.cc:38
#2  0x00007f1df7909d64 in leveldb::DBImpl::MakeRoomForWrite (this=this@entry=0x3288000, force=false) at db/db_impl.cc:1283
#3  0x00007f1df790a187 in leveldb::DBImpl::Write (this=0x3288000, options=..., my_batch=0x521e108) at db/db_impl.cc:1151
#4  0x0000000000588269 in LevelDBStore::submit_transaction_sync (this=this@entry=0x3280060, t=std::tr1::shared_ptr (count 2, weak 0) 0x521e100) at os/LevelDBStore.h:129
#5  0x00000000004a6fda in MonitorDBStore::apply_transaction (this=<optimized out>, t=...) at mon/MonitorDBStore.h:192
#6  0x00000000004f64a8 in Paxos::begin (this=this@entry=0x3b00000, v=...) at mon/Paxos.cc:509
#7  0x00000000004f6f5b in Paxos::propose_queued (this=this@entry=0x3b00000) at mon/Paxos.cc:1270
#8  0x00000000004f72e8 in Paxos::propose_new_value (this=0x3b00000, bl=..., onfinished=<optimized out>) at mon/Paxos.cc:1292
#9  0x00000000005006d6 in PaxosService::propose_pending (this=0x32b81e0) at mon/PaxosService.cc:180
#10 0x00000000004d571a in Context::complete (this=0x31f11c0, r=<optimized out>) at ./include/Context.h:41
#11 0x000000000064534f in SafeTimer::timer_thread (this=0x3932d80) at common/Timer.cc:105
#12 0x00000000006469dd in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#13 0x0000003140207d15 in start_thread () from /lib64/libpthread.so.0
#14 0x000000313faf248d in clone () from /lib64/libc.so.6

The code is:
------
1280    } else if (versions_->NumLevelFiles(0) >= config::kL0_StopWritesTrigger) {
1281      // There are too many level-0 files.
1282      Log(options_.info_log, "waiting...\n");
1283      bg_cv_.Wait();

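When I inspect the hung process, however, the condition 
(versions_->NumLevelFiles(0) >= config::kL0_StopWritesTrigger) is no longer 
true: level 0 holds only 8 files, yet the thread is still waiting.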
------
(gdb) f 2
#2  0x00007f1df7909d64 in leveldb::DBImpl::MakeRoomForWrite (this=this@entry=0x3288000, force=false) at db/db_impl.cc:1283
1283          bg_cv_.Wait();
(gdb) p versions_->current_->files_[0]
$7 = std::vector of length 8, capacity 8 = {0x3211410, 0x3211440, 0x3211e30, 0x32126a0, 0x3211bc0, 0x3211740, 0x32117a0, 0x32126d0}

Original issue reported on code.google.com by uKer...@gmail.com on 2 May 2013 at 3:04

GoogleCodeExporter commented 9 years ago
We are also seeing several users hit this:

Thread 30 (Thread 0x7fcebf8a3700 (LWP 6035)):
#0  0x00007fcec3e09d84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000006efffd in leveldb::port::CondVar::Wait() ()
#2  0x00000000006d9a98 in leveldb::DBImpl::TEST_CompactMemTable() ()
#3  0x00000000006d9b3c in leveldb::DBImpl::CompactRange(leveldb::Slice const*, leveldb::Slice const*) ()
#4  0x0000000000490f41 in compact_prefix (prefix=..., this=<optimized out>) at ./os/LevelDBStore.h:49

Original comment by sagew...@gmail.com on 2 May 2013 at 10:56

GoogleCodeExporter commented 9 years ago
I have two backtraces of the hang(s), both of which show that the background 
thread is not running:

http://pastebin.com/raw.php?i=uii6XxBJ
http://tracker.ceph.com/attachments/download/814/bt.txt

Original comment by sagew...@gmail.com on 2 May 2013 at 11:12

GoogleCodeExporter commented 9 years ago
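Mystery solved: LevelDB keeps a static pointer to the POSIX environment (bad 
shared-library etiquette!), and we were using the DB both before and after a 
fork(). Since fork() only carries the calling thread into the child, the 
environment's background thread was gone there, which matches the backtraces 
above. There was no way to create a new PosixEnv either, so we kludged around 
it by reordering our forking.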

Original comment by s...@inktank.com on 3 May 2013 at 7:28
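
The failure mode described in the last comment can be reproduced in isolation. The following is a minimal sketch, not LevelDB code: FakeEnv, DefaultFakeEnv, ForkDemoResult and RunForkDemo are hypothetical names invented for this illustration, and the only assumption carried over from the thread is that a process-wide object starts its background thread once and remembers that it did so. After fork(), the child inherits the "already started" flag but not the thread, so anything waiting on that thread would wait forever.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a process-wide environment object. It is NOT
// LevelDB's PosixEnv; it only mimics "start one background thread, once".
class FakeEnv {
 public:
  // Starts the worker the first time it is called; later calls are no-ops.
  void EnsureWorkerStarted() {
    if (!started_.exchange(true)) {
      std::thread([this] {
        for (;;) {
          ticks_.fetch_add(1);
          std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
      }).detach();
    }
  }

  // Returns true if the worker made progress during a 200 ms window.
  bool WorkerMakesProgress() const {
    const long before = ticks_.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
    return ticks_.load() != before;
  }

 private:
  std::atomic<bool> started_{false};
  std::atomic<long> ticks_{0};
};

// Static singleton, analogous to keeping a static pointer to the environment.
FakeEnv* DefaultFakeEnv() {
  static FakeEnv env;
  return &env;
}

struct ForkDemoResult {
  bool parent_worker_alive;
  bool child_worker_alive;
};

// Uses the environment before fork(), then again in the child after fork().
ForkDemoResult RunForkDemo() {
  FakeEnv* env = DefaultFakeEnv();
  env->EnsureWorkerStarted();
  ForkDemoResult result;
  result.parent_worker_alive = env->WorkerMakesProgress();

  const pid_t pid = fork();
  assert(pid >= 0);
  if (pid == 0) {
    // Child: started_ was copied as true, so this call does nothing, but the
    // worker thread itself was not duplicated by fork().
    env->EnsureWorkerStarted();
    _exit(env->WorkerMakesProgress() ? 1 : 0);
  }

  // Parent: collect the child's verdict from its exit status.
  int status = 0;
  waitpid(pid, &status, 0);
  result.child_worker_alive = WIFEXITED(status) && WEXITSTATUS(status) == 1;
  return result;
}
```

Calling RunForkDemo() from a main() on a POSIX system should report the worker as alive in the parent and not alive in the child. Doing the fork before the first use of the environment, as the Ceph workaround did, avoids the problem because the thread is then started in the process that actually uses it.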