utsaslab / pebblesdb

The PebblesDB write-optimized key-value store (SOSP 17)
BSD 3-Clause "New" or "Revised" License
504 stars 98 forks source link

VersionSet::RemoveFileLevelBloomFilterInfo isn't thread-safe #19

Open cryptokat opened 5 years ago

cryptokat commented 5 years ago

How to Reproduce Run db_test many times then see DBTest.MultiThreaded crashes occasionally

Stack trace

==== Test DBTest.MultiThreaded
[New Thread 0x7fff4cfd9700 (LWP 23220)]
[New Thread 0x7fff4c7d8700 (LWP 23221)]
[New Thread 0x7fff4bfd7700 (LWP 23222)]
... starting thread 0
[New Thread 0x7fff4b7d6700 (LWP 23223)]
... starting thread 1
[New Thread 0x7fff4afd5700 (LWP 23224)]
... starting thread 2
[New Thread 0x7fff4a7d4700 (LWP 23225)]
... starting thread 3

Thread 307 "db_test" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff4c7d8700 (LWP 23221)]
0x00007ffff7afe106 in std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) where
#0  0x00007ffff7afe106 in std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00005555555d46e3 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*>, std::_Select1st<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> > >::_M_erase_aux(std::_Rb_tree_const_iterator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >) ()
#2  0x00005555555d2919 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*>, std::_Select1st<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> > >::erase[abi:cxx11](std::_Rb_tree_const_iterator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >) ()
#3  0x00005555555d0a64 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*>, std::_Select1st<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> > >::_M_erase_aux(std::_Rb_tree_const_iterator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::_Rb_tree_const_iterator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >) ()
#4  0x00005555555cd5e9 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*>, std::_Select1st<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> > >::erase[abi:cxx11](std::_Rb_tree_const_iterator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::_Rb_tree_const_iterator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >) ()
#5  0x00005555555c8e3c in std::_Rb_tree<unsigned long, std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*>, std::_Select1st<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> > >::erase(unsigned long const&) ()
#6  0x00005555555c6273 in std::map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> > >::erase(unsigned long const&) ()
#7  0x00005555555b9358 in leveldb::VersionSet::RemoveFileLevelBloomFilterInfo(unsigned long) ()
#8  0x0000555555591e7f in leveldb::DBImpl::DeleteObsoleteFiles() ()
#9  0x0000555555595600 in leveldb::DBImpl::BackgroundCompactionGuards(leveldb::FileLevelFilterBuilder*) ()
#10 0x0000555555594dc3 in leveldb::DBImpl::CompactLevelThread() ()
#11 0x000055555559e395 in leveldb::DBImpl::CompactLevelWrapper(void*) ()
#12 0x00005555555e7403 in leveldb::(anonymous namespace)::StartThreadWrapper(void*) ()
#13 0x00007ffff7326494 in start_thread (arg=0x7fff4c7d8700) at pthread_create.c:333
#14 0x00007ffff7068acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
bx1n commented 5 years ago

@cryptokat I was trying to reproduce the bug to get a better understanding of the bug. Can you explain how you were able to produce this error?

erangi commented 5 years ago

I've hit this issue too when running a multi-threaded benchmark (using YCSB-C). The std::map isn't thread-safe, and gets corrupted by multiple concurrent write accesses. This is manifested by either this crash or an infinite loop during map traversal. The solution is probably to mutually exclude accesses to file_level_bloom_filter. The lowest overhead way is using a read-write lock (e.g., std::shared_mutex), but that requires C++14/17. Using std::mutex will have a somewhat higher overhead, but at least allow correct concurrent runs.