pmem / libpmemobj-cpp

C++ bindings & containers for libpmemobj
https://pmem.io
Other
107 stars 76 forks source link

test fail: concurrent_hash_map_rehash_0_helgrind #469

Open kilobyte opened 4 years ago

kilobyte commented 4 years ago

Debian unstable: gcc-9, valgrind 3.15

Full log from two official buildds: x86-ubc-01, x86-csail-02.

 48/957 Test #163: concurrent_hash_map_rehash_0_helgrind ..................................................................................***Failed    5.29 sec
-- Executing: valgrind --error-exitcode=99 --tool=helgrind /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/tests/concurrent_hash_map_rehash /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/test/concurrent_hash_map_rehash_0_helgrind/testfile
-- Test concurrent_hash_map_rehash_0_helgrind:
-- Stdout:

-- Stderr:
==10084== Helgrind, a thread error detector
==10084== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==10084== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==10084== Command: /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/tests/concurrent_hash_map_rehash /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/test/concurrent_hash_map_rehash_0_helgrind/testfile
==10084== 
==10084== ---Thread-Announcement------------------------------------------
==10084== 
==10084== Thread #14 was created
==10084==    at 0x4BAF2DE: clone (clone.S:71)
==10084==    by 0x489EE0E: create_thread (createthread.c:101)
==10084==    by 0x48A0841: pthread_create@@GLIBC_2.2.5 (pthread_create.c:826)
==10084==    by 0x483C6B7: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==10084==    by 0x49A9DC4: std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==10084==    by 0x111B2E: thread<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (thread:130)
==10084==    by 0x111B2E: construct<std::thread, (anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (new_allocator.h:147)
==10084==    by 0x111B2E: construct<std::thread, (anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (alloc_traits.h:484)
==10084==    by 0x111B2E: _M_realloc_insert<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (vector.tcc:449)
==10084==    by 0x111B2E: emplace_back<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (vector.tcc:121)
==10084==    by 0x111B2E: insert_erase_lookup_test (concurrent_hash_map_rehash.cpp:106)
==10084==    by 0x111B2E: main (concurrent_hash_map_rehash.cpp:156)
==10084== 
==10084== ---Thread-Announcement------------------------------------------
==10084== 
==10084== Thread #3 was created
==10084==    at 0x4BAF2DE: clone (clone.S:71)
==10084==    by 0x489EE0E: create_thread (createthread.c:101)
==10084==    by 0x48A0841: pthread_create@@GLIBC_2.2.5 (pthread_create.c:826)
==10084==    by 0x483C6B7: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==10084==    by 0x49A9DC4: std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==10084==    by 0x111745: thread<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (thread:130)
==10084==    by 0x111745: construct<std::thread, (anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (new_allocator.h:147)
==10084==    by 0x111745: construct<std::thread, (anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (alloc_traits.h:484)
==10084==    by 0x111745: emplace_back<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (vector.tcc:115)
==10084==    by 0x111745: insert_erase_lookup_test (concurrent_hash_map_rehash.cpp:87)
==10084==    by 0x111745: main (concurrent_hash_map_rehash.cpp:156)
==10084== 
==10084== ----------------------------------------------------------------
==10084== 
==10084==  Lock at 0x5D84C60 was first observed
==10084==    at 0x483B213: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==10084==    by 0x487AD83: ??? (in /usr/lib/x86_64-linux-gnu/libpmemobj.so.1.0.0)
==10084==    by 0x487B860: pmemobj_rwlock_trywrlock (in /usr/lib/x86_64-linux-gnu/libpmemobj.so.1.0.0)
==10084==    by 0x11BE98: try_lock (shared_mutex.hpp:151)
==10084==    by 0x11BE98: try_acquire (concurrent_hash_map.hpp:153)
==10084==    by 0x11BE98: try_acquire_item (concurrent_hash_map.hpp:2634)
==10084==    by 0x11BE98: bool pmem::obj::concurrent_hash_map<pmem::obj::p<int>, pmem::obj::p<int>, std::hash<pmem::obj::p<int> >, std::equal_to<pmem::obj::p<int> >, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> >::internal_find<pmem::obj::p<int> >(pmem::obj::p<int> const&, pmem::obj::concurrent_hash_map<pmem::obj::p<int>, pmem::obj::p<int>, std::hash<pmem::obj::p<int> >, std::equal_to<pmem::obj::p<int> >, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> >::const_accessor*, bool) (concurrent_hash_map.hpp:2687)
==10084==    by 0x112B89: find (concurrent_hash_map.hpp:2332)
==10084==    by 0x112B89: operator() (concurrent_hash_map_rehash.cpp:110)
==10084==    by 0x112B89: __invoke_impl<void, (anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (invoke.h:60)
==10084==    by 0x112B89: __invoke<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (invoke.h:95)
==10084==    by 0x112B89: _M_invoke<0> (thread:244)
==10084==    by 0x112B89: operator() (thread:251)
==10084==    by 0x112B89: std::thread::_State_impl<std::thread::_Invoker<std::tuple<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::{lambda()#3}> > >::_M_run() (thread:195)
==10084==    by 0x49A9B4F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==10084==    by 0x483C8B6: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==10084==    by 0x489FFB6: start_thread (pthread_create.c:486)
==10084==    by 0x4BAF2EE: clone (clone.S:95)
==10084==  Address 0x5d84c60 is in a rw- mapped file /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/test/concurrent_hash_map_rehash_0_helgrind/testfile segment
==10084== 
==10084==  Lock at 0x5C447A8 was first observed
==10084==    at 0x483B213: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==10084==    by 0x487AD83: ??? (in /usr/lib/x86_64-linux-gnu/libpmemobj.so.1.0.0)
==10084==    by 0x487B860: pmemobj_rwlock_trywrlock (in /usr/lib/x86_64-linux-gnu/libpmemobj.so.1.0.0)
==10084==    by 0x114421: try_lock (shared_mutex.hpp:151)
==10084==    by 0x114421: try_acquire (concurrent_hash_map.hpp:153)
==10084==    by 0x114421: acquire (concurrent_hash_map.hpp:1646)
==10084==    by 0x114421: bucket_accessor (concurrent_hash_map.hpp:1631)
==10084==    by 0x114421: internal_insert<pmem::obj::p<int>, std::pair<const pmem::obj::p<int>, pmem::obj::p<int> > > (concurrent_hash_map.hpp:2736)
==10084==    by 0x114421: insert (concurrent_hash_map.hpp:2492)
==10084==    by 0x114421: operator() (concurrent_hash_map_rehash.cpp:90)
==10084==    by 0x114421: __invoke_impl<void, (anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (invoke.h:60)
==10084==    by 0x114421: __invoke<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (invoke.h:95)
==10084==    by 0x114421: _M_invoke<0> (thread:244)
==10084==    by 0x114421: operator() (thread:251)
==10084==    by 0x114421: std::thread::_State_impl<std::thread::_Invoker<std::tuple<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::{lambda()#1}> > >::_M_run() (thread:195)
==10084==    by 0x49A9B4F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==10084==    by 0x483C8B6: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==10084==    by 0x489FFB6: start_thread (pthread_create.c:486)
==10084==    by 0x4BAF2EE: clone (clone.S:95)
==10084==  Address 0x5c447a8 is in a rw- mapped file /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/test/concurrent_hash_map_rehash_0_helgrind/testfile segment
==10084== 
==10084== Possible data race during write of size 4 at 0x5D84C9C by thread #14
==10084== Locks held: 1, at address 0x5D84C60
==10084==    at 0x112E40: operator() (concurrent_hash_map_rehash.cpp:115)
==10084==    by 0x112E40: __invoke_impl<void, (anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (invoke.h:60)
==10084==    by 0x112E40: __invoke<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (invoke.h:95)
==10084==    by 0x112E40: _M_invoke<0> (thread:244)
==10084==    by 0x112E40: operator() (thread:251)
==10084==    by 0x112E40: std::thread::_State_impl<std::thread::_Invoker<std::tuple<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::{lambda()#3}> > >::_M_run() (thread:195)
==10084==    by 0x49A9B4F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==10084==    by 0x483C8B6: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==10084==    by 0x489FFB6: start_thread (pthread_create.c:486)
==10084==    by 0x4BAF2EE: clone (clone.S:95)
==10084== 
==10084== This conflicts with a previous write of size 8 by thread #3
==10084== Locks held: 1, at address 0x5C447A8
==10084==    at 0x11C9EB: hash_map_node (concurrent_hash_map.hpp:345)
==10084==    by 0x11C9EB: create<pmem::obj::concurrent_hash_map_internal::hash_map_node<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> >, pmem::detail::persistent_pool_ptr<pmem::obj::concurrent_hash_map_internal::hash_map_node<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> > >&, std::pair<const pmem::obj::p<int>, pmem::obj::p<int> > > (life.hpp:109)
==10084==    by 0x11C9EB: pmem::detail::pp_if_not_array<pmem::obj::concurrent_hash_map_internal::hash_map_node<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> > >::type pmem::obj::make_persistent<pmem::obj::concurrent_hash_map_internal::hash_map_node<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> >, pmem::detail::persistent_pool_ptr<pmem::obj::concurrent_hash_map_internal::hash_map_node<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> > >&, std::pair<pmem::obj::p<int> const, pmem::obj::p<int> > >(pmem::obj::allocation_flag, pmem::detail::persistent_pool_ptr<pmem::obj::concurrent_hash_map_internal::hash_map_node<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> > >&, std::pair<pmem::obj::p<int> const, pmem::obj::p<int> >&&) (make_persistent.hpp:99)
==10084==    by 0x11CC61: make_persistent<pmem::obj::concurrent_hash_map_internal::hash_map_node<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> >, pmem::detail::persistent_pool_ptr<pmem::obj::concurrent_hash_map_internal::hash_map_node<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> > >&, std::pair<const pmem::obj::p<int>, pmem::obj::p<int> > > (make_persistent.hpp:126)
==10084==    by 0x11CC61: operator() (concurrent_hash_map.hpp:1177)
==10084==    by 0x11CC61: std::_Function_handler<void (), unsigned long pmem::obj::concurrent_hash_map_internal::hash_map_base<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> >::insert_new_node<pmem::obj::concurrent_hash_map_internal::hash_map_node<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> >, std::pair<pmem::obj::p<int> const, pmem::obj::p<int> > >(pmem::obj::concurrent_hash_map_internal::hash_map_base<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> >::bucket*, pmem::detail::persistent_pool_ptr<pmem::obj::concurrent_hash_map_internal::hash_map_node<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> > >&, std::pair<pmem::obj::p<int> const, pmem::obj::p<int> >&&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (std_function.h:300)
==10084==    by 0x116E9E: operator() (std_function.h:690)
==10084==    by 0x116E9E: void pmem::obj::transaction::run<>(pmem::obj::pool_base&, std::function<void ()>) (transaction.hpp:422)
==10084==    by 0x114151: insert_new_node<pmem::obj::concurrent_hash_map_internal::hash_map_node<pmem::obj::p<int>, pmem::obj::p<int>, pmem::obj::shared_mutex, pmem::obj::concurrent_hash_map_internal::shared_mutex_scoped_lock<pmem::obj::shared_mutex> >, std::pair<const pmem::obj::p<int>, pmem::obj::p<int> > > (concurrent_hash_map.hpp:1176)
==10084==    by 0x114151: internal_insert<pmem::obj::p<int>, std::pair<const pmem::obj::p<int>, pmem::obj::p<int> > > (concurrent_hash_map.hpp:2749)
==10084==    by 0x114151: insert (concurrent_hash_map.hpp:2492)
==10084==    by 0x114151: operator() (concurrent_hash_map_rehash.cpp:90)
==10084==    by 0x114151: __invoke_impl<void, (anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (invoke.h:60)
==10084==    by 0x114151: __invoke<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::<lambda()> > (invoke.h:95)
==10084==    by 0x114151: _M_invoke<0> (thread:244)
==10084==    by 0x114151: operator() (thread:251)
==10084==    by 0x114151: std::thread::_State_impl<std::thread::_Invoker<std::tuple<(anonymous namespace)::insert_erase_lookup_test(pmem::obj::pool<(anonymous namespace)::root>&)::{lambda()#1}> > >::_M_run() (thread:195)
==10084==    by 0x49A9B4F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==10084==    by 0x483C8B6: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==10084==    by 0x489FFB6: start_thread (pthread_create.c:486)
==10084==    by 0x4BAF2EE: clone (clone.S:95)
==10084==  Address 0x5d84c9c is in a rw- mapped file /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/test/concurrent_hash_map_rehash_0_helgrind/testfile segment
==10084== 
==10084== 
==10084== Use --history-level=approx or =none to gain increased speed, at
==10084== the cost of reduced accuracy of conflicting-access information
==10084== For lists of detected and suppressed errors, rerun with: -s
==10084== ERROR SUMMARY: 368 errors from 1 contexts (suppressed: 41960 from 69)

CMake Error at /<<PKGBUILDDIR>>/tests/helpers.cmake:219 (message):
  valgrind;--error-exitcode=99;--tool=helgrind
  /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/tests/concurrent_hash_map_rehash
  /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/test/concurrent_hash_map_rehash_0_helgrind/testfile
  failed: 99
Call Stack (most recent call first):
  /<<PKGBUILDDIR>>/tests/helpers.cmake:272 (execute_common)
  /<<PKGBUILDDIR>>/tests/run_default.cmake:36 (execute)

On my home box (a 64-thread Pinnacle Ridge) I can't seem to reproduce despite multiple runs with exact same toolchain versions; on work desktop (8-thread Skylake) it semi-reliably fails in either concurrent_hash_map_insert_lookup_0_helgrind or concurrent_hash_map_rehash_0_helgrind). I'll leave it in a loop overnight.

igchor commented 4 years ago

This test can only be run on pmem. There is a bug in our test framework that we do not skip this on non-pmem. This is probably the issue.

kilobyte commented 4 years ago

cmake is invoked with -DTESTS_USE_FORCED_PMEM=1 — that should be enough, right?

Overnight stats: Pinnacle Ridge 3 ✗ 149 ✓, Skylake 17 ✗ 108 ✓ (the cluster of "semi-reliable" fails at the start was a fluke). The Skylake is much faster in single-threaded tests (and helgrind traces multiple guest threads in a single host thread).

vinser52 commented 4 years ago

It looks like false-positive. Thread 3 creates a new node while thread 4 changing the value of this node. But thread 4 acquire accessor to the node under the bucket lock. So it could happen only when the thread 3 creates the new node and releases the bucket lock.

igchor commented 4 years ago

It seems to work after: https://github.com/pmem/libpmemobj-cpp/pull/476 so I'll close it for now.

igchor commented 4 years ago

This seems to be reproducing more often now. I think we should think about some annotations for helgrind maybe?

Here's one more log: log_filt.txt

igchor commented 4 years ago

Let's disable it for now: https://github.com/pmem/libpmemobj-cpp/pull/625

After the relase we should look into our test cases and possibly add this test under drd instead of helgrind (also check if all concurrent tests are run under dd/helgrind)