googleapis / google-cloud-cpp

C++ Client Libraries for Google Cloud Services
https://cloud.google.com/
Apache License 2.0
547 stars 369 forks source link

heap-use-after-free in CompletionQueueTest.ShutdownWithReschedulingTimer #4423

Closed devbww closed 4 years ago

devbww commented 4 years ago

Are we holding the grpc::Alarm incorrectly?

cloud-cpp/github/google-cloud-cpp/master/docker/tsan-presubmit failure:

[ RUN      ] CompletionQueueTest.ShutdownWithReschedulingTimer
==================
WARNING: ThreadSanitizer: heap-use-after-free (pid=4743)
  Read of size 8 at 0x7b4400001bc8 by thread T3:
    #0 grpc_core::ExecCtx::Flush() /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/iomgr/exec_ctx.cc:150 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc_Ubase_Uc.so+0xd847a)
    #1 run_some_timers() /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/iomgr/timer_manager.cc:134 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc_Ubase_Uc.so+0xf97d2)
    #2 timer_main_loop() /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/iomgr/timer_manager.cc:237 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc_Ubase_Uc.so+0xf9593)
    #3 timer_thread(void*) /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/iomgr/timer_manager.cc:284 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc_Ubase_Uc.so+0xf94d7)
    #4 operator() /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/gprpp/thd_posix.cc:140 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgpr_Ubase.so+0x1095a)
    #5 __invoke /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/gprpp/thd_posix.cc:110 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgpr_Ubase.so+0x10809)

  Previous write of size 8 at 0x7b4400001bc8 by main thread:
    #0 free ??:? (completion_queue_test+0x4d65d4)
    #1 ~AlarmImpl /proc/self/cwd/external/com_github_grpc_grpc/src/cpp/common/alarm.cc:44 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc++_Ubase.so+0xdd551)
    #2 grpc_impl::internal::AlarmImpl::Unref() /proc/self/cwd/external/com_github_grpc_grpc/src/cpp/common/alarm.cc:114 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc++_Ubase.so+0xdd634)
    #3 grpc_impl::internal::AlarmImpl::Destroy() /proc/self/cwd/external/com_github_grpc_grpc/src/cpp/common/alarm.cc:107 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc++_Ubase.so+0xdd411)
    #4 ~Alarm /proc/self/cwd/external/com_github_grpc_grpc/src/cpp/common/alarm.cc:156 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc++_Ubase.so+0xdcf62)
    #5 ~Alarm /proc/self/cwd/external/com_github_grpc_grpc/src/cpp/common/alarm.cc:154 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc++_Ubase.so+0xdcfa9)
    #6 std::default_delete<grpc_impl::Alarm>::operator()(grpc_impl::Alarm*) const /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/unique_ptr.h:81 (libgoogle_Scloud_Slibgoogle_Ucloud_Ucpp_Ugrpc_Uutils.so+0x29077)
    #7 ~unique_ptr /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/unique_ptr.h:292 (libgoogle_Scloud_Slibgoogle_Ucloud_Ucpp_Ugrpc_Uutils.so+0x27ef7)
    #8 ~AsyncTimerFuture /proc/self/cwd/google/cloud/completion_queue.cc:38 (libgoogle_Scloud_Slibgoogle_Ucloud_Ucpp_Ugrpc_Uutils.so+0x27430)
    #9 void __gnu_cxx::new_allocator<google::cloud::v1::(anonymous namespace)::AsyncTimerFuture>::destroy<google::cloud::v1::(anonymous namespace)::AsyncTimerFuture>(google::cloud::v1::(anonymous namespace)::AsyncTimerFuture*) /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/ext/new_allocator.h:153 (libgoogle_Scloud_Slibgoogle_Ucloud_Ucpp_Ugrpc_Uutils.so+0x27ab1)
    #10 void std::allocator_traits<std::allocator<google::cloud::v1::(anonymous namespace)::AsyncTimerFuture> >::destroy<google::cloud::v1::(anonymous namespace)::AsyncTimerFuture>(std::allocator<google::cloud::v1::(anonymous namespace)::AsyncTimerFuture>&, google::cloud::v1::(anonymous namespace)::AsyncTimerFuture*) /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/alloc_traits.h:497 (libgoogle_Scloud_Slibgoogle_Ucloud_Ucpp_Ugrpc_Uutils.so+0x27a49)
    #11 std::_Sp_counted_ptr_inplace<google::cloud::v1::(anonymous namespace)::AsyncTimerFuture, std::allocator<google::cloud::v1::(anonymous namespace)::AsyncTimerFuture>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/shared_ptr_base.h:557 (libgoogle_Scloud_Slibgoogle_Ucloud_Ucpp_Ugrpc_Uutils.so+0x271b1)
    #12 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/shared_ptr_base.h:155 (completion_queue_test+0x581390)
    #13 ~__shared_count /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/shared_ptr_base.h:730 (completion_queue_test+0x581346)
    #14 ~__shared_ptr /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/shared_ptr_base.h:1169 (completion_queue_test+0x584a7d)
    #15 google::cloud::v1::internal::CompletionQueueImpl::SimulateCompletion(bool) /proc/self/cwd/google/cloud/internal/completion_queue_impl.cc:127 (libgoogle_Scloud_Slibgoogle_Ucloud_Ucpp_Ugrpc_Uutils.so+0x335e7)
    #16 google::cloud::v1::(anonymous namespace)::CompletionQueueTest_RunAsyncCompletionQueueDestroyed_Test::TestBody() /proc/self/cwd/google/cloud/completion_queue_test.cc:342 (completion_queue_test+0x575690)
    #17 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2433 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xe29d7)
    #18 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2469 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xcffa2)
    #19 testing::Test::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2508 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xbf430)
    #20 testing::TestInfo::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2684 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xbfd12)
    #21 testing::TestSuite::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2816 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xc0321)
    #22 testing::internal::UnitTestImpl::RunAllTests() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5338 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xc9d31)
    #23 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2433 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xe6267)
    #24 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2469 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xd2602)
    #25 testing::UnitTest::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:4925 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xc98d8)
    #26 RUN_ALL_TESTS() /proc/self/cwd/external/com_google_googletest/googletest/include/gtest/gtest.h:2473 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest_Umain.so+0xdfb)
    #27 main /proc/self/cwd/external/com_google_googletest/googlemock/src/gmock_main.cc:63 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest_Umain.so+0xda8)

  Thread T3 'grpc_global_tim' (tid=4790, running) created by main thread at:
    #0 pthread_create ??:? (completion_queue_test+0x4f3fa1)
    #1 ThreadInternalsPosix /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/gprpp/thd_posix.cc:109 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgpr_Ubase.so+0x10496)
    #2 Thread /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/gprpp/thd_posix.cc:186 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgpr_Ubase.so+0x10218)
    #3 start_timer_thread_and_unlock() /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/iomgr/timer_manager.cc:92 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc_Ubase_Uc.so+0xf9417)
    #4 start_threads() /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/iomgr/timer_manager.cc:293 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc_Ubase_Uc.so+0xf8fc5)
    #5 grpc_timer_manager_init() /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/iomgr/timer_manager.cc:311 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc_Ubase_Uc.so+0xf8f4d)
    #6 grpc_iomgr_start() /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/iomgr/iomgr.cc:69 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc_Ubase_Uc.so+0xdd243)
    #7 grpc_init /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/surface/init.cc:162 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc.so+0x45f8)
    #8 grpc::internal::GrpcLibrary::init() /proc/self/cwd/external/com_github_grpc_grpc/include/grpcpp/impl/grpc_library.h:34 (libexternal_Scom_Ugithub_Ugrpc_Ugrpc_Slibgrpc++.so+0x37fb3)
    #9 GrpcLibraryCodegen /proc/self/cwd/external/com_github_grpc_grpc/include/grpcpp/impl/codegen/grpc_library.h:45 (completion_queue_test+0x583d9c)
    #10 CompletionQueue /proc/self/cwd/external/com_github_grpc_grpc/include/grpcpp/impl/codegen/completion_queue_impl.h:247 (completion_queue_test+0x583c49)
    #11 CompletionQueue /proc/self/cwd/external/com_github_grpc_grpc/include/grpcpp/impl/codegen/completion_queue_impl.h:108 (completion_queue_test+0x583bc7)
    #12 CompletionQueueImpl /proc/self/cwd/./google/cloud/internal/completion_queue_impl.h:232 (completion_queue_test+0x583af4)
    #13 MockCompletionQueue /proc/self/cwd/google/cloud/completion_queue_test.cc:30 (completion_queue_test+0x55cb0a)
    #14 void __gnu_cxx::new_allocator<google::cloud::v1::(anonymous namespace)::MockCompletionQueue>::construct<google::cloud::v1::(anonymous namespace)::MockCompletionQueue>(google::cloud::v1::(anonymous namespace)::MockCompletionQueue*) /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/ext/new_allocator.h:147 (completion_queue_test+0x55cac8)
    #15 void std::allocator_traits<std::allocator<google::cloud::v1::(anonymous namespace)::MockCompletionQueue> >::construct<google::cloud::v1::(anonymous namespace)::MockCompletionQueue>(std::allocator<google::cloud::v1::(anonymous namespace)::MockCompletionQueue>&, google::cloud::v1::(anonymous namespace)::MockCompletionQueue*) /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/alloc_traits.h:484 (completion_queue_test+0x55c929)
    #16 _Sp_counted_ptr_inplace<> /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/shared_ptr_base.h:548 (completion_queue_test+0x55c6f8)
    #17 __shared_count<google::cloud::v1::(anonymous namespace)::MockCompletionQueue, std::allocator<google::cloud::v1::(anonymous namespace)::MockCompletionQueue>> /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/shared_ptr_base.h:680 (completion_queue_test+0x55c5b2)
    #18 __shared_ptr<std::allocator<google::cloud::v1::(anonymous namespace)::MockCompletionQueue>> /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/shared_ptr_base.h:1344 (completion_queue_test+0x55c553)
    #19 shared_ptr<std::allocator<google::cloud::v1::(anonymous namespace)::MockCompletionQueue>> /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/shared_ptr.h:359 (completion_queue_test+0x55c500)
    #20 std::shared_ptr<google::cloud::v1::(anonymous namespace)::MockCompletionQueue> std::allocate_shared<google::cloud::v1::(anonymous namespace)::MockCompletionQueue, std::allocator<google::cloud::v1::(anonymous namespace)::MockCompletionQueue>>(std::allocator<google::cloud::v1::(anonymous namespace)::MockCompletionQueue> const&) /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/shared_ptr.h:701 (completion_queue_test+0x55c4c0)
    #21 std::shared_ptr<google::cloud::v1::(anonymous namespace)::MockCompletionQueue> std::make_shared<google::cloud::v1::(anonymous namespace)::MockCompletionQueue>() /usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/shared_ptr.h:717 (completion_queue_test+0x55c38d)
    #22 google::cloud::v1::(anonymous namespace)::CompletionQueueTest_RunAsyncCompletionQueueDestroyed_Test::TestBody() /proc/self/cwd/google/cloud/completion_queue_test.cc:332 (completion_queue_test+0x57562e)
    #23 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2433 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xe29d7)
    #24 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2469 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xcffa2)
    #25 testing::Test::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2508 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xbf430)
    #26 testing::TestInfo::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2684 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xbfd12)
    #27 testing::TestSuite::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2816 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xc0321)
    #28 testing::internal::UnitTestImpl::RunAllTests() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5338 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xc9d31)
    #29 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2433 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xe6267)
    #30 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2469 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xd2602)
    #31 testing::UnitTest::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:4925 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0xc98d8)
    #32 RUN_ALL_TESTS() /proc/self/cwd/external/com_google_googletest/googletest/include/gtest/gtest.h:2473 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest_Umain.so+0xdfb)
    #33 main /proc/self/cwd/external/com_google_googletest/googlemock/src/gmock_main.cc:63 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest_Umain.so+0xda8)

SUMMARY: ThreadSanitizer: heap-use-after-free /proc/self/cwd/external/com_github_grpc_grpc/src/core/lib/iomgr/exec_ctx.cc:150 in grpc_core::ExecCtx::Flush()
mr-salty commented 4 years ago

I recall having a similar issue several months ago (which is what led us to add this test IIRC) but thought we had resolved it... i'll see if I can reconstitute my memory, and I guess this can be my flake for the week.

coryan commented 4 years ago

Are we holding the grpc::Alarm incorrectly?

Hard to say, grpc::Alarm is made out of razor blades and hate... seems like any way you hold it you are lucky to keep all your fingers.

In this case we are simulating the completion of an alarm:

https://github.com/googleapis/google-cloud-cpp/blob/9de62a84920fd4a70bceac21a4d8ec117aa5fb41/google/cloud/completion_queue_test.cc#L342

and deleting everything, but that is not guaranteed to work. Alarms must either be canceled or wait until completion, it is not Okay to remove them by yourself.

I will try to fix this.

coryan commented 4 years ago

Oh, I see @mr-salty is already looking. FWIW, I think the right fix is to change that SimulateCompletion() for nothing and call CancelAll() to ensure all the timers have actually finished. I think I resisted automatically calling CancelAll() in the destructor, I think this convinced me I was wrong.

mr-salty commented 4 years ago

if you'd like to take it, that's fine with me... I do remember us having a longish discussion about this a few months ago but I haven't reviewed that yet to see what thoughts we had (can't recall if the discussion was on a PR/bug, or chat, or email...)

coryan commented 4 years ago

Let me take a look. I need a break from channels/senders/receivers/executors.

coryan commented 4 years ago

This was funny... we have a MockCompletionQueue class that specializes CreateAlarm() to workaround the crazy timer rules:

https://github.com/googleapis/google-cloud-cpp/blob/9de62a84920fd4a70bceac21a4d8ec117aa5fb41/google/cloud/testing_util/mock_completion_queue.h#L31-L36

But in the tests for CompletionQueue we used a different mock:

https://github.com/googleapis/google-cloud-cpp/blob/9de62a84920fd4a70bceac21a4d8ec117aa5fb41/google/cloud/completion_queue_test.cc#L30-L33

I am running tests again (it used to take 100,000 iterations for this to fail), but I am pretty sure that is the problem...