open-telemetry / opentelemetry-cpp

The OpenTelemetry C++ Client
https://opentelemetry.io/
Apache License 2.0

Data race when getting Meter from MeterProvider #3072

Open JavierBejMen opened 2 months ago

JavierBejMen commented 2 months ago

Describe your environment: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, opentelemetry-cpp 1.8.3#6. Same behaviour on latest too.

Steps to reproduce

    // Exporter
    auto exporter = std::make_unique<opentelemetry::exporter::metrics::OStreamMetricExporter>(std::cout);

    // Reader
    auto readerOptions = opentelemetry::sdk::metrics::PeriodicExportingMetricReaderOptions();
    readerOptions.export_interval_millis = std::chrono::milliseconds(100);
    readerOptions.export_timeout_millis = std::chrono::milliseconds(33);
    auto reader = std::make_shared<opentelemetry::sdk::metrics::PeriodicExportingMetricReader>(
        std::move(exporter), readerOptions);

    // Provider
    auto provider = std::make_shared<opentelemetry::sdk::metrics::MeterProvider>();
    provider->AddMetricReader(reader);
    provider->GetMeter("default");

What is the expected behavior? I expect that thread sanitizer reports no warnings.

What is the actual behavior? ThreadSanitizer reports two data races on a std::shared_ptr when calling provider->GetMeter("default"):

$4│ ==================
$4│ WARNING: ThreadSanitizer: data race (pid=114998)
$4│   Read of size 8 at 0x7b0400001b08 by thread T3:
$4│     #0 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/11/bits/shared_ptr_base.h:709 (metrics_utest+0x537c70)
$4│     #1 std::__shared_ptr<opentelemetry::v1::sdk::metrics::Meter, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<opentelemetry::v1::sdk::metrics::Meter, (__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/11/bits/shared_ptr_base.h:1152 (metrics_utest+0x665c04)
$4│     #2 __gthread_once /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:700 (metrics_utest+0x6c80db)
$4│ 
$4│   Previous write of size 8 at 0x7b0400001b08 by main thread (mutexes: write M119):
$4│     #0 operator new(unsigned long) ../../../../src/libsanitizer/tsan/tsan_new_delete.cpp:64 (libtsan.so.0+0x8f162)
$4│     #1 __gnu_cxx::new_allocator<std::shared_ptr<opentelemetry::v1::sdk::metrics::Meter> >::allocate(unsigned long, void const*) /usr/include/c++/11/ext/new_allocator.h:127 (metrics_utest+0x69ed61)
$4│     #2 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (metrics_utest+0x639578)
$4│     #3 main /home/bee/Project/wazuh/src/engine/source/metrics/test/src/unit/main.cpp:27 (metrics_utest+0x5f7d6d)
$4│ 
$4│   Location is heap block of size 16 at 0x7b0400001b00 allocated by main thread:
$4│     #0 operator new(unsigned long) ../../../../src/libsanitizer/tsan/tsan_new_delete.cpp:64 (libtsan.so.0+0x8f162)
$4│     #1 __gnu_cxx::new_allocator<std::shared_ptr<opentelemetry::v1::sdk::metrics::Meter> >::allocate(unsigned long, void const*) /usr/include/c++/11/ext/new_allocator.h:127 (metrics_utest+0x69ed61)
$4│     #2 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (metrics_utest+0x639578)
$4│     #3 main /home/bee/Project/wazuh/src/engine/source/metrics/test/src/unit/main.cpp:27 (metrics_utest+0x5f7d6d)
$4│ 
$4│   Mutex M119 (0x7b1400000348) created at:
$4│     #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4240 (libtsan.so.0+0x53908)
$4│     #1 __gthread_mutex_lock /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749 (metrics_utest+0x588cad)
$4│     #2 std::mutex::lock() /usr/include/c++/11/bits/std_mutex.h:100 (metrics_utest+0x588d36)
$4│     #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/include/c++/11/bits/std_mutex.h:229 (metrics_utest+0x592ae4)
$4│     #4 opentelemetry::v1::sdk::metrics::MeterProvider::GetMeter(opentelemetry::v1::nostd::string_view, opentelemetry::v1::nostd::string_view, opentelemetry::v1::nostd::string_view) /home/bee/engine/vcpkg/buildtrees/opentelemetry-cpp/src/v1.8.3-b38e6ca96f.clean/sdk/src/metrics/meter_provider.cc:42 (metrics_utest+0x664721)
$4│     #5 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (metrics_utest+0x639578)
$4│     #6 main /home/bee/Project/wazuh/src/engine/source/metrics/test/src/unit/main.cpp:27 (metrics_utest+0x5f7d6d)
$4│ 
$4│   Thread T3 (tid=115002, running) created by thread T1 at:
$4│     #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x605b8)
$4│     #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xdc328)
$4│ 
$4│ SUMMARY: ThreadSanitizer: data race /usr/include/c++/11/bits/shared_ptr_base.h:709 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&)
$4│ ==================
$4│ ==================
$4│ WARNING: ThreadSanitizer: data race (pid=114998)
$4│   Atomic write of size 4 at 0x7b0800000968 by thread T3:
$4│     #0 __tsan_atomic32_fetch_add ../../../../src/libsanitizer/tsan/tsan_interface_atomic.cpp:615 (libtsan.so.0+0x81fe9)
$4│     #1 __gnu_cxx::__atomic_add(int volatile*, int) /usr/include/c++/11/ext/atomicity.h:71 (metrics_utest+0x540524)
$4│     #2 __gnu_cxx::__atomic_add_dispatch(int*, int) /usr/include/c++/11/ext/atomicity.h:111 (metrics_utest+0x540524)
$4│     #3 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy() /usr/include/c++/11/bits/shared_ptr_base.h:148 (metrics_utest+0x540524)
$4│     #4 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/11/bits/shared_ptr_base.h:712 (metrics_utest+0x537cbd)
$4│     #5 std::__shared_ptr<opentelemetry::v1::sdk::metrics::Meter, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<opentelemetry::v1::sdk::metrics::Meter, (__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/11/bits/shared_ptr_base.h:1152 (metrics_utest+0x665c04)
$4│     #6 __gthread_once /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:700 (metrics_utest+0x6c80db)
$4│ 
$4│   Previous write of size 8 at 0x7b0800000968 by main thread (mutexes: write M119):
$4│     #0 operator new(unsigned long) ../../../../src/libsanitizer/tsan/tsan_new_delete.cpp:64 (libtsan.so.0+0x8f162)
$4│     #1 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<opentelemetry::v1::sdk::metrics::Meter*>(opentelemetry::v1::sdk::metrics::Meter*) <null> (metrics_utest+0x66967b)
$4│     #2 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (metrics_utest+0x639578)
$4│     #3 main /home/bee/Project/wazuh/src/engine/source/metrics/test/src/unit/main.cpp:27 (metrics_utest+0x5f7d6d)
$4│ 
$4│   Location is heap block of size 24 at 0x7b0800000960 allocated by main thread:
$4│     #0 operator new(unsigned long) ../../../../src/libsanitizer/tsan/tsan_new_delete.cpp:64 (libtsan.so.0+0x8f162)
$4│     #1 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<opentelemetry::v1::sdk::metrics::Meter*>(opentelemetry::v1::sdk::metrics::Meter*) <null> (metrics_utest+0x66967b)
$4│     #2 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (metrics_utest+0x639578)
$4│     #3 main /home/bee/Project/wazuh/src/engine/source/metrics/test/src/unit/main.cpp:27 (metrics_utest+0x5f7d6d)
$4│ 
$4│   Mutex M119 (0x7b1400000348) created at:
$4│     #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4240 (libtsan.so.0+0x53908)
$4│     #1 __gthread_mutex_lock /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749 (metrics_utest+0x588cad)
$4│     #2 std::mutex::lock() /usr/include/c++/11/bits/std_mutex.h:100 (metrics_utest+0x588d36)
$4│     #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/include/c++/11/bits/std_mutex.h:229 (metrics_utest+0x592ae4)
$4│     #4 opentelemetry::v1::sdk::metrics::MeterProvider::GetMeter(opentelemetry::v1::nostd::string_view, opentelemetry::v1::nostd::string_view, opentelemetry::v1::nostd::string_view) /home/bee/engine/vcpkg/buildtrees/opentelemetry-cpp/src/v1.8.3-b38e6ca96f.clean/sdk/src/metrics/meter_provider.cc:42 (metrics_utest+0x664721)
$4│     #5 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (metrics_utest+0x639578)
$4│     #6 main /home/bee/Project/wazuh/src/engine/source/metrics/test/src/unit/main.cpp:27 (metrics_utest+0x5f7d6d)
$4│ 
$4│   Thread T3 (tid=115002, running) created by thread T1 at:
$4│     #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x605b8)
$4│     #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xdc328)
$4│ 
$4│ SUMMARY: ThreadSanitizer: data race /usr/include/c++/11/ext/atomicity.h:71 in __gnu_cxx::__atomic_add(int volatile*, int)
$4│ ==================

Additional context When setting the GlobalProvider there are no warnings, but when the provider is accessed directly the warnings appear. I don't know whether I'm doing something wrong or whether this is an internal bug in the SDK. Any help would be appreciated!
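
Reading the two reports together: both writes happen on the main thread while it holds the mutex taken in MeterProvider::GetMeter, and both accesses on thread T3 happen while copying a std::shared_ptr<Meter> with no mutex listed. One possible reading, not confirmed against the SDK source, is that a background thread walks the list of meters without taking the lock that guards insertion. The sketch below models that situation with the standard library only and shows the locked variant, where insertion and iteration share one mutex. `Meter`, `Registry`, `Add`, `Snapshot`, and `RunDemo` are hypothetical names for illustration; this is not the SDK's code.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a meter; not the SDK type.
struct Meter
{
  std::string name;
};

// Hypothetical registry in which insertion and iteration share one mutex.
class Registry
{
public:
  std::shared_ptr<Meter> Add(const std::string &name)
  {
    std::lock_guard<std::mutex> guard(lock_);
    meters_.push_back(std::make_shared<Meter>(Meter{name}));
    return meters_.back();
  }

  // Copy the shared_ptrs while holding the lock, so a concurrent Add()
  // cannot reallocate the vector underneath the reader.
  std::vector<std::shared_ptr<Meter>> Snapshot() const
  {
    std::lock_guard<std::mutex> guard(lock_);
    return meters_;
  }

private:
  mutable std::mutex lock_;
  std::vector<std::shared_ptr<Meter>> meters_;
};

// Runs a writer and a reader concurrently and returns the final meter count.
std::size_t RunDemo(std::size_t count)
{
  Registry registry;
  std::thread reader([&registry, count] {
    // Poll until every meter added by the writer is visible.
    while (registry.Snapshot().size() < count)
    {
      std::this_thread::yield();
    }
  });
  for (std::size_t i = 0; i < count; ++i)
  {
    registry.Add("meter_" + std::to_string(i));
  }
  reader.join();
  return registry.Snapshot().size();
}
```

Built with -fsanitize=thread, a version of `Snapshot()` without the `lock_guard` would be expected to produce reports of the same shape as the ones above, while the locked version should stay quiet. Whether this matches what happens inside the SDK is an assumption.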

marcalff commented 1 month ago

A data race inside the implementation of std::shared_ptr is very unlikely.

Please make sure the build is clean, and does not somehow pick up a binary from an old or different version of the C++ standard library.
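
One way to check which standard library headers a translation unit is compiled against is to print the version macros they define. The sketch below is only an illustration: `DescribeToolchain` is a hypothetical helper, and the macros `__VERSION__`, `__GLIBCXX__`, and `_GLIBCXX_RELEASE` are specific to GCC-compatible compilers and libstdc++, so each is guarded. It reports what the headers say at compile time, not which libstdc++.so is loaded at run time.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reports the compiler and libstdc++ versions that
// this translation unit was compiled against.
std::string DescribeToolchain()
{
  std::ostringstream out;
  out << "toolchain report\n";
#ifdef __VERSION__
  out << "compiler: " << __VERSION__ << "\n";
#endif
#ifdef __GLIBCXX__
  // Date stamp of the libstdc++ headers, for example 20230528.
  out << "libstdc++ date stamp: " << __GLIBCXX__ << "\n";
#endif
#ifdef _GLIBCXX_RELEASE
  // Major release of libstdc++ (defined since GCC 7).
  out << "libstdc++ release: " << _GLIBCXX_RELEASE << "\n";
#endif
  return out.str();
}
```

Calling this from both the test binary and a small program built against the vcpkg-installed SDK, then comparing the two outputs, would show whether the two builds see different headers.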

The symptom sounds related to: