open-telemetry / opentelemetry-cpp

The OpenTelemetry C++ Client
https://opentelemetry.io/
Apache License 2.0

Hitting an issue with a large allocation that crashes the application #3090

Closed: yzhuang93 closed this issue 1 week ago

yzhuang93 commented 1 week ago

We recently bumped our OTel C++ client version up to 1.16.1 and also switched from Abseil to the STL, but hit this issue at run time:

tcmalloc: large alloc 139896905007104 bytes == (nil) @  0x7f3c4a43154c 0x7f3c426154b0 0x7f3c492c6db3 0x7f3c490c5dd4 0x7f3c48bb44b6 0x7f3c495180c8 0x7f3c49517616 0x7f3c4a30ab2e 0x7f3c4a302825 0x7f3c4a306295 0x7f3c41e6c7e5
libc++abi: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc
*** Received signal: SIGABRT on thread 3663822 fault-address 0x3e80037e7ce fault-code -6 @ 1728527647 (unix time) try "date -d @1728527647" ***
#0    Object "/lib64/libc.so.6", at 0x7f3c41e8052f, in gsignal
#1    Object "/lib64/libc.so.6", at 0x7f3c41e53e64, in abort
#2    Source "/home/build/rpmbuild.tmpWLNW9Q/BUILD/cross-el8.5-x86_64-clang-runtime-13.0.1-13.0.1/llvm-project-13.0.1.src/libcxxabi/src/abort_message.cpp", line 78, in abort_message(char *format) [0x7f3c4267fb25]
#3    Source "/home/build/rpmbuild.tmpWLNW9Q/BUILD/cross-el8.5-x86_64-clang-runtime-13.0.1-13.0.1/llvm-project-13.0.1.src/libcxxabi/src/cxa_default_handlers.cpp", line 66, in demangling_terminate_handler() [0x7f3c4266805e]
#4    Source "/home/build/rpmbuild.tmpWLNW9Q/BUILD/cross-el8.5-x86_64-clang-runtime-13.0.1-13.0.1/llvm-project-13.0.1.src/libcxxabi/src/cxa_handlers.cpp", line 59, in std::__terminate(*terminate_handler func) [0x7f3c4267ecd2]
#5    Source "/home/build/rpmbuild.tmpWLNW9Q/BUILD/cross-el8.5-x86_64-clang-runtime-13.0.1-13.0.1/llvm-project-13.0.1.src/libcxxabi/src/cxa_handlers.cpp", line 88, in std::terminate() [0x7f3c4267ec77]
#6    Object "/home/builds/tracing-otlp-build/lib/libopentelemetry_trace.so", at 0x7f3c48b9e8ba, in __clang_call_terminate
#7    Source "/home/workplace/main/views/xxx/builds/xxxx/observability_impl.cc", line 229, in InitTracer() [0x7f3c495180c7]

The line of code hitting this issue is:

tracer_ = trace::Provider::GetTracerProvider()->GetTracer("otel", "1.16.1");

If I switch back to Abseil, I don't see this issue, but I have other problems with Abseil and need to get rid of it.

yzhuang93 commented 1 week ago

Adding some more information about the code:

  // Create exporter with [OtlpHttp]ExporterFactory.
  otlp::OtlpHttpExporterOptions opts;
  opts.url = config_->ExporterEndpoint();
  auto exporter  = otlp::OtlpHttpExporterFactory::Create(opts);

  // Create Batch Span Processor with [Batch]SpanProcessorFactory.
  sdktrace::BatchSpanProcessorOptions processor_option{};
  auto processor = sdktrace::BatchSpanProcessorFactory::Create(std::move(exporter),
    processor_option);

  // Create Sampler with [AlwaysOn]SamplerFactory.
  auto sampler = sdktrace::AlwaysOnSamplerFactory::Create();

  // Create Resource with service.name, hostname and ip attributes.
  auto resource = sdkresource::Resource::Create({
    {"service.name", service_name_},
    {"hostname", hostname_},
    {"ip", ip_address_}
  });

  // Create ID Generator with [Random]IdGeneratorFactory.
  auto id_generator = sdktrace::RandomIdGeneratorFactory::Create();

  // Create Tracer Provider with TracerProviderFactory.
  std::shared_ptr<sdktrace::TracerProvider> tracer_provider
    = sdktrace::TracerProviderFactory::Create(
        std::move(processor), resource,
        std::move(sampler), std::move(id_generator));

  // Set global tracer provider.
  std::shared_ptr<trace::TracerProvider> api_provider = tracer_provider;
  trace::Provider::SetTracerProvider(api_provider);

  // Get Tracer From TracerProvider.
  tracer_ = trace::Provider::GetTracerProvider()->GetTracer("otel", "1.16.1");

owent commented 1 week ago

Do you use CMake, and how do you import otel-cpp? Similar problems can happen when the macros differ between building otel-cpp and using it.

yzhuang93 commented 1 week ago

CMake options used to build otel-cpp:

   -DCMAKE_CXX_FLAGS="-std=c++17" \
   -DBUILD_SHARED_LIBS=ON \
   -DBUILD_TESTING=OFF \
   -DWITH_EXAMPLES=OFF \
   -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
   -DWITH_OTLP_HTTP=ON \
   -DWITH_STL=ON \
   -DCMAKE_CXX_STANDARD=17 \
   -DOPENTELEMETRY_ABI_VERSION_NO=1 \
   -DCMAKE_SKIP_RPATH=1

Includes in our header:

#ifndef _UTIL_OBSERVABILITY_H_
#define _UTIL_OBSERVABILITY_H_

#include "opentelemetry/context/propagation/global_propagator.h"
#include "opentelemetry/context/propagation/text_map_propagator.h"
#include "opentelemetry/exporters/otlp/otlp_http_exporter.h"
#include "opentelemetry/exporters/otlp/otlp_http_exporter_factory.h"
#include "opentelemetry/nostd/shared_ptr.h"
#include "opentelemetry/sdk/common/env_variables.h"
#include "opentelemetry/sdk/resource/resource_detector.h"
#include "opentelemetry/sdk/trace/batch_span_processor.h"
#include "opentelemetry/sdk/trace/batch_span_processor_factory.h"
#include "opentelemetry/sdk/trace/random_id_generator_factory.h"
#include "opentelemetry/sdk/trace/samplers/always_on_factory.h"
#include "opentelemetry/sdk/trace/simple_processor.h"
#include "opentelemetry/sdk/trace/tracer_provider.h"
#include "opentelemetry/sdk/trace/tracer_provider_factory.h"
#include "opentelemetry/trace/noop.h"
#include "opentelemetry/trace/propagation/http_trace_context.h"
#include "opentelemetry/trace/propagation/jaeger.h"
#include "opentelemetry/trace/provider.h"
#include "opentelemetry/trace/span_metadata.h"

...

#endif

yzhuang93 commented 1 week ago

I just resolved this by turning both STL and Abseil OFF. I thought I needed to choose one of them, but it turns out that building with neither of them avoids this issue.