open-telemetry / opentelemetry-cpp

The OpenTelemetry C++ Client
https://opentelemetry.io/
Apache License 2.0

Crash in OTLP HTTP export #2713

Open: VivekSubr opened this issue 1 week ago

VivekSubr commented 1 week ago

Describe your environment

Built and running on Linux, configured with:

cmake .. -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_CXX_STANDARD=17 \
         -DWITH_STL=CXX17 -DBUILD_SHARED_LIBS=ON -DWITH_OTLP_HTTP=ON -DWITH_OTLP_GRPC=ON -DBUILD_TESTING=OFF

Protobuf version installed - 3.17.3

Steps to reproduce

I don't have exact steps to reproduce; the crash happens intermittently.

Backtrace

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000001b83c7a7d in ?? ()
[Current thread is 1 (Thread 0x7810b784da00 (LWP 23))]
#0  0x00000001b83c7a7d in ?? ()
#1  0x00007ffc046f68b0 in ?? ()
#2  0x00007810b87a237d in google::protobuf::RepeatedPtrField<opentelemetry::proto::trace::v1::ResourceSpans>::~RepeatedPtrField() ()
   from /lib64/libopentelemetry_exporter_otlp_grpc.so
#3  0x00007810b87a174a in opentelemetry::proto::collector::trace::v1::ExportTraceServiceRequest::~ExportTraceServiceRequest() ()
   from /lib64/libopentelemetry_exporter_otlp_grpc.so
#4  0x00007810b87721ff in opentelemetry::v1::exporter::otlp::OtlpHttpExporter::Export(opentelemetry::v1::nostd::span<std::unique_ptr<opentelemetry::v1::sdk::trace::Recordable, std::default_delete<opentelemetry::v1::sdk::trace::Recordable> >, 18446744073709551615ul> const&) ()
   from /lib64/libopentelemetry_exporter_otlp_http.so
#5  0x00007810ba07113b in opentelemetry::v1::sdk::trace::SimpleSpanProcessor::OnEnd (this=0x6299ed3d66a0, span=...)
    at /usr/include/opentelemetry/sdk/trace/simple_processor.h:51
#6  0x00007810b88cd9ba in opentelemetry::v1::sdk::trace::MultiSpanProcessor::OnEnd(std::unique_ptr<opentelemetry::v1::sdk::trace::Recordable, std::default_delete<opentelemetry::v1::sdk::trace::Recordable> >&&) () from /lib64/libopentelemetry_trace.so
#7  0x00007810b88d6654 in opentelemetry::v1::sdk::trace::Span::End(opentelemetry::v1::trace::EndSpanOptions const&) ()
   from /lib64/libopentelemetry_trace.so

Additional Info

The crash appears to happen when the arena object is destroyed, at https://github.com/open-telemetry/opentelemetry-cpp/blob/main/exporters/otlp/src/otlp_http_exporter.cc#L102
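The reporter attributes the crash to the arena being destroyed at the end of Export, and frames #2 to #4 show ~ExportTraceServiceRequest and ~RepeatedPtrField running from inside OtlpHttpExporter::Export. To illustrate the ownership model involved, here is a minimal, stdlib-only sketch of arena-style ownership. ToyArena, ToyRequest and ExportOnce are hypothetical names; this is not google::protobuf::Arena (whose block allocation and destructor handling differ) and not the otel-cpp code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for an arena; NOT google::protobuf::Arena.
// It owns every object created through it and destroys them, newest first,
// when the arena itself goes out of scope.
class ToyArena {
 public:
  ToyArena() = default;
  ToyArena(const ToyArena&) = delete;
  ToyArena& operator=(const ToyArena&) = delete;

  ~ToyArena() {
    while (!objects_.empty()) objects_.pop_back();  // destroy newest first
  }

  template <typename T, typename... Args>
  T* Create(Args&&... args) {
    auto obj = std::make_shared<T>(std::forward<Args>(args)...);
    objects_.push_back(obj);  // the arena now co-owns the object
    return obj.get();         // callers only get a non-owning pointer
  }

 private:
  std::vector<std::shared_ptr<void>> objects_;
};

// Hypothetical message type that counts its own destruction.
struct ToyRequest {
  explicit ToyRequest(int* counter) : destroyed_count(counter) {}
  ~ToyRequest() { ++*destroyed_count; }
  int* destroyed_count;
  std::vector<std::string> spans;  // heap data released when the request dies
};

// Hypothetical export function: the arena and the request are both local,
// so the request's destructor runs when ExportOnce returns.
int ExportOnce(int* destroyed) {
  ToyArena arena;
  ToyRequest* request = arena.Create<ToyRequest>(destroyed);
  request->spans.push_back("span-a");
  request->spans.push_back("span-b");
  return static_cast<int>(request->spans.size());
}  // ~ToyArena -> ~ToyRequest -> ~vector<std::string>
```

With this model, destructors of arena-owned objects run only when the arena goes away, so a segfault inside them usually means the memory they traverse was already invalid before teardown. That is one reason a crash reported during arena destruction can point to corruption that happened elsewhere.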

It's not apparent why this would happen; any help would be appreciated.

owent commented 6 days ago

Which version of otel-cpp are you using, and have you enabled async exporting? Before 1.10.0 there was a thread-safety problem in the OTLP HTTP exporter when otel-cpp was built without async export (i.e. without -DENABLE_ASYNC_EXPORT or WITH_ASYNC_EXPORT_PREVIEW).
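Frames #4 to #7 show SimpleSpanProcessor::OnEnd calling OtlpHttpExporter::Export synchronously on the thread that ended the span, so any exporter state shared between calls has to be protected when several threads end spans at once. The comment does not describe the pre-1.10.0 race, so the following is only a generic sketch, with hypothetical class names, of serializing calls into an exporter that is not safe to call concurrently; it is not the otel-cpp fix.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical exporter whose Export() is NOT safe to call concurrently:
// it reuses one scratch buffer and one counter across calls.
class NonThreadSafeExporter {
 public:
  void Export(int batch_size) {
    scratch_.clear();
    for (int i = 0; i < batch_size; ++i) scratch_.push_back(i);
    exported_ += scratch_.size();
  }
  std::size_t exported() const { return exported_; }

 private:
  std::vector<int> scratch_;
  std::size_t exported_ = 0;
};

// Hypothetical processor that serializes calls into the exporter, so at most
// one thread is inside Export() at any time.
class SerializingProcessor {
 public:
  explicit SerializingProcessor(NonThreadSafeExporter& exporter) : exporter_(exporter) {}

  void OnEnd(int batch_size) {
    std::lock_guard<std::mutex> lock(mutex_);  // held for the whole export
    exporter_.Export(batch_size);
  }

 private:
  NonThreadSafeExporter& exporter_;
  std::mutex mutex_;
};
```

A single mutex is the simplest choice; it trades throughput for safety, because every span end waits for the previous export to finish.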

VivekSubr commented 5 days ago

@owent: 1.15, and I haven't enabled async exporting. Is async export still in preview in 1.15?

owent commented 5 days ago

gRPC async exporting is still in preview.

owent commented 2 days ago

Does this problem happen during shutdown? Do you build both otel-cpp and proto as dynamic libraries? I'm just wondering why the destructor of RepeatedPtrField<opentelemetry::proto::trace::v1::ResourceSpans> comes from the gRPC exporter library (libopentelemetry_exporter_otlp_grpc.so) in the backtrace.
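The shutdown question matters because a span that ends while the exporter, or the libraries it depends on, is being torn down can make Export touch state that no longer exists. Here is a minimal sketch of rejecting exports once Shutdown() has been called. It assumes a hypothetical GuardedExporter and a locally defined ExportResult enum; neither is an otel-cpp type.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical result type, defined locally for this sketch.
enum class ExportResult { kSuccess, kFailure };

// Hypothetical exporter that refuses new work after Shutdown().
class GuardedExporter {
 public:
  ExportResult Export(int batch_size) {
    if (is_shutdown_.load(std::memory_order_acquire)) return ExportResult::kFailure;
    exported_ += batch_size;  // not atomic: this sketch assumes a single caller thread
    return ExportResult::kSuccess;
  }

  // exchange() returns the previous value, so only the first call returns true.
  bool Shutdown() {
    return !is_shutdown_.exchange(true, std::memory_order_acq_rel);
  }

  int exported() const { return exported_; }

 private:
  std::atomic<bool> is_shutdown_{false};
  int exported_ = 0;
};
```

The flag only stops new exports. It does not wait for an export that is already running when Shutdown() is called, so a real implementation would also need to drain in-flight work before releasing resources.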

VivekSubr commented 2 days ago

It's the HTTP exporter, and proto was installed via yum.

We're investigating whether it's memory corruption coming from somewhere else.

owent commented 2 days ago

Do you mean protobuf? I reviewed the code, and as far as I understand, the messages and the arena never leave the scope of OtlpHttpExporter::Export.

owent commented 10 hours ago

I found another crash, #2982, which happens when using metrics and a timeout occurs. I'm not sure whether it's related to this one.
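The cause of #2982 isn't described here. As a generic illustration of how a timeout path can crash, consider a caller that waits for a background export with a deadline. If the shared state lived on the caller's stack, a timed-out caller would return while the worker still writes into that frame. This minimal sketch uses hypothetical names (ExportState, ExportWithTimeout) and keeps the state alive with std::shared_ptr, so either side can safely outlive the other.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical state shared by the caller and the worker. Both hold a
// shared_ptr, so it stays alive even if the caller gives up on a timeout.
struct ExportState {
  std::mutex mu;
  std::condition_variable cv;
  bool done = false;
  int result = 0;
};

// Hypothetical helper: runs a simulated slow export on a detached thread and
// waits at most `timeout` for it. Returns true if the work finished in time.
bool ExportWithTimeout(std::chrono::milliseconds work_time,
                       std::chrono::milliseconds timeout, int* out) {
  auto state = std::make_shared<ExportState>();
  std::thread([state, work_time] {
    std::this_thread::sleep_for(work_time);  // stands in for a slow network call
    {
      std::lock_guard<std::mutex> lock(state->mu);
      state->done = true;
      state->result = 42;  // placeholder result
    }
    state->cv.notify_one();
  }).detach();

  std::unique_lock<std::mutex> lock(state->mu);
  if (!state->cv.wait_for(lock, timeout, [&state] { return state->done; })) {
    return false;  // timed out; the worker still co-owns `state`, so nothing dangles
  }
  *out = state->result;
  return true;
}
```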