open-telemetry / opentelemetry-cpp

The OpenTelemetry C++ Client
https://opentelemetry.io/
Apache License 2.0

Error while exporting Metrics #2391

Open Veeraraghavans opened 10 months ago

Veeraraghavans commented 10 months ago

Hello team,

I'm trying to use opentelemetry-cpp version 1.8.1 to export metrics from an Ubuntu 22.04 machine. My plugin code creates the agents and the provider that export the metrics. When I try to create the metrics provider, I get an allocation error, and I'm not sure what is causing it.

terminate called after throwing an instance of '
std::bad_alloc'
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc  what():  std::bad_alloc

I analyzed the problem with gdbgui and found that when MetaDataValidator is called, it triggers the regex and allocator validation, which is where it fails.

It would be great if anyone has an idea about this. I have been stuck on it for a while, so any input is welcome. Happy to provide more details if needed.

lalitb commented 10 months ago

@Veeraraghavans Which compiler? Also, do you have the sample code which is failing?

Veeraraghavans commented 10 months ago

Hi @lalitb

I use gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0. Here is the snippet of code I use to create the Meter:

nostd::shared_ptr<metrics_api::Meter> MetricAgent::GetMeter()
{
    auto provider = metrics_api::Provider::GetMeterProvider();
    return provider->GetMeter(this->serviceName, OPENTELEMETRY_SDK_VERSION);
}

More information:

When I call GetMeter, it calls GetMeter from MeterProvider. While creating the Meter, OpenTelemetry calls InstrumentDataValidator, where the regex error is thrown:

std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::basic_regex<std::char_traits<char>, std::allocator<char> >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::regex_constants::syntax_option_type)
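
The frame above is the std::regex constructor that takes a std::string pattern. As a rough sanity check, the same constructor can be exercised in isolation from the plugin. This is only a sketch under stated assumptions: the pattern below is an assumed approximation of an instrument-name rule (it is not copied from the SDK), and TryBuildRegex and kAssumedNamePattern are hypothetical names.

```cpp
#include <new>
#include <regex>
#include <string>

// Outcome of constructing a std::regex and matching one candidate name.
enum class RegexCheck { kMatch, kNoMatch, kRegexError, kBadAlloc };

// Hypothetical helper: builds a regex from `pattern` (the same constructor
// that appears in the stack frame) and tests `name` against it.
RegexCheck TryBuildRegex(const std::string &pattern, const std::string &name)
{
  try
  {
    const std::regex re(pattern);
    return std::regex_match(name, re) ? RegexCheck::kMatch : RegexCheck::kNoMatch;
  }
  catch (const std::regex_error &)
  {
    return RegexCheck::kRegexError;  // the pattern itself is malformed
  }
  catch (const std::bad_alloc &)
  {
    return RegexCheck::kBadAlloc;  // the kind of failure reported in this issue
  }
}

// Assumed pattern (not taken from the SDK): a letter followed by up to
// 62 letters, digits, '-', '_' or '.'.
const std::string kAssumedNamePattern = "[a-zA-Z][-_.a-zA-Z0-9]{0,62}";
```

If a call such as TryBuildRegex(kAssumedNamePattern, "abc_plugin") behaves normally in a standalone binary but not when it is run inside the plugin process, that would point at the plugin's build or runtime environment rather than at the pattern or the name.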

Please let me know whether the information shared is enough or whether you need more.

lalitb commented 10 months ago

@Veeraraghavans - Do you get a similar crash while running - https://github.com/open-telemetry/opentelemetry-cpp/tree/main/examples/metrics_simple? Also, what is the otel-cpp version you are using? If it is from the main branch, do you also see the crash with v1.12.0?

Veeraraghavans commented 10 months ago

No, I am not getting a crash. I could run the metrics_simple example you shared. The OpenTelemetry version I use is 1.8.1, built from the 1.8.1 branch.

lalitb commented 10 months ago

Sorry, the stack trace is not enough for me to debug further. I can't see why allocation should fail in regex init. Someone else may want to comment or debug. Otherwise, it would be helpful if you could provide sample code (not just the snippet) that fails consistently.

Veeraraghavans commented 10 months ago

@lalitb thanks for your reply. Do you have any idea about common reasons for an allocation failure at regex init? I can share only the part of the code that fails, as it is proprietary code. I will check on giving access.

lalitb commented 10 months ago

@Veeraraghavans It would be more helpful if you could share an example (along similar lines to https://github.com/open-telemetry/opentelemetry-cpp/tree/main/examples/metrics_simple ) which crashes on regex init. Something that can be easily compiled and reproduced, so it can be debugged further.

Veeraraghavans commented 10 months ago

@lalitb please find below the code which crashes during execution. The code has two parts:

plugin.cpp - the main code, which creates the resource and the MetricAgent.

#include "agents/MetricAgent.h"
int main()
{
    int processID = GetProcessID();
    // Create opentelemetry-cpp Resource to attach it to the telemetry data
    resource::ResourceAttributes attributes = {{"service.name", "ABC_PLUGIN"}, {"version", "latest"}, {"process_id", processID}};
    auto resource = resource::Resource::Create(attributes);
    std::string endpoint = "localhost:4317/v1/metrics";
    static ObservabilityPlugin::MetricAgent metricAgent(ABC_PLUGIN, GRPC, endpoint, resource);
    // ActivateMetricType is defined in MetricAgent.cpp
    metricAgent.ActivateMetricType(ObservabilityPlugin::DefaultMetrics::All);
}

// Get process Id:

int GetProcessID()
{
    C_Communicator* com = C_Communicator::Instance();
    // Returns 0 when there is no communicator or only one process.
    if (com == nullptr) return 0;
    return com->size() > 1 ? com->cpuNum() : 0;
}

agents/MetricAgent.cpp - creates the metric exporter and provider.

MetricAgent::MetricAgent(const std::string& serviceName, const std::string& protocol, const std::string& endpoint, resource::Resource resource, unsigned int frequency)
{
    this->serviceName = serviceName;
    auto attr = resource.GetAttributes();
    auto it = attr.find("process_id");
    if(it != attr.end()){
        this->processID = nostd::get<int>(it->second);
    }
    std::unique_ptr<metric_sdk::PushMetricExporter> exporter;
    this->metricGRPCExporterOptions.aggregation_temporality = metric_sdk::AggregationTemporality::kCumulative;
    this->metricGRPCExporterOptions.endpoint = endpoint;
    exporter = otlp::OtlpGrpcMetricExporterFactory::Create(metricGRPCExporterOptions);
    metric_sdk::PeriodicExportingMetricReaderOptions metricReaderOptions;
    metricReaderOptions.export_interval_millis = std::chrono::milliseconds(frequency);
    metricReaderOptions.export_timeout_millis  = std::chrono::milliseconds(frequency/2);
    std::unique_ptr<metric_sdk::MetricReader> reader{new metric_sdk::PeriodicExportingMetricReader(std::move(exporter), metricReaderOptions)};
    auto provider = std::shared_ptr<metrics_api::MeterProvider>(new metric_sdk::MeterProvider(std::unique_ptr<metric_sdk::ViewRegistry>(new metric_sdk::ViewRegistry()), resource));
    auto p = std::static_pointer_cast<metric_sdk::MeterProvider>(provider);
    p->AddMetricReader(std::move(reader));
    metrics_api::Provider::SetMeterProvider(provider);
}

// Function calls Metrics Meter Provider for adding Metrics counters
void MetricAgent::ActivateMetricType(DefaultMetrics type)
{
    // The error is thrown here, when GetMeter() is called (reached from plugin.cpp).
    auto meter = this->GetMeter();
    switch (type)
    {
        //......
    }
}

nostd::shared_ptr<metrics_api::Meter> MetricAgent::GetMeter()
{
    auto provider = metrics_api::Provider::GetMeterProvider();
    return provider->GetMeter(this->serviceName, OPENTELEMETRY_SDK_VERSION);
}

marcalff commented 10 months ago

Given how the regexp crashes on the name given to GetMeter(), what is the actual value of serviceName?

Does it look properly initialized?
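
One way to answer this is to print the string's length and raw bytes just before the call to GetMeter(). A minimal sketch follows; DescribeString is a hypothetical helper and is not part of opentelemetry-cpp.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders a string as `size=N bytes=[...]`, escaping
// non-printable bytes as \xNN so a corrupted or uninitialized value stands out.
std::string DescribeString(const std::string &s)
{
  std::string out = "size=" + std::to_string(s.size()) + " bytes=[";
  for (unsigned char c : s)
  {
    if (c >= 0x20 && c < 0x7F)
    {
      out += static_cast<char>(c);  // printable ASCII is copied as-is
    }
    else
    {
      char buf[5];  // "\xNN" plus the terminating null
      std::snprintf(buf, sizeof(buf), "\\x%02X", static_cast<unsigned>(c));
      out += buf;
    }
  }
  out += "]";
  return out;
}
```

Writing DescribeString(this->serviceName) to stderr inside MetricAgent::GetMeter() would show whether the value is the expected short ASCII name or something with an implausible size or stray bytes.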

Veeraraghavans commented 10 months ago

It gets the following values: serviceName="abc_plugin" in the example and OPENTELEMETRY_SDK_VERSION=1.8.1. I think it is initialized fine, since MetricAgent::MetricAgent(const std::string& serviceName, const std::string& protocol, const std::string& endpoint, resource::Resource resource, unsigned int frequency) executed fine, but when I call GetMeter I have issues.

Is there a way to check logs, or some other way to see what happens?

Veeraraghavans commented 10 months ago

Hey @lalitb @marcalff,

Do you think using the -D_GLIBCXX_USE_CXX11_ABI flag could cause the issue? Or have you managed to come up with any other possible reason? Any input will be helpful.
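
With libstdc++, the _GLIBCXX_USE_CXX11_ABI macro selects between two incompatible std::string layouts, so passing a std::string between binaries built with different values is unsafe. The frame quoted earlier contains std::__cxx11, which indicates that translation unit used the new ABI. A small sketch for comparing the setting across builds is shown here; Cxx11AbiValue is a hypothetical helper name, and whether a mismatch is actually the cause in this case is not established.

```cpp
#include <string>  // any libstdc++ header defines _GLIBCXX_USE_CXX11_ABI

// Returns 1 for the C++11 ABI, 0 for the old ABI, and -1 when the macro
// is not defined (for example, when not building against libstdc++).
int Cxx11AbiValue()
{
#if defined(_GLIBCXX_USE_CXX11_ABI)
  return _GLIBCXX_USE_CXX11_ABI;
#else
  return -1;
#endif
}
```

Compiling this function once with the plugin's flags and once with the flags used to build the SDK, then comparing the two return values, would show whether both sides agree on the ABI.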

Veeraraghavans commented 9 months ago

Hi @marcalff @lalitb

Did you get any ideas on this? I tried debugging inside the SDK. The error occurs at the regex validation; the values passed are exactly "mapdl_plugin" and "1.8.1". When I disable the validation, the code proceeds but fails when creating the counter on the Meter:

[Error] File: /home/vsekar/observability-plugins/source/opentelemetry-cpp-v1.8/sdk/src/metrics/meter.cc:46Meter::CreateUInt64Counter - failed. Invalid parameters.mapdl_plugin_counter_nb_of_processes Number of processes . Measurements won't be recorded.

The entire code works fine in the other example but fails when I call it from my plugin code.

github-actions[bot] commented 7 months ago

This issue was marked as stale due to lack of activity.