open-telemetry / opentelemetry-erlang-contrib

OpenTelemetry instrumentation for Erlang & Elixir
https://opentelemetry.io
Apache License 2.0
139 stars 109 forks source link

cowboy instrumentation breaks cowboy_metrics_h callback #304

Open RoadRunnr opened 3 months ago

RoadRunnr commented 3 months ago

Describe the bug

When using the cowboy_telemetry_h and cowboy_metrics_h handler together, the metrics cb fails with:

2024-03-25T12:25:21.637585+01:00 <0.3469.0> telemetry:execute/3 : error: Handler opentelemetry_cowboy_handlers has failed and has been detached. Class=error
Reason={case_clause,[]}
Stacktrace=[{otel_telemetry,pop_from_tracer_stack,1,
                [{file,
                     "/usr/src/erlang/erccn/_build/default/lib/opentelemetry_telemetry/src/otel_telemetry.erl"},
                 {line,110}]},
            {otel_telemetry,end_telemetry_span,2,
                [{file,
                     "/usr/src/erlang/erccn/_build/default/lib/opentelemetry_telemetry/src/otel_telemetry.erl"},
                 {line,49}]},
            {opentelemetry_cowboy,handle_event,4,
                [{file,
                     "/usr/src/erlang/erccn/_build/default/lib/opentelemetry_cowboy/src/opentelemetry_cowboy.erl"},
                 {line,79}]},
            {telemetry,'-execute/3-fun-0-',4,
                [{file,
                     "/usr/src/erlang/erccn/_build/default/lib/telemetry/src/telemetry.erl"},
                 {line,160}]},
            {lists,foreach_1,2,[{file,"lists.erl"},{line,1686}]},
            {cowboy_metrics_h,terminate,3,
                [{file,
                     "/usr/src/erlang/erccn/_build/default/lib/cowboy/src/cowboy_metrics_h.erl"},
                 {line,302}]},
            {cowboy_stream,terminate,3,
                [{file,
                     "/usr/src/erlang/erccn/_build/default/lib/cowboy/src/cowboy_stream.erl"},
                 {line,138}]},
            {cowboy_http2,terminate_stream_handler,4,
                [{file,
                     "/usr/src/erlang/erccn/_build/default/lib/cowboy/src/cowboy_http2.erl"},
                 {line,1334}]}]

Additionally, the cowboy_telemetry_h unconditionally overwrites the metrics_callback. That fact is neither documented nor desirable.

Expected behavior not failing

Additional context

cowboy setup:

    ProtoOpts =
        #{env => #{dispatch => Dispatch},
          logger => logger,
          metrics_callback => fun prometheus_cowboy2_instrumenter:observe/1,
          stream_handlers => [cowboy_telemetry_h, cowboy_metrics_h, cowboy_stream_h]
         },
    cowboy:start_clear(?MODULE, TransOpts, ProtoOpts).
bryannaegele commented 3 months ago

Hey @RoadRunnr, this repository is not the home of that library. Please raise the issue in its repo for assistance https://github.com/beam-telemetry/cowboy_telemetry

Also, please include a working example and failing test

tsloughter commented 3 months ago

@bryannaegele but the stacktrace shows otel_telemetry crashing?

bryannaegele commented 3 months ago

@bryannaegele but the stacktrace shows otel_telemetry crashing?

The later comments and errors higher up the stacktrace are suggesting the issue is running both stream handlers, so the issue would be in that library.

using the cowboy_telemetry_h and cowboy_metrics_h handler together, the metrics cb fails

And the offending conflict appears to be happening in the setup with some prometheus callback that's being run from another library.

metrics_callback => fun prometheus_cowboy2_instrumenter:observe/1,
stream_handlers => [cowboy_telemetry_h, cowboy_metrics_h, cowboy_stream_h]

opentelemetry_cowboy does nothing with stream handlers or anything else of that matter. It's just listening on telemetry events.

RoadRunnr commented 3 months ago

@bryannaegele the root cause is the way how cowboy_telemetry_h abuses cowboy_metrics_h, but the crash is in otel_telemetry:pop_from_tracer_stack/1 in this repo. The sister method otel_telemetry:peek_from_tracer_stack/1 does have special support for an empty context stack. So why would the others methods not have that too??

It's just listening on telemetry events.

And its exactly those events that cause a crash. I would think that listening to events not cause a crash like that.

bryannaegele commented 3 months ago

If you want to open a PR, we can accept that. But this error means you are ending a span early and your trace will be missing the root span.