open-telemetry / opentelemetry-ruby-contrib

Contrib Packages for the OpenTelemetry Ruby API and SDK implementation.
https://opentelemetry.io
Apache License 2.0
79 stars 160 forks source link

Streaming responses causes spans to be orphaned #474

Open KaoruDev opened 1 year ago

KaoruDev commented 1 year ago

Description of the bug I'm using https://github.com/activeadmin/activeadmin/tree/v2.9.0 to allow for downloading CSVs. Digging into their source code, I noticed the response is streamed back to the client. I suspect the Rack OpenTelemtery gem is closing the root span without waiting for the request body io stream to finish, because I'm collecting orphaned spans.

Share details about your runtime

Operating system details: Mac OSX 12 RUBY_ENGINE: "ruby" RUBY_VERSION: "3.1.3" RUBY_DESCRIPTION: "ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [arm64-darwin21]"

Share a simplified reproduction if possible

I suspect setting up an ActiveAdmin app with CSV downloads + OpenTelemetry might be enough. If a demo app is required, I'll have to create one later.

KaoruDev commented 1 year ago

I was able to solve the problem by manually managing the spans and context and wrapping the response with a Rack::BodyProxy allowing me to close everything up when the stream was also closed.

TBH, it's hideous but it works, POC:

tokens = []
begin
  tokens << ::OpenTelemetry::Context.attach(
    frontend_context || extracted_context,
  )
  request_span =
    tracer.start_span(
      request_span_name,
      attributes: request_span_attributes(env: env),
      kind: request_span_kind,
    )
  request_span_context =
    ::OpenTelemetry::Trace.context_with_span(request_span)
  tokens.unshift(
    ::OpenTelemetry::Context.attach(request_span_context),
  )
  rack_span_context =
    ::OpenTelemetry::Instrumentation::Rack.context_with_span(
      request_span,
    )
  tokens.unshift(::OpenTelemetry::Context.attach(rack_span_context))
  status, headers, response = @app.call(env)

  proxied_body =
    ::Rack::BodyProxy.new(response) do
      tokens.each { |token| ::OpenTelemetry::Context.detach(token) }
      set_attributes_after_request(
        request_span,
        status,
        headers,
        response,
      )
      request_span.finish
    end
  [status, headers, proxied_body]
rescue Exception => e
  request_span&.record_exception(e)
  request_span&.finish
  tokens.each { |token| ::OpenTelemetry::Context.detach(token) }
  raise
end

I'd be happy to submit a PR if yall think this is a good direction.

arielvalentin commented 1 year ago

The latest version of the Rack instrumentation attempts to address this issue by using the Rack::Events API instead of Rack Middleware.

There's an option you may set use_rack_events: true to enable it.

I haven't had the opportunity yet to run this under a production workload so I'm not sure how it performs.

If you're willing to give this a try and report back any issues I'd appreciate it!

KaoruDev commented 1 year ago

Ah very nice, I need to familiarize myself with the Events API. I'll take a gander at your PR: https://github.com/open-telemetry/opentelemetry-ruby-contrib/pull/342 -- If you have any other resources please send them over 🙏

github-actions[bot] commented 1 year ago

👋 This issue has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this issue will be closed eventually by the stale bot.

KaoruDev commented 6 months ago

Sorry I fat-fingered the wrong button 😞


hey @arielvalentin I tried using Rack::Events and discovered we're missing some labels.

Let me explain...

The way I've set up our middleware chain is like so:

OpenTelemetry::Instrumentation::Rack::Middlewares::TracerMiddleware
Custom::Labeler

This way we can add labels to the spans without changing our application codebase. However, it appears the labels added by our Custom::Labeler are now missing. Any ideas why that might be the case?

I went ahead and reverted to using rack_events: false and the custom code I posted above.