open-telemetry / opentelemetry-erlang

OpenTelemetry Erlang SDK
https://opentelemetry.io
Apache License 2.0
330 stars 105 forks source link

Unable to update root span attributes #570

Open JohnDoneth opened 1 year ago

JohnDoneth commented 1 year ago

When working with DataDog and OpenTelemetry they expect certain attributes to be on the root span in order for the trace to show up in their error reporting segment of their APM application rather than just a trace with some errors contained.

I don't see a way to get a span context for the root span so I can update the root span attributes. Due to how opentelemetry_phoenix creates the root spans for us we can't keep track of the root span context during creation so it would be ideal if we could access the currently bound root span or somehow walk up parent spans for the current span.

Another reason this could be helpful is to add arbitrary attributes during the runtime of the application to our root Phoenix span such as the request ID, current user ID, etc, that aren't currently populated by the opentelemetry_phoenix library. As of the moment, those attributes would be hidden on a subspan which would be trickier to find and view.

tsloughter commented 1 year ago

When are you trying to set the attributes? I would think that it is your root span you have access to in your plug or controller, or is it that the root span is from Cowboy?

Are all the attributes you want to set from the headers? While you should be able to do what you are asking, we also need support for adding headers as attributes on start.

JohnDoneth commented 1 year ago

I would think that it is your root span you have access to in your plug or controller, or is it that the root span is from Cowboy?

That's true if we don't add another span for the plug itself then I could add the attributes to the current span in a plug which would also be the root span for Phoenix. I think that should work for that use-case!

Unfortunately that doesn't solve the other concern with being able to set the root span attributes necessary for DataDog to mark the trace as an error. Described here which requires error.stack, error.message, and error.type to be set on the root span. Due to the nature of some of our errors we'd want to be able to set this at any point in the program which could occur in a deeply nested span so that DataDog recognizes them as traces that should show up in their error tracker.

bryannaegele commented 1 year ago

Most things can be set within a plug or the controller as noted. Based on the link you posted and error tracking, it sounds like this is an issue with Datadog failing to honor the standard. This kind of thing is not new with them but unfortunately isn't something that's an otel issue. I'm surprised they require this behavior.

The only thing I can suggest is you could set up some kind of registry in your code where you can track root spans, but what they seem to be requiring here is against the otel specification.

tsloughter commented 1 year ago

Had a thought. I don't know enough about plug/phoenix to say how this might work but I was thinking about how I might handle similar in Elli.

If there is a way to setup a handler for when there is an error case then all you'd need is to keep the SpanCtx of the root span in the metadata (I guess conn?).

Then in the handler you grab that SpanCtx from there and set the attributes.

So basically, track the root in conn?

tsloughter commented 1 year ago

And yea, as @bryannaegele mentions it'd be nice if Datadog supported https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/exceptions.md#attributes -- which the instrumentation libraries do set. But obviously that is a wish and not a solution :), hopefully we can make it work anyway.

bryannaegele commented 1 year ago

Yeah but the conn isn't passed anywhere past the controller handler so you couldn't grab it from elsewhere. You could stick it in pdict but then if you cross processes then same issue.

The other thing here is no other instrumentation library is going to support this. An ecto error isn't going to be able to be grabbed and set on some root span, so this is kind of a non-starter unfortunately.

tsloughter commented 1 year ago

@JohnDoneth oh, just realized also, are you using the OTLP exporter to the Collector and then having that export to Datadog? Or are you sending with the Zipkin exporter?

tsloughter commented 1 year ago

@bryannaegele my assumption was that any error had to get back up to the top level to return a proper response, so at that point it is available with conn.

JohnDoneth commented 1 year ago

@tsloughter we're using the OpenTelemetry collector with its DataDog exporter.

tsloughter commented 1 year ago

@bryannaegele I mean, that must be how it is setting the error attributes already, right?

bryannaegele commented 1 year ago

I mean, if it's uncaught and a 500 is returned, the error would already be getting set. If the code is just returning error tuples and you decide to return a different response code then any exception info isn't going to be included unless you're passing exception structs around. But again, let's say you're making an http call out and get a 500 or retry for success, that call is errored but can still result in a successful operation in general..

bryannaegele commented 1 year ago

Also, we may be conflating errored spans and exceptions. Not all errored spans are going to be recorded exceptions and have stacktraces, etc

tsloughter commented 1 year ago

But any errored span could be made to throw an exception. I was thinking how in Elli I have middlewares and there are handlers for exceptions thrown with the type and stacktrace, so setting datadog specific attributes in a custom middleware would be doable. It might mean re-raising an exception I caught deeper down to make this happen, but its at least an option -- since error.stack is required an exception must be thrown somewhere as its the only way to get a stacktrace in Erlang anymore.

And if it is a separate process you can still get this through the reason and stacktrace on an EXIT signal.

Passing down the conn with the root SpanCtx as metadata might make it neater in some cases though, if possible.

tsloughter commented 1 month ago

Not sure if there was any satisfactory resolution here? Is there more to discuss? I've lost the context from my head but I believe exception and error on spans keeps coming up, maybe this can be closed and consolidated into another issue?