open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

recordException should support chained exceptions #941

Status: Open. dvoytenko opened this issue 8 months ago.

dvoytenko commented 8 months ago

What are you trying to achieve?

Many languages support the concept of a chained exception. For instance, JavaScript has Error.cause, and Java has Throwable.getCause(). When present, the cause is just as important as the wrapping exception, and often more so. Currently, recordException() loses this information.

It'd be valuable to clarify whether and how the relevant languages should support the cause.

trask commented 8 months ago

FWIW, the OpenTelemetry Java SDK implementation of recordException uses Throwable.printStackTrace(), which does include the full causal chain.

dvoytenko commented 8 months ago

It might be good to clarify how the unified stack trace should be constructed in other languages, since not all of them do this automatically.

However, I also think that combining stack traces is not enough. The "cause" exceptions often have valuable metadata, such as HTTP status codes and other attributes. Ideally there'd be a way to preserve it all.
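For JavaScript, preserving that metadata could look roughly like the TypeScript sketch below. It assumes the chain is linked through Error.cause and that only primitive-valued own properties (for example a code or status field) are worth keeping; ownPrimitiveFields and collectChainMetadata are hypothetical helpers, not part of any OpenTelemetry API.

```typescript
// Hypothetical helpers (not an OpenTelemetry API): keep primitive metadata,
// e.g. code or status, from every error in an Error.cause chain.
type Primitive = string | number | boolean;

function ownPrimitiveFields(err: Error): Record<string, Primitive> {
  const fields: Record<string, Primitive> = {};
  // Object.keys lists own enumerable keys only. message and stack are normally
  // own but non-enumerable, and name is inherited, so they are skipped here.
  for (const key of Object.keys(err)) {
    const value = (err as unknown as Record<string, unknown>)[key];
    if (typeof value === "string" || typeof value === "number" || typeof value === "boolean") {
      fields[key] = value;
    }
  }
  return fields;
}

function collectChainMetadata(error: unknown): Array<Record<string, Primitive>> {
  const chain: Array<Record<string, Primitive>> = [];
  const seen = new Set<unknown>(); // guards against cyclic cause chains
  let current: unknown = error;
  // Only Error instances are inspected; a non-Error cause ends the walk.
  while (current instanceof Error && !seen.has(current)) {
    seen.add(current);
    chain.push(ownPrimitiveFields(current));
    current = (current as { cause?: unknown }).cause;
  }
  return chain;
}
```

The result keeps one metadata map per error, outermost first, so a wrapper's status and the root cause's code both survive; non-primitive fields (including cause itself) are deliberately dropped.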

MrAlias commented 8 months ago

Please provide a comprehensive breakdown of what this would mean for all the languages OTel is implemented in. It is not obvious how this would apply to all languages, especially non-object-oriented languages or those that do not use exceptions as control flow.

dvoytenko commented 8 months ago

In JavaScript, errors can be chained and accessed using error.cause. If someone needs the full stack, they are expected to walk error.cause recursively; this does not happen automatically. We could document that the appropriate OTEL library should combine all stacks (see https://github.com/open-telemetry/opentelemetry-js/issues/4227). However, the issue might be deeper, since errors also carry important metadata, including code, name, and other fields/attributes.
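Combining the stacks could be sketched in TypeScript as follows: walk error.cause recursively and join each stack with a "Caused by: " prefix, loosely mirroring Java's printStackTrace() output. combinedStack, causeOf and describeError are hypothetical names; the "Caused by: " format is an assumption, not something the OTEL JS library is known to emit.

```typescript
// Hypothetical helpers: join the stacks of an Error.cause chain into one string.
function causeOf(value: unknown): unknown {
  return typeof value === "object" && value !== null
    ? (value as { cause?: unknown }).cause
    : undefined;
}

function describeError(value: unknown): string {
  if (value instanceof Error) {
    // error.stack is non-standard but widely available; fall back if absent.
    return value.stack ?? `${value.name}: ${value.message}`;
  }
  return String(value); // a cause may be any value, not only an Error
}

function combinedStack(error: unknown): string {
  const parts: string[] = [];
  const seen = new Set<unknown>(); // stops on cycles such as a.cause = b, b.cause = a
  let current: unknown = error;
  while (current !== undefined && current !== null && !seen.has(current)) {
    seen.add(current);
    parts.push(parts.length === 0 ? describeError(current) : `Caused by: ${describeError(current)}`);
    current = causeOf(current);
  }
  return parts.join("\n");
}
```

This only addresses the stack trace; as noted above, fields such as code are still lost when everything is flattened into one string.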

The data model for this could be presented as:

Error {
  exception.type: "",
  exception.message: "",
  exception.stacktrace: "",
  exception.cause: {
    exception.type: "",
    exception.message: "",
    exception.stacktrace: "",
    exception.cause: { ... }
  }
}
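The nested model above can be expressed as a TypeScript type together with a converter from an Error.cause chain. This is only an in-memory shape: as discussed below, current OTEL attributes cannot hold the nested exception.cause value. ExceptionRecord and toExceptionRecord are hypothetical names, and using err.name as exception.type is an assumption for JavaScript.

```typescript
// Hypothetical in-memory version of the nested data model above.
interface ExceptionRecord {
  "exception.type": string;
  "exception.message": string;
  "exception.stacktrace": string;
  "exception.cause"?: ExceptionRecord;
}

function toExceptionRecord(value: unknown, seen: Set<unknown> = new Set()): ExceptionRecord {
  seen.add(value);
  const record: ExceptionRecord =
    value instanceof Error
      ? {
          "exception.type": value.name,
          "exception.message": value.message,
          "exception.stacktrace": value.stack ?? "",
        }
      : {
          // A cause may be any value; non-Errors are recorded by their JS type.
          "exception.type": typeof value,
          "exception.message": String(value),
          "exception.stacktrace": "",
        };
  const cause =
    typeof value === "object" && value !== null
      ? (value as { cause?: unknown }).cause
      : undefined;
  // Recurse into the cause, stopping at the end of the chain or at a cycle.
  if (cause !== undefined && cause !== null && !seen.has(cause)) {
    record["exception.cause"] = toExceptionRecord(cause, seen);
  }
  return record;
}
```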

However, it's not clear how to codify this in the OTEL data model:

  1. Attributes do not allow nested sub-structures; only primitives and arrays of primitives are allowed. This could be extended (https://github.com/open-telemetry/opentelemetry-specification/issues/376) and may already be possible in LogRecord (https://github.com/open-telemetry/opentelemetry-specification/issues/622).
  2. The "error event" data model could be extended to allow sub-events.
  3. A library may already choose to add each error in the chain to a flat list of "error events". However, this would lose the parent/child relationships between errors.
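Option 3 could be sketched as producing one flat, primitive-only attribute map per error, outermost first; each map could then be passed to a separate Span.addEvent("exception", attributes) call from @opentelemetry/api. flattenChain is a hypothetical helper. The sketch also shows the drawback noted above: the parent/child link survives only implicitly, through array order.

```typescript
// Hypothetical helper for option 3: one flat attribute map per error in the
// chain, outermost first. Nothing in the attributes records the parent/child
// link; only the array order hints at it.
type ExceptionAttributes = Record<string, string>;

function flattenChain(error: unknown): ExceptionAttributes[] {
  const events: ExceptionAttributes[] = [];
  const seen = new Set<unknown>(); // guards against cyclic cause chains
  let current: unknown = error;
  while (current !== undefined && current !== null && !seen.has(current)) {
    seen.add(current);
    events.push(
      current instanceof Error
        ? {
            "exception.type": current.name,
            "exception.message": current.message,
            "exception.stacktrace": current.stack ?? "",
          }
        : { "exception.type": typeof current, "exception.message": String(current) },
    );
    current = typeof current === "object" ? (current as { cause?: unknown }).cause : undefined;
  }
  return events;
}
```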