open-telemetry / opentelemetry-java-contrib

https://opentelemetry.io
Apache License 2.0

Inferred spans #1340

Open JonasKunz opened 3 weeks ago


Description:

This PR adds an OpenTelemetry SDK Extension (SpanProcessor) for enriching traces with "inferred" spans derived from profiling data. It internally uses async-profiler 3.0 in wall-clock profiling mode.

This feature was originally added to the Elastic APM Agent a few years ago. We recently ported it to OpenTelemetry in our OpenTelemetry distro and would now like to contribute it to the OpenTelemetry community.

The feature works by tracking which spans are activated and deactivated on which threads via the OpenTelemetry context API. A log of these activations and deactivations is spilled to disk while async-profiler profiles the threads that have active spans. After the profiling session ends (10 seconds by default), the resulting JFR file and the log of span activations are used to generate the synthetic inferred spans. There is also a blog post about the feature.
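To make the correlation step more concrete, here is a minimal, self-contained sketch of the idea described above. It is not the code from this PR: the names `InferredSpanSketch`, `ActivationEvent`, `StackSample` and `InferredSpan` are hypothetical, the activation log and the samples are plain in-memory lists instead of a file on disk and a JFR file, and only the top stack frame is considered. It replays the activation log up to each sample's timestamp, looks up the span active on the sampled thread, and merges consecutive samples that show the same frame under the same span into one inferred span.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: correlate a span activation log with profiler samples. */
final class InferredSpanSketch {

    /** One activation-log entry: a span became active (or inactive) on a thread. */
    record ActivationEvent(long threadId, String spanId, long timestampNanos, boolean activated) {}

    /** One wall-clock sample: the top frame seen on a thread at a point in time. */
    record StackSample(long threadId, long timestampNanos, String topFrame) {}

    /** A synthetic span covering a run of samples with the same frame and parent. */
    record InferredSpan(String parentSpanId, String name, long startNanos, long endNanos) {}

    /** Mutable run of consecutive matching samples on one thread. */
    private static final class Run {
        final String parent;
        final String frame;
        final long start;
        long end;

        Run(String parent, String frame, long timestamp) {
            this.parent = parent;
            this.frame = frame;
            this.start = timestamp;
            this.end = timestamp;
        }

        InferredSpan toSpan() {
            return new InferredSpan(parent, frame, start, end);
        }
    }

    /** Both input lists are assumed to be sorted by timestamp. */
    static List<InferredSpan> infer(List<ActivationEvent> log, List<StackSample> samples) {
        Map<Long, Deque<String>> activeByThread = new HashMap<>();
        Map<Long, Run> openRuns = new LinkedHashMap<>();
        List<InferredSpan> result = new ArrayList<>();
        int next = 0;

        for (StackSample sample : samples) {
            // Replay all activations/deactivations that happened before this sample.
            while (next < log.size() && log.get(next).timestampNanos() <= sample.timestampNanos()) {
                ActivationEvent event = log.get(next++);
                Deque<String> stack =
                        activeByThread.computeIfAbsent(event.threadId(), id -> new ArrayDeque<>());
                if (event.activated()) {
                    stack.push(event.spanId());
                } else {
                    stack.remove(event.spanId());
                }
            }

            Deque<String> stack = activeByThread.get(sample.threadId());
            String parent = (stack == null) ? null : stack.peek();
            Run run = openRuns.get(sample.threadId());

            if (run != null && run.parent.equals(parent) && run.frame.equals(sample.topFrame())) {
                run.end = sample.timestampNanos(); // same frame, same parent: extend the run
                continue;
            }
            if (run != null) {
                result.add(run.toSpan()); // frame or parent changed: close the run
                openRuns.remove(sample.threadId());
            }
            if (parent != null) {
                // Only samples taken while a span is active start a new run.
                openRuns.put(sample.threadId(), new Run(parent, sample.topFrame(), sample.timestampNanos()));
            }
        }
        for (Run run : openRuns.values()) {
            result.add(run.toSpan());
        }
        return result;
    }
}
```

In this simplified model, the duration of an inferred span is bounded by the first and last sample that observed the frame, so its precision depends on the sampling interval.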

Because this feature was initially written for the classic Elastic APM Agent, it goes to great lengths to minimize allocations. As a result, the feature is almost allocation-free, which unfortunately comes at the cost of somewhat more complex code due to object pooling.
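For readers unfamiliar with the pattern, the following is a minimal sketch of what object pooling looks like in general. It is not the pooling code used in this PR: `SimplePool` and its methods are hypothetical names, and the sketch is single-threaded for brevity. Instead of allocating a new object per use, callers acquire an instance, use it, and release it back after it has been reset.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical, non-thread-safe object pool used only to illustrate the pattern. */
final class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final int maxSize;
    private final Supplier<T> factory;
    private final Consumer<T> resetter;

    SimplePool(int maxSize, Supplier<T> factory, Consumer<T> resetter) {
        this.maxSize = maxSize;
        this.factory = factory;
        this.resetter = resetter;
    }

    /** Reuses a pooled instance if one is available, otherwise allocates a new one. */
    T acquire() {
        T pooled = free.poll();
        return pooled != null ? pooled : factory.get();
    }

    /** Clears the instance and keeps it for reuse unless the pool is already full. */
    void release(T instance) {
        resetter.accept(instance);
        if (free.size() < maxSize) {
            free.push(instance);
        }
    }
}
```

The extra complexity comes from the caller's side: every acquired object must be released exactly once and must not be used after release, which is the kind of bookkeeping that plain allocation would leave to the garbage collector.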

Testing:

The algorithm for span inference has been ported without changes from the classic Elastic APM Agent and is therefore relatively battle-tested. The PR also comes with a large set of test cases covering edge cases of the inference algorithm.

Documentation:

Only a README.MD has been added. It describes how to use this extension, either through manual setup or through SDK autoconfiguration, and documents some limitations of the inference algorithm.