open-telemetry / oteps

OpenTelemetry Enhancement Proposals
https://opentelemetry.io
Apache License 2.0

Inter-process context propagation requirements #241

Open deejgregor opened 7 months ago

deejgregor commented 7 months ago

https://github.com/open-telemetry/opentelemetry-specification/issues/740

mmanciop commented 5 months ago

I'd love to have this supported in the OpenTelemetry SDKs. Over the last couple of years I have implemented something like this manually at least five times, across various customers and tools. (Classic use case: trace context propagation for K8S cronjobs/jobs, and for job scheduling on ECS when AWS Batch is not used.)

I think we should have a mechanism to specify, alongside the trace context in the env var, whether the instrumentation continues that trace or starts a new trace and creates a span link to the trace context from the env var.
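To make the two options concrete, here is a minimal, stdlib-only sketch of the decision, not an SDK implementation. It assumes the parent context arrives in a `TRACEPARENT` env var using the W3C `traceparent` format, and that a second, purely hypothetical variable `EXAMPLE_TRACE_MODE` (`continue` or `link`) selects the behaviour. `RemoteContext`, `SpanStart`, and `start_entry_span` are illustrative names, not OpenTelemetry APIs.

```python
import re
import secrets
from dataclasses import dataclass, field
from typing import List, Mapping, Optional

# W3C Trace Context "traceparent": version-traceid-parentid-flags (lowercase hex).
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class RemoteContext:
    """Trace context received from the parent process (hypothetical type)."""
    trace_id: str
    span_id: str
    flags: str


@dataclass
class SpanStart:
    """What the entry span of this process would be started with (hypothetical type)."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str] = None
    links: List[RemoteContext] = field(default_factory=list)


def parse_traceparent(value: str) -> Optional[RemoteContext]:
    """Return the parsed context, or None if the value is not a valid traceparent."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per the W3C spec.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return RemoteContext(trace_id, span_id, flags)


def start_entry_span(environ: Mapping[str, str]) -> SpanStart:
    """Decide how the entry span relates to the context found in the environment."""
    remote = parse_traceparent(environ.get("TRACEPARENT", ""))
    # EXAMPLE_TRACE_MODE is an assumption made for this sketch, not a real setting.
    mode = environ.get("EXAMPLE_TRACE_MODE", "continue")
    new_span_id = secrets.token_hex(8)

    if remote is None:
        # Nothing usable to continue or link to: start a fresh root span.
        return SpanStart(trace_id=secrets.token_hex(16), span_id=new_span_id)
    if mode == "link":
        # New trace; the parent process's context is attached as a span link.
        return SpanStart(trace_id=secrets.token_hex(16), span_id=new_span_id, links=[remote])
    # Default: continue the parent's trace as a child span.
    return SpanStart(trace_id=remote.trace_id, span_id=new_span_id, parent_span_id=remote.span_id)
```

Calling `start_entry_span(os.environ)` at process start would give the entry span either a parent (continue) or a link (new trace). Whether the mode belongs in a separate variable or in the same one is exactly the open design question.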

In addition, I think a very useful behaviour would be for the SDK to ship an instrumentation that is activated by this env var and creates a span describing the process's startup. In my experience, this is usually the desired behaviour for instrumenting the entry points of batch jobs. This "entry span" tends to be very tedious to create with manual instrumentation and tends not to carry interesting metadata (the exceptions are manual instrumentations that report the Job ID, but I came across very few such examples over the years in customer codebases).

The "startup span" comes with a challenge: when to close it. In the past I used a mix of listeners for process shutdown plus a flush (e.g., JVM Shutdown Hooks or Python's atexit). Overall, this carries a greater risk of losing spans during process shutdown than closing the span early (and flushing?), but it has more intuitive semantics for the end user, with the additional benefit of "measuring" the lifetime of the batch job via the span duration.
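The shutdown-listener variant can be sketched in Python roughly as follows. This is a stdlib-only illustration under stated assumptions, not an OpenTelemetry API: `StartupSpan` and its `export` callback are hypothetical names, and the callback stands in for whatever would end the span and flush the exporter. Note that `atexit` handlers do not run when the process is killed by an unhandled signal or exits via `os._exit()`, which is one way spans get lost with this approach.

```python
import atexit
import time
from typing import Callable, Dict, Union

SpanRecord = Dict[str, Union[str, int]]


class StartupSpan:
    """Hypothetical span covering the process lifetime, closed at interpreter exit."""

    def __init__(self, name: str, export: Callable[[SpanRecord], None]) -> None:
        self.name = name
        self._export = export  # assumed to export and flush synchronously
        self._start_unix_ns = time.time_ns()
        self._start_mono_ns = time.monotonic_ns()
        self._ended = False
        # Close the span on normal interpreter shutdown.
        atexit.register(self.end)

    def end(self) -> None:
        """End and export the span once; later calls are no-ops."""
        if self._ended:
            return
        self._ended = True
        self._export(
            {
                "name": self.name,
                "start_unix_ns": self._start_unix_ns,
                # Monotonic clock so the duration cannot go negative.
                "duration_ns": time.monotonic_ns() - self._start_mono_ns,
            }
        )
```

Because `end()` is idempotent, a job could also call it early (the "close early and flush" option) and the `atexit` hook would then do nothing, so the same sketch covers both closing strategies.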