open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0
3.75k stars 890 forks source link

Environment variables to provide K8s container and pod name #4140

Open iskiselev opened 4 months ago

iskiselev commented 4 months ago

What are you trying to achieve? There is work to implement container id detector for K8s environment for dotnet open-telemetry/opentelemetry-dotnet-contrib#1699 and js open-telemetry/opentelemetry-js-contrib#1962. To make possible resolution of container id, resource detector need to know container name and pod name. There is no standard way to pass that information into pods.

Can we define otel-specific environment variable to pass that information so that different language detectors may work the same? Suggested names either:

What did you expect to see? Environment variable name added to https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/sdk-environment-variables.md

Additional context. The container.id resource can be fetched using k8sattributeprocessor, but it requires to use collector and do additional configuration. Using k8sattributeprocessor is much more efficient if many pods need to resolve container.id. Implementation of logic in resource detector use http call to k8s service to resolve container.id (and proper rights should be granted). That approach is relatively slow and may conflict with requirement https://github.com/open-telemetry/opentelemetry-specification/blob/fd76183557effd39288018dcea198356d2334ae9/specification/resource/sdk.md?plain=1#L112-L113 Otel semantic convection already specified resource attribute for required data: k8s.pod.name and k8s.container.name. There is no well-defined way to use resource from other resource detectors - and we still don't know how they can be set. We can try hack with setting them through OTEL_RESOURCE_ATTRIBUTES environment variable, but after it the code would require duplicate standard mechanics of extracting resource from that environment variable and additional resource would be included in each trace, which may be not desirable. There is no way in language agent to filter out some resources from OTEL_RESOURCE_ATTRIBUTES (still can be achieved with collectors - but in that case k8sattributeprocessor would be better approach.)

danielgblanco commented 4 months ago

FYI there's a moratorium on adding any new env variables, see https://github.com/open-telemetry/opentelemetry-specification/issues/2891#issuecomment-1289241503

iskiselev commented 4 months ago

@danielgblanco, looks like we should not be affected by strong form of moratorium - as we are asking to add 2 environment variables with primitive string type with predefined name. I'm not sure if it should be blocked as "non-essential use-case".

dashpole commented 3 months ago

Having in-process resource detectors make k8s api calls seems like a poor UX. It requires additional k8s rbac permissions, and individual pod get requests can generate a lot of traffic for the k8s apiserver at scale (especially if workloads are crash-looping). I would recommend this behavior be opt-in if we do add it.

jsuereth commented 3 months ago

The current entities WG proposal for improving resource detection would include a generic mechanism for injecting a set of resource attributes via env variables such that you would NOT conflict with other resource detection.

See: https://docs.google.com/document/d/1QvkLQra8nPst740sSTwAhehMtLxP7q3XP4TBH-qrwEQ/edit

danielgblanco commented 3 months ago

I agree @dashpole and, in fact, this is the way we (widely) handle it at scale at Skyscanner where our deployment pipelines inject env vars that our OTel SDK config bundle then translates into resource attributes. With up to 100k pods running and high pod churn we try to avoid the K8s resource detector and favour env vars where we can. We still use the k8sattributeprocessor in Collectors, but only for certain cases where we need to get attributes from pod IPs pushing telemetry data. Looking forward to the proposal from the Entities WG!

jsuereth commented 3 months ago

Here's the official OTEP: https://github.com/open-telemetry/oteps/pull/264

jsuereth commented 3 months ago

To expand on my previous comment - The TC believes we need a feature to solve this issue but it may not take the form proposed here. This will be completed as part of the Entities WG. See the OTEP proposal above for how we think this will be solved.