hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io
Other
42.76k stars 9.56k forks source link

Support TRACEPARENT for Opentelemetry #35444

Open moserke opened 4 months ago

moserke commented 4 months ago

Terraform Version

1.8.5

Use Cases

When terraform is called as part of an orchestrated pipeline, it's great to have the ability to turn on telemetry with the OTEL_TRACES_EXPORTER=otlp variable, but many times the pipeline itself is instrumented and there is already a trace in progress. Given that pipelines shell out to run terraform, there would need to be a way to pass in the trace context through an environment variable. Most tooling is landing on TRACEPARENT as that variable which carries a valid W3C Trace Context. This would allow us to tie the terraform spans into the pipeline trace.

Attempted Solutions

Reading https://github.com/hashicorp/terraform/blob/main/telemetry.go#L86 it is clear that an empty context is initiated, so there is no possible at this time.

Proposal

In https://github.com/hashicorp/terraform/blob/main/telemetry.go#L56 the TRACEPARENT environment variable should be checked for, and if found, then the trace context propagator should have the TRACEPARENT extracted into it. If the environment variable does not exist, then leave an empty TraceContext.

References

No response

apparentlymart commented 4 months ago

Hi @moserke! Thanks for this suggestion.

I think it would be great to support something like this. One thing we should probably watch out for is that the telemetry mechanism is also active in the (currently experimental) terraform rpcapi but in that case expects the trace parent metadata to arrive through gRPC metadata passed by the client calling the API. We should make sure not to break that if we implement this, because connecting RPC API requests with the trace of the RPC client is more relevant than connecting with whatever launched the terraform rpcapi process.

One concern I have is that the conventions around OpenTelemetry have a tendency to churn a bunch before standardization and since OTel is not an area of current focus for the Terraform team I wouldn't want to sign up for chasing a not-yet-standard while the community iterates. I expect we'd probably prefer to wait until a specific convention is codified in an OpenTelemetry specification, rather than assuming that a current de-facto standard will remain the recommendation.

Do you know if the OpenTelemetry community is already planning to publish a specification for this convention? If so, it would be great to have a link to that discussion from this issue so that we can follow the progress.

moserke commented 4 months ago

There is an open PR right now to bring this into the specification: https://github.com/open-telemetry/oteps/pull/258. It is a follow on from the larger discussion at https://github.com/open-telemetry/opentelemetry-specification/issues/740. It seems like they are landing on TRACEPARENT, TRACESTATE, & BAGGAGE. What might be interesting to see how this shakes out is if these become core SDK supported values, meaning, if this gets moved into the core SDK of the languages, then you might not have to do the "check if it's there" logic yourself but just use, what I would assume, is a New() function on the propagator. But I don't know that for sure....

In the mean time though you could extract it from the environment yourself, once that is merged as part of the specification....