open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0
3.74k stars 888 forks source link

Environment Variables as Carrier for Inter-Process Propagation to transport context #740

Open mponnambalam opened 4 years ago

mponnambalam commented 4 years ago

Passing context information while a process being spawn as part of a span using a shell script or process builders.

Propagating context information while creating a process for another process, to trace the lineage, there isn't any protocol such as Http/gRPC to transport context. A common way to pass information using Environment Variables, which shall be utilized as a carrier.

Environment variables have differences between the operating systems, such as case sensitivity in Linux and case insensitivity in windows.

Defining specification/guidance, to have a common format and method for propagation using Environment Variables, will be solving many ETL tools and batch cases.

cyrille-leclerc commented 3 years ago

FYI an ecosystem of CI/CD platforms and tools is rapidly growing around propagation of Otel trace contezxt using the TRACEPARENT and TRACECONTEXT environment variables and we think we are mature enough to propose a standardisation of trace context propagation using environment variables.

Existing integrations of CI/CD tools using OpenTelemetry trace context propagation through environment variables (TRACEPARENT and TRACECONTEXT):

cc @tobert , @v1v, @kuisathaverat

Jenkins Maven pipeline

Jenkins otel-cli pipeline

Ansible pipeline

tobert commented 3 years ago

I am the author of https://github.com/equinix-labs/otel-cli and have been using TRACEPARENT propagation over envvars for a few months now. We're doing this work in production at Equinix Metal, and I believe a few others are using otel-cli in production now including GitHub. Please let me know how I can help make this OTEP happen.

Div95 commented 3 years ago

Hi, we have developed a propagator that helps propagate trace/baggage details across processes using environment variable as the carrier. It supports different formats for the trace details (b3/w3c/ some custom format).

I am author of https://github.com/Div95/opentelemetry-python/tree/feature/env_propagator/propagator/opentelemetry-propagator-env which has this solution in Python language. We are making use of 'os.environ' for carrier here.

We'd like to help move this issue forward.

aidansteele commented 3 years ago

@cyrille-leclerc you mention TRACECONTEXT but it looks like those tools actually use TRACESTATE - can you confirm if that's just a typo?

aidansteele commented 3 years ago

Thanks to @tonistiigi in https://github.com/rakyll/go-test-trace/issues/2 I have learned that Buildkit uses OTEL_TRACE_PARENT and OTEL_TRACE_STATE. It looks like this was inspired by opentelemetry-swift using these in it's EnvironmentContextPropagator. It was added in https://github.com/open-telemetry/opentelemetry-swift/pull/170.

tonistiigi commented 3 years ago

@aidansteele Indeed, these are exposed to every process running in Dockerfile RUN statement when tracing is enabled, as well as the env for otlp (forwarder) socket that is exposed to the container. Not a huge issue if we should need to switch to a different env or support both, but to my taste OTEL_ variables look less ambiguous and more in line with the env variables already defined by the spec.

rakyll commented 3 years ago

Traceparent and Tracestate from W3C Trace Context specification are not a part of OTel. W3C Trace Context is a standalone spec, Otel is one of the implementors. If we choose to prefix things with OTEL_, it'd not be great for other instrumentation libraries that want to provide propagation via environmental variables. They will eventually came up with their own and it will fragment the space. It'd be better if we don't tightly couple the environment variable name with OTel, but I agree that TRACEPARENT and TRACESTATE are not specific enough.

tonistiigi commented 3 years ago

cc @nachoBonafonte @SergeyKanzhelev (from the EnvironmentContextPropagator implementation)

tobert commented 3 years ago

otel-cli is using TRACEPARENT and I've put it into a lot of code and there is a lot to be said for keeping the short name in preference to more descriptive names. It doesn't matter much for otel-cli but a really common mod I've made is adding traceparent headers to curl calls in shell scripts. otel-cli might generate the span parent then we do:

curl -H "traceparent: ${TRACEPARENT:-}" $url

Maybe I'm too deep into this stuff and acclimated to the variable names, but this reads well and looks obvious in the code I've submitted to various projects. Making the variable names longer would make it a bit more awkward to inject this into existing code and the mismatched header/variable name feels wrong to me.

yurishkuro commented 3 years ago

but I agree that TRACEPARENT and TRACESTATE are not specific enough

They seem no more or less specific than the respective HTTP headers. @rakyll do you have some specific downsides in mind from using these as is, in accordance with the W3C spec for http headers?

Traceparent and Tracestate from W3C Trace Context specification are not a part of OTel. W3C Trace Context is a standalone spec, Otel is one of the implementors.

On the other hand, I am not aware of any standards body that can bless the name of an environment variable.

My vote is to use TRACEPARENT and TRACESTATE, and optionally provide a config capability in the propagator to override the names should some users come across naming conflicts.

cyrille-leclerc commented 3 years ago

Using TRACEPARENT and TRACESTATE felt natural when we built the Jenkins OpenTelemetry Plugin and the OpenTelemetry Maven Extension and it seamlessly integrated with @tobert's otel-cli.

The integration using TRACEPARENT and TRACESTATE was so simple and so intuitive that we realized the potential Otel has to provide observability on the whole CI/CD toolchain.

It would be great if we could just use those environment variables and, as @yurishkuro said, optionally support specifying alternate environment variables.

rakyll commented 3 years ago

They seem no more or less specific than the respective HTTP headers. @rakyll do you have some specific downsides in mind from using these as is, in accordance with the W3C spec for http headers?

Headers are scoped to a request, environmental variables are scoped to the environment and has a larger radius. That's where my worry is coming from.

We are using TRACEPARENT right now in go-test-trace, https://github.com/rakyll/go-test-trace/commit/22493612be320e0a01c174efe9b2252924f6dda9.

nachoBonafonte commented 3 years ago

cc @nachoBonafonte @SergeyKanzhelev (from the EnvironmentContextPropagator implementation)

Hi, in opentelemetry-swift we just needed the functionality and as there was no approved spec for it we just followed the naming guidelines for other environment variables. We thought that using a very generic name may clash with other tracing libraries or systems. Using something like W3C_TRACE_PARENT was also in the list of possible names, which is both specific and clarifies what it is about. Said that, we have no issues with changing the name to whatever is approved, and we will maintain compatibility for a time.

cyrille-leclerc commented 3 years ago

FYI Concourse CI also uses the TRACEPARENT environment variable for context propagation.

I have just submitted a PR on the Concourse CI documentation to make it clear: https://github.com/concourse/docs/pull/462

yurishkuro commented 3 years ago

It seems we have a general consensus on using TRACEPARENT/TRACESTATE unadorned. I'd say it's safe to move to creating a PR to reflect this in the spec.

v1v commented 3 years ago

The Opentelemetry callback plugin for Ansible also uses TRACEPARENT -> https://github.com/ansible-collections/community.general/pull/3378

deejgregor commented 2 years ago

Is anyone working on a PR for this change to the spec (or planning to work on one)? If not, I'm happy to work on a PR. I'm very new to OpenTelemetry, so if there is anyone who would like to volunteer to answer some questions (if needed) and do an initial review, that would be great for this newb.

wajika commented 2 years ago

FYI an ecosystem of CI/CD platforms and tools is rapidly growing around propagation of Otel trace contezxt using the TRACEPARENT and TRACECONTEXT environment variables and we think we are mature enough to propose a standardisation of trace context propagation using environment variables.

Existing integrations of CI/CD tools using OpenTelemetry trace context propagation through environment variables (TRACEPARENT and TRACECONTEXT):

cc @tobert , @v1v, @kuisathaverat

Jenkins Maven pipeline

Jenkins otel-cli pipeline

Ansible pipeline

Are there any relevant solutions for other compiled languages? We use dotnet and azure devops server CI pipeline.

cyrille-leclerc commented 2 years ago

Thanks @wajika

Are there any relevant solutions for other compiled languages? We use dotnet and azure devops server CI pipeline.

We see initiatives to instrument unit tests (using the JUnit report format) and progress on BuildKite (here) but unfortunately I'm not aware of progress on the .Net / Azure ecosystem yet.

wajika commented 2 years ago

Thanks @wajika

Are there any relevant solutions for other compiled languages? We use dotnet and azure devops server CI pipeline.

We see initiatives to instrument unit tests (using the JUnit report format) and progress on BuildKite (here) but unfortunately I'm not aware of progress on the .Net / Azure ecosystem yet.

I think shell can be used, but I didn't figure out how to generate spanid and transid, I think as long as there is spanid and transid, they can be written to each pipeline header.

pipeline{
    agent any
    stages {
        stage('Build') {
            steps{
                echo 'This is a build step'    >> >>> Generate a spanid and send it to the otel
            }
        }
        stage('Test') {
            steps{
                echo 'This is a test step'   >> >>> Generate a transid-1 and send it to the otel
            }
        }
        stage('Deploy') {
            steps{
                echo 'This is a deploy step'     >> >>> Generate a transid-2 and send it to the otel
            }
        }
    }
}

Finally we can see spanid >> transid-1 >> transid-2 on kibana apm UI. As long as it can be done through the shell, CICD monitoring can be done regardless of the language. Am I going wrong?

cyrille-leclerc commented 2 years ago

@wajika can we please switch to the jenkinsci-users@googlegroups.com mailing list?

wajika commented 2 years ago

@wajika can we please switch to the jenkinsci-users@googlegroups.com mailing list?

Okay, but I might not be able to follow the community in real time.

Can you teach me how to keep track of opentelemetry trends in the jenkinsci group?

Grunet commented 2 years ago

Is there somewhere amongst the OTEL spec docs that calls out which HTTP headers to look for this info in? I've only found it in the W3C spec for TraceContext and then mentioned as a line item in the language compatibility matrix so far.

If not, is updating a W3C spec the right direction to look further into for this? I was thinking this one eventually linked out from the main spec page maybe, which is already discussing what to do for non-HTTP protocols like MQTT (though I guess there isn't really a "protocol" per se in this case)

stmlange commented 1 year ago

Hello, thanks for starting this discussion. Maybe it's just me, but I'm currently observing a discrepancy in the casing of how those environment variables are referenced.

For context I'm just using a W3CTraceContextPropagator (W3CTraceContext documentaion).

The Jenkins OpenTelemetry Plugin seems to provide the TRACECONTEXT / TRACESTATE variable (upper cased).

When I inject the current context into a map like so:

private static final TextMapSetter<Map<String, String>> TEXT_MAP_SETTER = Map::put;
....
final Context context = Context.current();
final Map<String, String> contextMap = new HashMap<>();
        GlobalOpenTelemetry.get()
                .getPropagators()
                .getTextMapPropagator()
                .inject(context, contextMap, TEXT_MAP_SETTER);

it results in a tracecontext / tracestate variable (lower cased).

Since environment variables are case sensitive and the W3CTraceContext documentaion refers to them as only lower cased: Should this be considered a bug in the Jenkins OpenTelemetry Plugin, or is perhaps worth to discuss in this more broader context, since it can affect any other Inter-Process Carrier Propagation?

trask commented 1 year ago

hi @stmlange, my understanding is that W3CTraceContext is only about propagating over HTTP headers

my suggestion in your code above would be to have your TextMapSetter handle the upper casing when propagating over env vars

deejgregor commented 1 year ago

@yurishkuro said awhile back:

It seems we have a general consensus on using TRACEPARENT/TRACESTATE unadorned. I'd say it's safe to move to creating a PR to reflect this in the spec.

I'm curious if this would count as a "Yes" per this:

Follow the issue workflow and make sure the issue is accepted with a "Yes" response. If the response to the issue is not "Yes" then do not create a PR that implements the change since it will be rejected.

-- source

If not, what is the right path to get to a Yes (or No, etc.) for this issue?

yurishkuro commented 1 year ago

@deejgregor yes, I believe we have the agreement, and judging by the number of different PRs linked to this issue there are plenty of implementations that already use this approach. Are you willing to create an OTEP PR? In it I would cover

deejgregor commented 1 year ago

@yurishkuro:

Are you willing to create an OTEP PR?

Yup! Finally... after first aiming to do this in late 2021, heh. I've started researching, and I'll make sure the items you mentioned are included.

I've noticed one interesting thing so far: we aren't terribly explicit in the spec about how we propagate other than mentioning the propagation distributions, which mostly cover how to encode in HTTP (request) headers and not much else (in the case of B3, it does mention what to do for gRPC and JMS) . The trace semantic conventions for messaging have probably the most explicit mention of how to attach the context information, if only to say it isn't explicitly covered yet. ;-)

This document does not specify the exact mechanisms on how the creation context is attached/extracted to/from messages. Future versions of these conventions will give clear recommendations, following industry standards including, but not limited to Trace Context: AMQP protocol and Trace Context: MQTT protocol once those standards reach a stable state.

-- source

krzko commented 11 months ago

I've also added TRACEPARENT injection to the shell in https://github.com/krzko/run-with-telemetry, which compliments the New component: Github Actions Event Receiver issue.

mponnambalam commented 11 months ago

Really glad to see this getting traction. Appreciate it. Thanks.

On Wed, Nov 15, 2023, 8:40 PM Kristof Kowalski @.***> wrote:

I've also added TRACEPARENT injection to the shell in https://github.com/krzko/run-with-telemetry, which compliments the New component: Github Actions Event Receiver https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27460 issue.

— Reply to this email directly, view it on GitHub https://github.com/open-telemetry/opentelemetry-specification/issues/740#issuecomment-1813741356, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABPDLPTCUTK2LU6Q5BXMLODYEWDK3AVCNFSM4PHO6QTKU5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOBRGM3TIMJTGU3A . You are receiving this because you authored the thread.Message ID: @.*** com>

MSNev commented 9 months ago

Seems to be related: https://github.com/open-telemetry/opentelemetry-specification/pull/3825, if you consider inter-process to be a "client"

danielgblanco commented 5 months ago

This is related to https://github.com/open-telemetry/oteps/pull/258. @adrielp just making you aware as you raised the OTEP PR

danielgblanco commented 3 months ago

@carlosalberto @yurishkuro as I've seen you commented on the OTEP PR linked above, are any of you sponsoring this?