open-telemetry / opentelemetry-cpp-contrib

https://opentelemetry.io/
Apache License 2.0
125 stars 140 forks source link

NGINX: Allow overriding `opentelemetry_trace_id` from within the config #219

Open NaurisSadovskis opened 2 years ago

NaurisSadovskis commented 2 years ago

Is your feature request related to a problem? We use Google Load Balancers which add a vendor specific header (X-Cloud-Trace-Context) to our requests. It seems compatible with W3C format, but OTEL library does not detect it as it's not supported and creates a new span which gets passed down the line (together with Google header), resulting in two separate traces which are not connected.

To solve it, we tried is parsing the X-Cloud-Trace-Context header within NGINX and extracting trace ID and using that to construct a new header which is passed down the line:

http {
      # omitted: some parsing code to $trace_id from X-Cloud-Trace-Context header
      server {
          opentelemetry_propagate;
          opentelemetry_trust_incoming_spans on; 
      location / { 
          proxy_pass http://some_endpoint;

          # Construct a custom header using Google's trace ID and OTEL's generated Span ID
          set $custom_traceparent "00-${trace_id}-${opentelemetry_span_id}-01";

          # Set header
          proxy_set_header traceparent $custom_traceparent;  
        }
    } 
}

This solution passes a valid trace down the line, however it also results in two traces due to OTEL library generating it's own opentelemetry_trace_id which it then sends to the OLTP exporter host. It's not entirely clear to me at what point in the NGINX request flow OTEL generates its trace ID and I'm wondering if there is a way to override it. This can be visualised as follows:

graph TD
    A(Request) -->B(Google Load Balancer)
    B --> |Google adds: X-Cloud-Trace-Context|C(NGiNX)
    C -->|Custom traceparent - Trace A| D(Application)
    C -->|Default/Internal traceparent - Trace B| E(OLTP exporter host)

Describe the solution you'd like

Thanks and I hope you can provide some additional context how to best solve it!

svrnm commented 2 years ago

Have you tried the otel-webserver-module for nginx as an alternative?

https://github.com/open-telemetry/opentelemetry-cpp-contrib/tree/main/instrumentation/otel-webserver-module

NaurisSadovskis commented 2 years ago

I have not, but it seems it also does not support overriding incoming headers to construct new ones ?

svrnm commented 2 years ago

@DebajitDas , @kpratyus : can you comment on that?

kpratyus commented 2 years ago

If the key name of Tracecontext coming from Google Load Balancer is same as the opentelemetry specification than the tracecontext will be overridden. Is the key name "X-Cloud-Trace-Context" because what otel is using is different?

NaurisSadovskis commented 2 years ago

@kpratyus exactly - Google Load Balancers do not use traceparent header and in our chain the conversion needs to happen in NGINX so we also can instrument the entire flow. Currently our NGINX spans are missing as we're propagating Google header down the line since Python/etc has support for it.

After the initial post, I've tried using openresty/headers-more-nginx-module to see if I can override input headers and pre-process trace before it hits OTEL module, but due to possibly how modules are invoked, I couldn't to get it to work.

Side-note: Google might support traceparent eventually since some of their serverless products already do, but there are no public dates for it.

kpratyus commented 2 years ago

@NaurisSadovskis The otel specification says that "traceparent" key should be used for context propogation. So unless there is a change in otel specification the agents are forced to use "traceparent" as key.

svrnm commented 2 years ago

I am not 100% sure how C++ does it in opentelemetry, but I think what is needed first, is general support for a cloud trace propagators like other languages have it, then ideally there should be a composite propagator that allows combination and then nginx module could support them

svrnm commented 2 years ago

For reference, right now there is b3, trace context & jaeger for C++:

https://github.com/open-telemetry/opentelemetry-cpp/tree/703576c0b8e59d321d544cabde1cff89b92c8436/api/include/opentelemetry/trace/propagation

NaurisSadovskis commented 2 years ago

I should note that Google's format is compatible with W3C (if this was not immediately clear). Namely:

# Google: X-Cloud-Trace-Context
05d7c045f9cd3c474f9e6b733823fb44/14127515246394157611;o=1
  ^                                         ^                    
  |                                         | 
trace id (32 chars)                      span id
# W3C: traceparent
00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-01
    ^                                         ^                    
    |                                         | 
  trace id (32 chars)                      span id

Thus, if we're able to split the incoming Google header and override the opentelemetry_span_id variable set by NGINX module, we would not need to make custom propagators for a standard that might or might not be there in few years.

kpratyus commented 2 years ago

I should note that Google's format is compatible with W3C (if this was not immediately clear). Namely:

# Google: X-Cloud-Trace-Context
05d7c045f9cd3c474f9e6b733823fb44/14127515246394157611;o=1
  ^                                         ^                    
  |                                         | 
trace id (32 chars)                      span id
# W3C: traceparent
00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-01
    ^                                         ^                    
    |                                         | 
  trace id (32 chars)                      span id

Thus, if we're able to split the incoming Google header and override the opentelemetry_span_id variable set by NGINX module, we would not need to make custom propagators for a standard that might or might not be there in few years.

But OTEL agents can only handle "traceparent" as the key not "X-Cloud-Trace-Context"

NaurisSadovskis commented 2 years ago

Hi @kpratyus, just to update this thread - we have found a temporary workaround by using and modifying headers-more module (see below) to run at an earlier stage and build up a valid traceparent from the incoming X-Cloud-Trace-Context header. This causes this OTEL library to correctly pick up the header and propagate it.

We have also opened a feature request on Google's issue tracker: #253419736 Set traceparent in addition to x-cloud-trace-context.

diff --git a/src/ngx_http_headers_more_filter_module.c b/src/ngx_http_headers_more_filter_module.c
index 0bb6fec..a230f1b 100644
--- a/src/ngx_http_headers_more_filter_module.c
+++ b/src/ngx_http_headers_more_filter_module.c
@@ -244,7 +244,7 @@ ngx_http_headers_more_post_config(ngx_conf_t *cf)

     cmcf = ngx_http_conf_get_module_main_conf(cf, ngx_http_core_module);

-    h = ngx_array_push(&cmcf->phases[NGX_HTTP_REWRITE_PHASE].handlers);
+    h = ngx_array_push(&cmcf->phases[NGX_HTTP_SERVER_REWRITE_PHASE].handlers);
     if (h == NULL) {
         return NGX_ERROR;
     }
GitKaran commented 3 months ago

Hi @NaurisSadovskis, We are facing the same issue and not able to propagate X-Cloud-Trace-Context received from gcp lb to nginx as traceparent header. Could you please share more details on how did you work around it?

marcalff commented 3 months ago

Just finding this issue today.

When using custom tags in trace propagation, the proper way is to implement a custom propagator, which is aware of the tags used.

See the following C++ APIs:

See examples in sub classes, like:

See the global propagator singleton, which is designed to handle exactly this case: