ricsanfre / pi-cluster

Pi Kubernetes Cluster. Homelab kubernetes cluster automated with Ansible and ArgoCD
https://picluster.ricsanfre.com
MIT License

Nginx 1.10 does not support OpenTracing. Nginx tracing needs to be migrated to OpenTelemetry #329

Closed ricsanfre closed 5 months ago

ricsanfre commented 7 months ago

Issue description

nginx ingress controller v1.10 breaks the current OpenTracing configuration.

The nginx controller fails to start and reports the following message:

-------------------------------------------------------------------------------
  Warning  RELOAD  117s  nginx-ingress-controller  Error reloading NGINX: 
-------------------------------------------------------------------------------
Error: exit status 1
2024/03/02 09:56:48 [emerg] 27#27: unknown "opentracing_context_x_b3_traceid" variable
nginx: [emerg] unknown "opentracing_context_x_b3_traceid" variable
nginx: configuration file /tmp/nginx/nginx-cfg3140476767 test failed

According to 1.10 release notes

ricsanfre commented 5 months ago

Testing nginx OpenTelemetry configuration

The following Helm values can be used to activate the nginx OpenTelemetry configuration:

controller:
  podAnnotations:
    linkerd.io/inject: enabled

  # Allow snippet annotations
  # From v1.9 the default value has changed to false.
  # allow-snippet-annotations: Enables Ingress to parse and add -snippet annotations/directives created by the user.
  # The linkerd-viz ingress uses these annotations
  allowSnippetAnnotations: true

  config:
    # Open Telemetry
    enable-opentelemetry: "true"
    otlp-collector-host: tempo-distributor.tempo.svc.cluster.local
    otlp-service-name: nginx-internal
    # Print access log to file instead of stdout
    # Separating access logs from the rest
    access-log-path: "/data/access.log"
    log-format-escape-json: "true"
    log-format-upstream: '{"source": "nginx", "time": $msec, "resp_body_size": $body_bytes_sent, "request_host": "$http_host", "request_address": "$remote_addr", "request_length": $request_length, "request_method": "$request_method", "uri": "$request_uri", "status": $status,  "user_agent": "$http_user_agent", "resp_time": $request_time, "upstream_addr": "$upstream_addr", "trace_id": "$opentelemetry_trace_id", "span_id": "$opentelemetry_span_id"}'
  # controller extra Volume
  extraVolumeMounts:
    - name: data
      mountPath: /data
  extraVolumes:
    - name: data
      emptyDir: {}
  extraContainers:
    - name: stream-accesslog
      image: busybox
      args:
      - /bin/sh
      - -c
      - tail -n+1 -F /data/access.log
      imagePullPolicy: Always
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /data
        name: data

With this configuration, access log entries contain trace_id and span_id:

{"source": "nginx", "time": 1713945025.521, "resp_body_size": 560, "request_host": "emojivoto", "request_address": "10.42.1.11", "request_length": 512, "request_method": "GET", "uri": "/", "status": 200,  "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36", "resp_time": 0.009, "upstream_addr": "10.42.0.12:8080", "trace_id": "0739b6b5dac6d6bae83b0d9534eeed19", "span_id": "2a495451e6da1118"}
{"source": "nginx", "time": 1713945025.620, "resp_body_size": 1785688, "request_host": "emojivoto", "request_address": "10.42.1.11", "request_length": 354, "request_method": "GET", "uri": "/js", "status": 200,  "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36", "resp_time": 0.080, "upstream_addr": "10.42.0.12:8080", "trace_id": "1f9bd4362d7207de85e9b13f191f2bc9", "span_id": "339547188e45c00d"}
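
As a rough illustration of how these JSON entries can be consumed, the sketch below parses one access log line and pulls out the two tracing fields. The helper name `extract_trace_context` is hypothetical, and the sample line is a shortened copy of the first entry above; it only assumes the log format configured in `log-format-upstream`.

```python
import json

# Shortened copy of the first access log entry shown above.
LOG_LINE = (
    '{"source": "nginx", "time": 1713945025.521, "request_host": "emojivoto", '
    '"uri": "/", "status": 200, '
    '"trace_id": "0739b6b5dac6d6bae83b0d9534eeed19", '
    '"span_id": "2a495451e6da1118"}'
)


def extract_trace_context(line):
    """Return (trace_id, span_id) from one JSON-formatted access log line.

    Hypothetical helper: it assumes the line is valid JSON and carries the
    trace_id and span_id keys defined in log-format-upstream.
    """
    entry = json.loads(line)
    return entry["trace_id"], entry["span_id"]


trace_id, span_id = extract_trace_context(LOG_LINE)
print(trace_id, span_id)
```

The trace_id obtained this way is the value to search for in the tracing backend when correlating a log entry with its trace.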

With these changes, the e2e test using linkerd and the emojivoto test application does not work.

E2E trace context propagation (nginx -> linkerd -> emojivoto application) is not working. Propagation only works between nginx and linkerd:

  1. Ingress Nginx only supports W3C context propagation; B3 context propagation is not supported. See the open issue https://github.com/kubernetes/ingress-nginx/issues/10324. B3 context propagation was used by the emojivoto application and linkerd.

  2. linkerd added support for W3C context propagation in release 2.13. See https://github.com/linkerd/linkerd2/issues/5416. So if linkerd receives W3C headers, it uses them for the trace context.

  3. The default emojivoto application uses OpenCensus, not OpenTelemetry, and needs to be updated to support it. To test the W3C propagation feature, a linkerd developer refactored the emojivoto app to support OpenTelemetry. See https://github.com/linkerd/linkerd2-proxy/pull/2179#issuecomment-1402266875. Those changes were never pushed to the main branch of the emojivoto app.
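
To make the mismatch between the two propagation formats concrete, here is a minimal sketch that maps a W3C `traceparent` header to the equivalent B3 multi-header set. It is only an illustration of the header layouts, not something nginx, linkerd or emojivoto does. The function name `traceparent_to_b3` is hypothetical, and the sample header reuses a trace_id and span_id from the access log above with an assumed flags value of `01` (sampled).

```python
import re

# W3C Trace Context "traceparent" layout: version-traceid-parentid-flags,
# all lowercase hex (2, 32, 16 and 2 characters respectively).
TRACEPARENT_RE = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def traceparent_to_b3(traceparent):
    """Map a W3C traceparent value to B3 multi-header names and values.

    Hypothetical helper for illustration only.
    """
    match = TRACEPARENT_RE.fullmatch(traceparent)
    if match is None:
        raise ValueError("not a valid traceparent header: %r" % traceparent)
    _version, trace_id, span_id, flags = match.groups()
    # Bit 0 of the flags byte is the "sampled" flag.
    sampled = "1" if int(flags, 16) & 0x01 else "0"
    return {
        "X-B3-TraceId": trace_id,
        "X-B3-SpanId": span_id,
        "X-B3-Sampled": sampled,
    }


# The trailing "01" flags value is an assumption, not taken from the logs.
b3_headers = traceparent_to_b3(
    "00-0739b6b5dac6d6bae83b0d9534eeed19-2a495451e6da1118-01"
)
print(b3_headers)
```

An application that only reads the `X-B3-*` headers never sees the context carried in `traceparent`, which is consistent with the trace breaking at the emojivoto hop.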

ricsanfre commented 5 months ago

Pull Request #389 will fix the issue, enabling the upgrade to 1.10.

New issue created to replace the emojivoto application with another application supporting OpenTelemetry: #390