kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

OpenTelemetry: Need to emit two spans (server and client) #11002

Open · dkrizic opened this issue 8 months ago

dkrizic commented 8 months ago

What happened:

I am using OpenTelemetry with ingress-nginx. This works so far, and a trace looks like this in Grafana/Tempo:

[screenshot: trace view in Grafana/Tempo]

The problem is that Grafana Tempo's metrics-generator does not produce a correct service graph. The service graph looks like this:

[screenshot: service graph with no edge between ingress-nginx and yasm-backend]

(Ignore the component yasm-proxy-odbc.) The problem is that there is no direct connection from ingress-nginx to my backend yasm-backend.

What you expected to happen:

According to the OpenTelemetry specification, ingress-nginx acts as a SERVER towards the caller and as a CLIENT towards the backend. Therefore it should emit two spans: one of kind SERVER for the incoming request and one of kind CLIENT for the upstream request.
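As an illustration of what "two spans" means here, the following is a minimal sketch in Go using the OpenTelemetry SDK (not the controller's actual implementation); the span names, port, and upstream name are placeholders:

```go
package main

import (
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace"
)

// The tracer would normally come from a configured TracerProvider
// (exporter, resource, sampler); that setup is omitted here.
var tracer = otel.Tracer("ingress-like-proxy")

func handle(w http.ResponseWriter, r *http.Request) {
	// Span 1: SERVER span for the request arriving at the proxy.
	ctx, serverSpan := tracer.Start(r.Context(), "HTTP GET /api",
		trace.WithSpanKind(trace.SpanKindServer))
	defer serverSpan.End()

	// Span 2: CLIENT span, child of the SERVER span, covering the call the
	// proxy makes to the upstream. This is the span that is currently missing.
	_, clientSpan := tracer.Start(ctx, "proxy upstream yasm-backend",
		trace.WithSpanKind(trace.SpanKindClient))
	// ... forward the request to the upstream here ...
	clientSpan.End()

	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/", handle)
	_ = http.ListenAndServe(":8080", nil)
}
```

As far as I understand, Tempo's metrics-generator builds service-graph edges by pairing a CLIENT span with the SERVER span of the callee, which is why the missing CLIENT span leaves ingress-nginx disconnected from yasm-backend.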

$ kubectl -n ingress-nginx exec -ti deploy/ingress-nginx-controller -- /nginx-ingress-controller --version                                                            
Defaulted container "controller" out of: controller, opentelemetry (init)
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.9.6
  Build:         6a73aa3b05040a97ef8213675a16142a9c95952a
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6
-------------------------------------------------------------------------------
$ kubectl version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.7

Environment:

$ helm ls -A | grep ingress
ingress-nginx                           ingress-nginx   20          2024-02-08 01:44:43.772125817 +0000 UTC deployed    ingress-nginx-4.9.1             1.9.6

How to reproduce this issue:

$ kubectl -n ingress-nginx get configmaps ingress-nginx-controller -o yaml
apiVersion: v1
data:
  allow-snippet-annotations: "false"
  enable-opentelemetry: "true"
  log-format-escape-json: "true"
  log-format-upstream: '{"time": "$time_iso8601", "remote_addr": "$proxy_protocol_addr",
    "x_forwarded_for": "$proxy_add_x_forwarded_for", "request_id": "$req_id", "remote_user":
    "$remote_user", "bytes_sent": $bytes_sent, "request_time": $request_time, "status":
    $status, "vhost": "$host", "request_proto": "$server_protocol", "path": "$uri",
    "request_query": "$args", "request_length": $request_length, "duration": $request_time,"method":
    "$request_method", "http_referrer": "$http_referer", "http_user_agent": "$http_user_agent"
    }'
  opentelemetry-operation-name: HTTP $request_method $service_name $uri
  opentelemetry-trust-incoming-span: "true"
  otel-max-export-batch-size: "2048"
  otel-max-queuesize: "2048"
  otel-sampler: AlwaysOn
  otel-sampler-ratio: "1.0"
  otel-schedule-delay-millis: "1000"
  otel-service-name: ingress-nginx
  otlp-collector-host: opentelemetry-collector.observability
  otlp-collector-port: "4317"
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: ingress-nginx
    meta.helm.sh/release-namespace: ingress-nginx
  creationTimestamp: "2023-03-10T09:17:30Z"
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.9.6
    helm.sh/chart: ingress-nginx-4.9.1
  name: ingress-nginx-controller
  namespace: ingress-nginx
  resourceVersion: "136281139"
  uid: 18425030-a7e1-4245-b7fa-bcaa87878b7d

Access the Ingress from outside and observe the traces. You will see one span coming from the ingress controller:

[screenshot: span details showing span.kind = server]

It is of type server. The next span, coming from the backend, is of type server as well, and this leads to the problem that tracing backends do not link the two correctly. There should be a second span of type client in between.

k8s-ci-robot commented 8 months ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
longwuyuan commented 8 months ago

/assign @esigo

esigo commented 8 months ago

is your backend instrumented?

dkrizic commented 8 months ago

@esigo Yes it is; as you can see in the screenshot above, it is called "yasm-backend". Currently I have:

Frontend (client) -> ingress-nginx (server) -> Backend (server)

What I need is something like this:

Frontend (client) -> ingress-nginx (span 1, server) -> ingress-nginx (span 2, client) -> Backend (server)
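Purely as a sketch of how that chain would link up (Go with the OpenTelemetry SDK and otelhttp; not how ingress-nginx is implemented, and the backend URL is hypothetical): the proxy extracts the incoming trace context (the equivalent of opentelemetry-trust-incoming-span), starts its SERVER span, and the instrumented HTTP transport creates the CLIENT span and injects traceparent, so the backend's own SERVER span becomes a child of span 2.

```go
package main

import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("ingress-like-proxy")

// Upstream client: each round trip produces a CLIENT span and carries the
// W3C trace context headers to the backend.
var upstream = &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}

func proxy(w http.ResponseWriter, r *http.Request) {
	// Continue the frontend's trace (the "trust incoming span" part).
	ctx := otel.GetTextMapPropagator().Extract(r.Context(),
		propagation.HeaderCarrier(r.Header))

	// Span 1: SERVER span for the request hitting the proxy.
	ctx, serverSpan := tracer.Start(ctx, "ingress: HTTP GET /api",
		trace.WithSpanKind(trace.SpanKindServer))
	defer serverSpan.End()

	// Span 2: CLIENT span created by the instrumented transport; the backend's
	// own SERVER span becomes its child because traceparent is injected.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet,
		"http://yasm-backend.default.svc/api", nil) // hypothetical backend URL
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	resp, err := upstream.Do(req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	w.WriteHeader(resp.StatusCode)
}

func main() {
	// TracerProvider and propagator setup omitted for brevity.
	http.HandleFunc("/", proxy)
	_ = http.ListenAndServe(":8080", nil)
}
```

In ingress-nginx itself this would presumably have to happen inside the OpenTelemetry nginx module around the upstream request; the sketch only shows why the extra CLIENT span restores the parent/child chain.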

dkrizic commented 8 months ago

BTW: Azure Monitor Application Insights also needs those two spans. As you can see, the chain is interrupted.

[screenshot: Application Insights application map with the chain interrupted]

github-actions[bot] commented 7 months ago

This issue is stale, but we won't close it automatically; just bear in mind that the maintainers may be busy with other tasks and will get to your issue ASAP. If you have any questions or want to request prioritization, please reach out on #ingress-nginx-dev on Kubernetes Slack.

c0deaddict commented 1 month ago

I'm also running into this problem; I had to disable the nginx traces to make sense of the service graph.

dkrizic commented 1 month ago

I just saw that Traefik does this correctly; in a sample trace, the upper span is of type server and the lower one of type client.

dkrizic commented 1 month ago

> I'm also running into this problem; I had to disable the nginx traces to make sense of the service graph.

Disabling OpenTelemetry in ingress-nginx brings the problem that there is now no instance that adds span attributes like http.status. Each microservice then needs to do this on its own, which is exactly what I am trying to avoid.
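For comparison, this is roughly the per-service boilerplate that would otherwise be needed: a sketch in Go using the otelhttp middleware (other languages would need their SDK's equivalent), which records the HTTP attributes, including the response status code, on a SERVER span for every request. The service name and route are only illustrative.

```go
package main

import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/api", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // status code is recorded on the span
	})

	// Wrap the whole service in the otelhttp middleware; this is the
	// per-service boilerplate that proxy-level spans would otherwise cover.
	// TracerProvider setup is omitted for brevity.
	handler := otelhttp.NewHandler(mux, "yasm-backend")
	_ = http.ListenAndServe(":8080", handler)
}
```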