grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

k8sattributes Returns information from Alloy, not from originating pod #1336

Open jseiser opened 4 months ago

jseiser commented 4 months ago

What's wrong?

When enabling k8sattributes on Grafana Alloy running in EKS, you end up getting information from Alloy, not from the originating pod.

So you end up with worthless attributes. Note: the log at the end is from an nginx ingress pod in the namespace nginx-ingress-internal, but all the attributes are for a Grafana Alloy pod.

❯ kubectl describe pod/ingress-nginx-controller-76fb6f965b-6k6hm -n nginx-ingress-internal
Name:             ingress-nginx-controller-76fb6f965b-6k6hm
Namespace:        nginx-ingress-internal
Priority:         0
Service Account:  ingress-nginx
Node:             i-0e5f7e0fccbd428f5.us-gov-west-1.compute.internal/10.2.29.50
Start Time:       Fri, 19 Jul 2024 12:49:59 -0400
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.11.1
                  helm.sh/chart=ingress-nginx-4.11.1
                  linkerd.io/control-plane-ns=linkerd
                  linkerd.io/proxy-deployment=ingress-nginx-controller
                  linkerd.io/workload-ns=nginx-ingress-internal
                  pod-template-hash=76fb6f965b
Annotations:      jaeger.linkerd.io/tracing-enabled: true
                  linkerd.io/created-by: linkerd/proxy-injector edge-24.7.3
                  linkerd.io/inject: enabled
                  linkerd.io/proxy-version: edge-24.7.3
                  linkerd.io/trust-root-sha256: 35504e48329c1792791907e06a50bbfe8a1dc2bc0217233d68eee3eb08bed27a
                  viz.linkerd.io/tap-enabled: true
Status:           Running
IP:               10.2.26.198

You can see the IP for the pod is correct in the trace below, but nothing else.

e.g.

          {
            "key": "linkerd.io.proxy-daemonset",
            "value": {
              "stringValue": "alloy"
            }
          },
          {
            "key": "service.name",
            "value": {
              "stringValue": "nginx"
            }
          },
          {
            "key": "k8s.namespace.name",
            "value": {
              "stringValue": "grafana-alloy"
            }
          },
          {
            "key": "k8s.pod.name",
            "value": {
              "stringValue": "alloy-sp77n"
            }
          }
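
For background, when no pod_association rules are configured, the k8sattributes processor associates telemetry with a pod by the source IP of the incoming connection, which in a meshed or proxied setup may not be the originating pod. Below is a minimal sketch of associating on the k8s.pod.ip resource attribute instead, shown for illustration rather than as a confirmed fix for this report:

otelcol.processor.k8sattributes "example" {
  extract {
    metadata = ["k8s.namespace.name", "k8s.pod.name"]
  }

  // Match telemetry to a pod by the k8s.pod.ip resource attribute
  // instead of relying on the connection's source IP.
  pod_association {
    source {
      from = "resource_attribute"
      name = "k8s.pod.ip"
    }
  }

  output {
    traces = [otelcol.processor.memory_limiter.default.input]
  }
}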

Steps to reproduce

  1. Deploy EKS
  2. Deploy Alloy

System information

No response

Software version

v1.2.1

Configuration

alloy:
  mode: "flow"
  configMap:
    create: true
    content: |-
      logging {
        level  = "info"
        format = "json"
      }

      otelcol.exporter.otlp "to_tempo" {
        client {
          endpoint = "tempo-distributed-distributor.tempo.svc.cluster.local:4317"
          tls {
              insecure             = true
              insecure_skip_verify = true
          }
        }
      }

      otelcol.receiver.otlp "default" {
        debug_metrics {
          disable_high_cardinality_metrics = true
        }
        grpc {
          endpoint = "0.0.0.0:4317"
          include_metadata = true
        }

        http {
          endpoint = "0.0.0.0:4318"
          include_metadata = true
        }
        output {
          traces = [otelcol.processor.resourcedetection.default.input]
        }
      }

      otelcol.receiver.opencensus "default" {
        debug_metrics {
          disable_high_cardinality_metrics = true
        }
        endpoint  = "0.0.0.0:55678"
        transport = "tcp"
        output {
          traces = [otelcol.processor.resourcedetection.default.input]
        }
      }

      otelcol.processor.resourcedetection "default" {
        detectors = ["env", "eks"]

        output {
          traces = [otelcol.processor.k8sattributes.default.input]
        }
      }

      otelcol.processor.k8sattributes "default" {
        extract {
          annotation {
            from      = "pod"
            key_regex = "(.*)/(.*)"
            tag_name  = "$1.$2"
          }
          label {
            from      = "pod"
            key_regex = "(.*)/(.*)"
            tag_name  = "$1.$2"
          }

          metadata = [
            "k8s.namespace.name",
            "k8s.deployment.name",
            "k8s.statefulset.name",
            "k8s.daemonset.name",
            "k8s.cronjob.name",
            "k8s.job.name",
            "k8s.node.name",
            "k8s.pod.name",
            "k8s.pod.uid",
            "k8s.pod.start_time",
          ]
        }

        output {
          traces  = [otelcol.processor.memory_limiter.default.input]
        }
      }

      otelcol.processor.memory_limiter "default" {
        check_interval = "5s"

        limit = "512MiB"

        output {
            traces  = [otelcol.processor.tail_sampling.default.input]
        }
      }

      otelcol.processor.tail_sampling "default" {
        policy {
          name = "ignore-health"
          type = "string_attribute"

          string_attribute {
            key                    = "http.url"
            values                 = ["/health", "/metrics", "/healthz", "/loki/api/v1/push"]
            enabled_regex_matching = true
            invert_match           = true
          }
        }

        policy {
          name = "ignore-health-target"
          type = "string_attribute"

          string_attribute {
            key                    = "http.target"
            values                 = ["/health", "/metrics", "/healthz", "/loki/api/v1/push"]
            enabled_regex_matching = true
            invert_match           = true
          }
        }

        policy {
          name = "ignore-health-path"
          type = "string_attribute"

          string_attribute {
            key                    = "http.path"
            values                 = ["/health", "/metrics", "/healthz", "/loki/api/v1/push"]
            enabled_regex_matching = true
            invert_match           = true
          }
        }

        policy {
          name = "all-errors"
          type = "status_code"

          status_code {
            status_codes = ["ERROR"]
          }
        }

        policy {
          name = "sample-percent"
          type = "probabilistic"

          probabilistic {
            sampling_percentage = 50
          }
        }

        output {
          traces =  [otelcol.processor.batch.default.input]
        }
      }

      otelcol.processor.batch "default" {
        send_batch_size = 16384
        send_batch_max_size = 0
        timeout = "2s"

        output {
            traces  = [otelcol.exporter.otlp.to_tempo.input]
        }
      }

  enableReporting: false
  extraPorts:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
      protocol: TCP
    - name: otlp-http
      port: 4318
      targetPort: 4318
      protocol: TCP
    - name: opencensus
      port: 55678
      targetPort: 55678
      protocol: TCP

controller:
  priorityClassName: "system-cluster-critical"
  tolerations:
    - operator: Exists

serviceMonitor:
  enabled: true
  additionalLabels:
    release: kube-prometheus-stack

ingress:
  enabled: true
  ingressClassName: "nginx-internal"
  annotations:
    nginx.ingress.kubernetes.io/service-upstream: "true"
    cert-manager.io/cluster-issuer: cert-manager-r53-qa
  labels:
    ingress: externaldns
  path: /
  pathType: Prefix
  hosts:
    - faro-${cluster_number}-${environment}.${base_domain}
  tls:
    - secretName: faro-${cluster_number}-${environment}.${base_domain}-tls
      hosts:
        - faro-${cluster_number}-${environment}.${base_domain}

Logs

{
  "batches": [
    {
      "resource": {
        "attributes": [
          {
            "key": "telemetry.sdk.version",
            "value": {
              "stringValue": "1.11.0"
            }
          },
          {
            "key": "telemetry.sdk.name",
            "value": {
              "stringValue": "opentelemetry"
            }
          },
          {
            "key": "telemetry.sdk.language",
            "value": {
              "stringValue": "cpp"
            }
          },
          {
            "key": "cloud.provider",
            "value": {
              "stringValue": "aws"
            }
          },
          {
            "key": "cloud.platform",
            "value": {
              "stringValue": "aws_eks"
            }
          },
          {
            "key": "k8s.pod.ip",
            "value": {
              "stringValue": "10.2.29.64"
            }
          },
          {
            "key": "linkerd.io.workload-ns",
            "value": {
              "stringValue": "grafana-alloy"
            }
          },
          {
            "key": "linkerd.io.inject",
            "value": {
              "stringValue": "enabled"
            }
          },
          {
            "key": "k8s.pod.uid",
            "value": {
              "stringValue": "021b9b57-49c7-4453-bcd6-099ea9ed6c05"
            }
          },
          {
            "key": "app.kubernetes.io.instance",
            "value": {
              "stringValue": "alloy"
            }
          },
          {
            "key": "linkerd.io.trust-root-sha256",
            "value": {
              "stringValue": "35504e48329c1792791907e06a50bbfe8a1dc2bc0217233d68eee3eb08bed27a"
            }
          },
          {
            "key": "viz.linkerd.io.tap-enabled",
            "value": {
              "stringValue": "true"
            }
          },
          {
            "key": "jaeger.linkerd.io.tracing-enabled",
            "value": {
              "stringValue": "true"
            }
          },
          {
            "key": "k8s.node.name",
            "value": {
              "stringValue": "i-0af9c8279dabb5258.us-gov-west-1.compute.internal"
            }
          },
          {
            "key": "linkerd.io.control-plane-ns",
            "value": {
              "stringValue": "linkerd"
            }
          },
          {
            "key": "app.kubernetes.io.name",
            "value": {
              "stringValue": "alloy"
            }
          },
          {
            "key": "linkerd.io.created-by",
            "value": {
              "stringValue": "linkerd/proxy-injector edge-24.5.1"
            }
          },
          {
            "key": "kubectl.kubernetes.io.default-container",
            "value": {
              "stringValue": "alloy"
            }
          },
          {
            "key": "linkerd.io.proxy-version",
            "value": {
              "stringValue": "edge-24.5.1"
            }
          },
          {
            "key": "k8s.pod.start_time",
            "value": {
              "stringValue": "2024-07-18T19:19:12Z"
            }
          },
          {
            "key": "k8s.daemonset.name",
            "value": {
              "stringValue": "alloy"
            }
          },
          {
            "key": "linkerd.io.proxy-daemonset",
            "value": {
              "stringValue": "alloy"
            }
          },
          {
            "key": "service.name",
            "value": {
              "stringValue": "nginx"
            }
          },
          {
            "key": "k8s.namespace.name",
            "value": {
              "stringValue": "grafana-alloy"
            }
          },
          {
            "key": "k8s.pod.name",
            "value": {
              "stringValue": "alloy-sp77n"
            }
          }
        ],
        "droppedAttributesCount": 0
      },
      "instrumentationLibrarySpans": [
        {
          "spans": [
            {
              "traceId": "0f413376449cc556e74d8fb427776954",
              "spanId": "0a13242128b666de",
              "parentSpanId": "0000000000000000",
              "traceState": "",
              "name": "",
              "kind": "SPAN_KIND_SERVER",
              "startTimeUnixNano": 1721408032421640000,
              "endTimeUnixNano": 1721408032450101000,
              "attributes": [
                {
                  "key": "http.flavor",
                  "value": {
                    "stringValue": "1.1"
                  }
                },
                {
                  "key": "http.target",
                  "value": {
                    "stringValue": "/v2/socs/e3ec6005-665d-4c29-9ac7-9effff9b423f/gateways/71352608-b0b5-4e13-a5ba-6efab206e256"
                  }
                },
                {
                  "key": "http.server_name",
                  "value": {
                    "stringValue": "console-api-qa1-dev-01.madeup.com"
                  }
                },
                {
                  "key": "http.host",
                  "value": {
                    "stringValue": "console-api-qa1-dev-01.madeup.com"
                  }
                },
                {
                  "key": "http.user_agent",
                  "value": {
                    "stringValue": "OpenAPI-Generator/0.13.0.post0/python"
                  }
                },
                {
                  "key": "http.scheme",
                  "value": {
                    "stringValue": "https"
                  }
                },
                {
                  "key": "net.host.port",
                  "value": {
                    "intValue": 443
                  }
                },
                {
                  "key": "net.peer.ip",
                  "value": {
                    "stringValue": "10.2.26.198"
                  }
                },
                {
                  "key": "net.peer.port",
                  "value": {
                    "intValue": 37944
                  }
                },
                {
                  "key": "ingress.namespace",
                  "value": {
                    "stringValue": "qa1-dev"
                  }
                },
                {
                  "key": "ingress.service_name",
                  "value": {
                    "stringValue": "console-api-qa1-dev"
                  }
                },
                {
                  "key": "ingress.name",
                  "value": {
                    "stringValue": "console-api-qa1-dev"
                  }
                },
                {
                  "key": "ingress.upstream",
                  "value": {
                    "stringValue": "qa1-dev-console-api-qa1-dev-8000"
                  }
                },
                {
                  "key": "http.method",
                  "value": {
                    "stringValue": "PATCH"
                  }
                },
                {
                  "key": "http.status_code",
                  "value": {
                    "intValue": 200
                  }
                }
              ],
              "droppedAttributesCount": 0,
              "droppedEventsCount": 0,
              "droppedLinksCount": 0,
              "status": {
                "code": 0,
                "message": ""
              }
            }
          ],
          "instrumentationLibrary": {
            "name": "nginx",
            "version": ""
          }
        }
      ]
    }
  ]
}
github-actions[bot] commented 3 months ago

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it. If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue. The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity. Thank you for your contributions!

tshuma1 commented 2 months ago

@jseiser How are you populating your substitution values, e.g. faro-${cluster_number}-${environment}.${base_domain}? Is this possible in Alloy config files?

jseiser commented 2 months ago

@tshuma1

Terraform is doing it, so the files are interpolated by the time the helm command is run.
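
For anyone wondering what that looks like, here is a minimal sketch using Terraform's templatefile function with the helm provider; the template file name and the substitution values are illustrative assumptions, not taken from this thread:

# values.yaml.tpl holds the chart values shown above, including the
# ${cluster_number}, ${environment}, and ${base_domain} placeholders.
resource "helm_release" "alloy" {
  name       = "alloy"
  repository = "https://grafana.github.io/helm-charts"
  chart      = "alloy"
  namespace  = "grafana-alloy"

  values = [
    templatefile("${path.module}/values.yaml.tpl", {
      cluster_number = "01"          # illustrative values only
      environment    = "qa"
      base_domain    = "example.com"
    })
  ]
}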

jseiser commented 2 months ago

Is there any other information I can provide here? We have tried running Alloy as a Deployment and as a DaemonSet, with and without Alloy being in the service mesh. We have even hardcoded the OTEL attributes and removed the k8s attributes (a sketch of what that can look like follows below), but you still end up with the traces from linkerd not being matched.

We have not been able to find a working example of AWS EKS + Grafana Alloy. The issue also appears to extend to the OpenTelemetry Collector itself:

https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/29630#issuecomment-2359015660
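
For reference, hardcoding a resource attribute in Alloy can be done with the otelcol.processor.transform component; this is a hypothetical sketch with an assumed value, not the exact configuration tested above:

otelcol.processor.transform "hardcode" {
  error_mode = "ignore"

  trace_statements {
    context = "resource"
    statements = [
      // Assumed namespace value, for illustration only.
      `set(attributes["k8s.namespace.name"], "nginx-ingress-internal")`,
    ]
  }

  output {
    traces = [otelcol.exporter.otlp.to_tempo.input]
  }
}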

jseiser commented 6 days ago

This is still an issue on the latest stable release.