open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

k8sattributes is not working in EKS 1.26 #22036

Closed AndriySidliarskiy closed 1 year ago

AndriySidliarskiy commented 1 year ago

Component(s)

processor/k8sattributes

What happened?

Description

I have an OpenTelemetry Collector configuration with k8sattributes, but in the log context I cannot see any of the metadata that k8sattributes should add.

Steps to Reproduce

Configure k8sattributes in EKS 1.26 with the OpenTelemetry Collector Helm chart.

Expected Result

   - k8s.pod.name
   - k8s.pod.uid
   - k8s.deployment.name
   - k8s.namespace.name

Actual Result

nothing

Collector version

0.77.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

k8sattributes:
      auth_type: "serviceAccount"
      passthrough: false
      filter:
        node_from_env_var: KUBE_NODE_NAME     
      extract:
        metadata:
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.deployment.name
          - k8s.namespace.name
        labels:
          - tag_name: c2i.pipeline.execution
            key: c2i.pipeline.execution
            from: pod
          - tag_name: c2i.pipeline.project
            key: c2i.pipeline.project
            from: pod
        annotations:
          - tag_name: monitoring # extracts the value of the `monitoring` annotation from the pod and inserts it as a tag with key `monitoring`
            key: monitoring
            from: pod
    pipelines:
      logs/eks:
        exporters:
          - coralogix
        processors:
          # - batch
          # - resourcedetection/env
          - k8sattributes
        receivers:
          - k8s_events
          - filelog
          - filelog/2

Log output

No response

Additional context

No response

github-actions[bot] commented 1 year ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

atoulme commented 1 year ago

Anything in the collector logs?

AndriySidliarskiy commented 1 year ago

Nothing. k8sattributes launches, but there is nothing in the logs.

swiatekm commented 1 year ago

Did this configuration work with a previous collector or EKS version?

AndriySidliarskiy commented 1 year ago

I don't know, I've only used this version, but in my opinion it may not work in EKS in general. I used your example and it's also not working, because you extract the pod name and namespace from the file path, so k8sattributes does nothing.

AndriySidliarskiy commented 1 year ago
    receivers:
      filelog:
        include:
          - /var/log/pods/*/*/*.log
        exclude:
          # Exclude logs from all containers named otel-collector
          - /var/log/pods/*/otel-collector/*.log
        start_at: beginning
        include_file_path: true
        include_file_name: false
        operators:
          # Find out which format is used by kubernetes
          - type: router
            id: get-format
            routes:
              - output: parser-docker
                expr: 'body matches "^\\{"'
              - output: parser-crio
                expr: 'body matches "^[^ Z]+ "'
              - output: parser-containerd
                expr: 'body matches "^[^ Z]+Z"'
          # Parse CRI-O format
          - type: regex_parser
            id: parser-crio
            regex: '^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
            output: extract_metadata_from_filepath
            timestamp:
              parse_from: attributes.time
              layout_type: gotime
              layout: '2006-01-02T15:04:05.999999999Z07:00'
          # Parse CRI-Containerd format
          - type: regex_parser
            id: parser-containerd
            regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
            output: extract_metadata_from_filepath
            timestamp:
              parse_from: attributes.time
              layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          # Parse Docker format
          - type: json_parser
            id: parser-docker
            output: extract_metadata_from_filepath
            timestamp:
              parse_from: attributes.time
              layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          - type: move
            from: attributes.log
            to: body
          # Extract metadata from file path
          - type: regex_parser
            id: extract_metadata_from_filepath
            regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
            parse_from: attributes["log.file.path"]
            cache:
              size: 128  # default maximum amount of Pods per Node is 110
          # Rename attributes
          - type: move
            from: attributes.stream
            to: attributes["log.iostream"]
          - type: move
            from: attributes.container_name
            to: resource["k8s.container.name"]
          - type: move
            from: attributes.namespace
            to: resource["k8s.namespace.name"]
          - type: move
            from: attributes.pod_name
            to: resource["k8s.pod.name"]
          - type: move
            from: attributes.restart_count
            to: resource["k8s.container.restart_count"]
          - type: move
            from: attributes.uid
            to: resource["k8s.pod.uid"]

    processors:
      # k8sattributes processor to get the metadata from K8s
      k8sattributes:
        auth_type: "serviceAccount"
        passthrough: false
        extract:
          metadata:
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.deployment.name
            - k8s.cluster.name
            - k8s.namespace.name
            - k8s.node.name
            - k8s.pod.start_time
          # Pod labels which can be fetched via K8sattributeprocessor
          labels:
            - tag_name: key1
              key: label1
              from: pod
            - tag_name: key2
              key: label2
              from: pod
        # Pod association using resource attributes and connection
        pod_association:
          - from: resource_attribute
            name: k8s.pod.uid
          - from: resource_attribute
            name: k8s.pod.ip
          - from: connection

    exporters:
      logging:
        loglevel: debug
    service:
      pipelines:
        logs:
          receivers: [filelog]
          processors: [k8sattributes]
          exporters: [logging]
AndriySidliarskiy commented 1 year ago

You can remove the "move" operators from the filelog receiver and leave only k8sattributes, and you will see the same thing.

swiatekm commented 1 year ago

By default k8sattributes processor identifies the Pod by looking at the ip of the remote which sent the data. This works if the data is sent directly from instrumentation, but if you want to use it in a different context (for example a DaemonSet collecting logs), you need to tell the processor how to identify the Pod for a given resource.

For the configuration you posted, you get k8s.pod.name from the filepath. You also need to tell the processor to use that:

pod_association:
  - sources:
      - from: resource_attribute
        name: k8s.pod.name
      - from: resource_attribute
        name: k8s.namespace.name

I think your current config is missing the - sources part. If it wasn't, k8s.pod.uid should work as well.
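
Roughly, the difference in shape (just a sketch based on the snippet you posted earlier, not a complete config):

# current form, without "sources":
pod_association:
  - from: resource_attribute
    name: k8s.pod.uid

# form with "sources", as suggested above:
pod_association:
  - sources:
      - from: resource_attribute
        name: k8s.pod.uid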

AndriySidliarskiy commented 1 year ago

Okay, but in the case of pod labels, how should I configure this? In the example it only adds the labels section and everything works, but in my case it's not working.

swiatekm commented 1 year ago

I'm not sure I follow what exactly you're seeing at this point. Can you post your current configuration and the output you're seeing?

AndriySidliarskiy commented 1 year ago
global:
  domain: "coralogix.com"

mode: daemonset
hostNetwork: false
fullnameOverride: otel-coralogix
clusterRole:
  create: true
  name: "otel-coralogix"
  rules:
    - apiGroups:
      - "*"
      resources:
      - events
      - namespaces
      - namespaces/status
      - nodes
      - nodes/spec
      - pods
      - pods/status
      - replicationcontrollers
      - replicationcontrollers/status
      - resourcequotas
      - services
      - endpoints
      - nodes/proxy
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - apps
      resources:
      - daemonsets
      - deployments
      - replicasets
      - statefulsets
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - extensions
      resources:
      - daemonsets
      - deployments
      - replicasets
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - batch
      resources:
      - jobs
      - cronjobs
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - "*"
      resources:
      - horizontalpodautoscalers
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - "*"
      resources:
      - nodes/stats
      - configmaps
      - events
      - leases
      verbs:
      - get
      - list
      - watch
      - create
      - update
  clusterRoleBinding:
    name: "otel-coralogix"
presets:
  logsCollection:
    enabled: false
    storeCheckpoints: true
  # kubernetesAttributes:
  #   enabled: true
  # hostMetrics:
  #   enabled: true
  # kubeletMetrics:
  #   enabled: true

extraEnvs:
- name: CORALOGIX_PRIVATE_KEY
  value: 
- name: KUBE_NODE_NAME
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: spec.nodeName
- name: HOST_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
- name: HOST_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
- name: K8S_NAMESPACE
  valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
- name: CLUSTER_NAME
  value: ""
config:
  extensions:
    zpages:
      endpoint: localhost:55679
    pprof:
      endpoint: localhost:1777
  exporters:
    coralogix:
      timeout: "1m"
      private_key: "${CORALOGIX_PRIVATE_KEY}"
      domain: "{{ .Values.global.domain }}"
      application_name_attributes:
      - "cloud.account.id"
      application_name: "{{.Values.global.defaultApplicationName }}"
      subsystem_name: "{{.Values.global.defaultSubsystemName }}"
  processors:
    # transform:
    #   error_mode: ignore
    #   log_statements:
    #     - context: resource
    #       statements:
    #         - keep_keys(attributes, ["cloud.region", "host.id", "host.type"])
    #     - context: log
    #       statements:
    #         - keep_keys(attributes, ["time", "log.file.name", "k8s.cluster.name", "k8s.namespace.name", "k8s.pod.name", "log.file.path", "k8s.ident", "message", "log", "k8s.pod.restart_count", "k8s.job.name"])
    transform/cw:
      error_mode: ignore
      log_statements:
        - context: resource
          statements:
            - keep_keys(attributes, ["cloud.region", "cloudwatch.log.group.name", "cloudwatch.log.stream"])
    k8sattributes:
      auth_type: "serviceAccount"
      passthrough: false
      filter:
        node_from_env_var: KUBE_NODE_NAME     
      extract:
        metadata:
          - k8s.pod.start_time
          - k8s.deployment.name
        labels:
          - tag_name: pipeline.execution
            key: pipeline.execution
            from: pod
          - tag_name: pipeline.project
            key: pipeline.project
            from: pod
        annotations:
          - tag_name: monitoring # extracts the value of the `monitoring` annotation from the pod and inserts it as a tag with key `monitoring`
            key: monitoring
            from: pod
      pod_association:
        - sources:
            - from: resource_attribute
              name: k8s.pod.start_time
            - from: resource_attribute
              name: k8s.deployment.name
    memory_limiter: null # Will get the k8s resource limits
    resourcedetection/env:
      detectors: ["env", "ec2"]
      timeout: 2s
      override: false
    spanmetrics:
      metrics_exporter: coralogix
      dimensions:
        - name: "k8s.pod.name"
        - name: "k8s.cronjob.name"
        - name: "k8s.job.name"
        - name: "k8s.node.name"
        - name: "k8s.namespace.name" 
  receivers:
    awscloudwatch:
      region: eu-west-1
      logs:
        poll_interval: "30s"
        groups:
          autodiscover:
            limit: 100
            prefix: /aws/vendedlogs/states/
            streams:
              prefixes: [states/]
    filelog:
      include: [/var/log/pods/*/*/*.log]
      include_file_name: false
      include_file_path: true
      operators:
        - type: router
          id: get-format
          routes:
            - output: parser-docker
              expr: 'body matches "^\\{"'
            - output: parser-containerd
              expr: 'body matches "^[^ Z]+Z"'
        - type: regex_parser
          id: parser-containerd
          regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<message>.*)$'
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        - type: json_parser
          id: parser-docker
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        - type: regex_parser
          id: extract_metadata_from_filepath
          regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
          parse_from: attributes["log.file.path"]
        - type: move
          from: attributes.namespace
          to: attributes["k8s.namespace.name"]
        - type: move
          from: attributes.restart_count
          to: attributes["k8s.pod.restart_count"]
        - type: move
          from: attributes.message
          to: body
        - type: move
          from: attributes.pod_name
          to: attributes["k8s.pod.name"]
        - type: move
          from: attributes.container_name
          to: attributes["k8s.container.name"]
        - type: add
          field: attributes["k8s.cluster.name"]
          value: '${CLUSTER_NAME}'
    filelog/2:
      include: [ /var/log/messages, /var/log/dmesg, /var/log/secure]
      include_file_name: false
      include_file_path: true
      operators:
        - type: router
          id: get-format
          routes:
            - output: parser-containerd
              expr: 'body matches " .* containerd: .*"'
            - output: parser-kubelet
              expr: 'body matches " .* kubelet: .*"'
            - output: parser-syslog
              expr: 'body matches " .* systemd: .*"'
            - output: parser-dhclient
              expr: 'body matches ".* dhclient[4608]: .*"'
        - type: regex_parser
          id: parser-dhclient
          regex: '^(?P<time>[^ ]* {1,2}[^ ]* [^ ]*) (?P<host>[^ ]*) (?P<ident>[^:]*): (?P<message>.*)$'
          output: move
        - type: syslog_parser
          id: parser-syslog
          protocol: rfc3164
        - type: regex_parser
          id: parser-containerd
          regex: '^(?P<time>^[^ ]* {1,2}[^ ]* [^ ]*) (?P<host>[^ ]*) (?P<indent>[a-zA-z0-9_\/\.\-]*): time=\".+\" (?P<level>level=[a-zA-Z]+) (?P<msg>msg=".*")'
          output: move
        - type: regex_parser
          id: parser-kubelet
          regex: '^(?P<time>[^ ]* {1,2}[^ ]* [^ ]*) (?P<host>[^ ]*) (?P<ident>[a-zA-Z0-9_\/.\-]*): (?P<level>[A-Z][a-z]*[0-9]*) (?P<pid>[0-9]+) (?P<source>[^:]*): *(?P<message>.*)$'
          output: move
        - type: file_input
          id: parser-dmesg
          include:
            - /var/log/dmesg
            - /var/log/secure
        - type: move
          from: attributes.message
          to: body
        - type: move
          from: attributes.ident
          to: attributes["k8s.ident"]
        - type: add
          field: attributes["k8s.cluster.name"]
          value: '${CLUSTER_NAME}'
    k8s_events:
      auth_type: "serviceAccount"
      namespaces: [default, kube-system, general, monitoring]
    k8s_cluster:
      auth_type: "serviceAccount"
      allocatable_types_to_report: [cpu, memory, storage, ephemeral-storage]
      node_conditions_to_report: [Ready, MemoryPressure]
    otlp:
      protocols:
        grpc:
          endpoint: ${MY_POD_IP}:4317
        http:
          endpoint: ${MY_POD_IP}:4318
  service:
    extensions:
    - zpages
    - pprof
    - health_check
    - memory_ballast
    telemetry:
      metrics:
        address: ${MY_POD_IP}:8888
    pipelines:
      logs:
        exporters:
          - coralogix
        processors:
          - batch
          - resourcedetection/env
          - transform/cw
        receivers:
          - awscloudwatch
      logs/eks:
        exporters:
          - coralogix
        processors:
          # - batch
          # - resourcedetection/env
          - k8sattributes
        receivers:
          - k8s_events
          - filelog
          - filelog/2
      metrics:
        exporters:
          - coralogix
        processors:
          - memory_limiter
          - resourcedetection/env
          - batch
        receivers:
          - otlp
          - k8s_cluster
      traces:
        exporters:
          - coralogix
        processors:
          - memory_limiter
          - spanmetrics
          - batch
          - resourcedetection/env
        receivers:
          - otlp
          - zipkin
tolerations: 
  - operator: Exists

extraVolumes:
  - name: varlog
    hostPath:
      path: /var/log
      type: ''
  - name: rootfs
    hostPath:
      path: /
  - name: varlibdocker
    hostPath:
      path: /var/lib/docker
  - name: containerdsock
    hostPath:
      path: /run/containerd/containerd.sock
  - name: sys
    hostPath:
      path: /sys
  - name: devdisk
    hostPath:
      path: /dev/disk/
extraVolumeMounts:
  - name: varlog
    readOnly: true
    mountPath: /var/log
  - name: rootfs
    mountPath: /rootfs
    readOnly: true
  - name: containerdsock
    mountPath: /run/containerd/containerd.sock
    readOnly: true
  - name: sys
    mountPath: /sys
    readOnly: true
  - name: devdisk
    mountPath: /dev/disk
    readOnly: true
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 1
    memory: 2G

ports:
  metrics:
    enabled: true
# podMonitor:
#   enabled: true
# prometheusRule:
#   enabled: true
#   defaultRules:
#     enabled: true

This is the current configuration.

resource.attributes.cx.application.name 
resource.attributes.cx.subsystem.name: 
resource.droppedAttributesCount:0
scope.name:
scope.version:
logRecord.body:time="2023-05-18T12:45:21.789Z" level=info duration="106.653µs" method=GET path=index.html size=473 status=0
logRecord.severityNumber:0
logRecord.attributes.k8s.cluster.name:
logRecord.attributes.k8s.container.name:argo-server
logRecord.attributes.k8s.namespace.name:argowf
logRecord.attributes.k8s.pod.name:argo-workflow-argo-workflows-server-544c8467d8-hrmc2
logRecord.attributes.k8s.pod.restart_count:0
logRecord.attributes.log.file.path:/var/log/pods/argowf_argo-workflow-argo-workflows-server-544c8467d8-hrmc2_5226c03f-ac02-4b0b-8246-03693780f345/argo-server/0.log
logRecord.attributes.logtag:F
logRecord.attributes.stream:stderr
logRecord.attributes.time:2023-05-18T12:45:21.789936237Z
logRecord.attributes.uid:5226c03f-ac02-4b0b-8246-03693780f345
18/05/2023 15:45:19.561 pm

As you can see, I added k8s.deployment.name in k8sattributes, but there is nothing in the log context. @swiatekm-sumo

swiatekm commented 1 year ago

This:

pod_association:
  - sources:
      - from: resource_attribute
        name: k8s.pod.start_time
      - from: resource_attribute
        name: k8s.deployment.name

should instead be:

pod_association:
  - sources:
      - from: resource_attribute
        name: k8s.pod.name
      - from: resource_attribute
        name: k8s.namespace.name

To be clear, these sources shouldn't be the same as the attributes you have specified under extract.metadata.

As an aside, you should be careful about using "global" receivers like k8s_events in a DaemonSet context. You're going to get the same events out of every collector Pod, whereas you only want them once per cluster. The same is true of the k8s_cluster receiver, and probably of awscloudwatch as well.
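
For example (just a rough sketch, assuming the Helm chart's `mode` and `replicaCount` values; the pipeline shown is illustrative), cluster-scoped receivers are usually moved to a separate single-replica deployment-mode release:

mode: deployment
replicaCount: 1
config:
  receivers:
    k8s_events:
      auth_type: "serviceAccount"
  service:
    pipelines:
      logs:
        receivers: [k8s_events]
        exporters: [coralogix]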

AndriySidliarskiy commented 1 year ago

What is the difference between the two pod_association configurations for me? I need the deployment name and also the labels, but it's not extracting metadata or labels from the pod.

AndriySidliarskiy commented 1 year ago

I'm sending k8s.pod.start_time just as an example; it's not the right choice. The main problem is that k8sattributes is not working.

swiatekm commented 1 year ago

pod_association is for telling the processor how to identify your Pod. The attributes in that section need to already be present on the resource. extract.metadata is where you specify what new attributes you want added. Does that make sense?
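
As a rough sketch of how the two sections fit together in your case (not your full config; the identifiers come from what the filelog operators already set, while extract lists what the processor should add):

k8sattributes:
  auth_type: "serviceAccount"
  pod_association:
    - sources:
        - from: resource_attribute
          name: k8s.pod.name        # already on the resource, set by the filelog operators
        - from: resource_attribute
          name: k8s.namespace.name  # already on the resource, set by the filelog operators
  extract:
    metadata:
      - k8s.deployment.name         # added by the processor once the Pod is identified
    labels:
      - tag_name: c2i.pipeline.execution
        key: c2i.pipeline.execution
        from: pod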

AndriySidliarskiy commented 1 year ago

Yes, but maybe you know why k8sattributes cannot extract the deployment name or the other attributes. Maybe it conflicts with another processor.

AndriySidliarskiy commented 1 year ago

Are there any updates on why the processor cannot extract the pod label? @swiatekm-sumo

swiatekm commented 1 year ago

I'm honestly a bit lost as to the current state of your setup @AndriySidliarskiy. Can you be clearer about your current configuration, the output you're currently seeing, and what you expect to see?

AndriySidliarskiy commented 1 year ago
  1. https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/22036#issuecomment-1553000661

  2. resource.attributes.cx.application.name
     resource.attributes.cx.subsystem.name:
     resource.droppedAttributesCount:0
     scope.name:
     scope.version:
     logRecord.body:time="2023-05-18T12:45:21.789Z" level=info duration="106.653µs" method=GET path=index.html size=473 status=0
     logRecord.severityNumber:0
     logRecord.attributes.k8s.cluster.name:
     logRecord.attributes.k8s.container.name:argo-server
     logRecord.attributes.k8s.namespace.name:argowf
     logRecord.attributes.k8s.pod.name:argo-workflow-argo-workflows-server-544c8467d8-hrmc2
     logRecord.attributes.k8s.pod.restart_count:0
     logRecord.attributes.log.file.path:/var/log/pods/argowf_argo-workflow-argo-workflows-server-544c8467d8-hrmc2_5226c03f-ac02-4b0b-8246-03693780f345/argo-server/0.log
     logRecord.attributes.logtag:F
     logRecord.attributes.stream:stderr
     logRecord.attributes.time:2023-05-18T12:45:21.789936237Z
     logRecord.attributes.uid:5226c03f-ac02-4b0b-8246-03693780f345
     18/05/2023 15:45:19.561 pm

  3. pipeline.execution - string inside log context in coralogix log

@swiatekm-sumo

swiatekm commented 1 year ago

I see the problem now, you have the identifying information in record attributes instead of resource attributes. They need to be at the resource level. In your filelog receiver configuration, change:

        - type: move
          from: attributes.namespace
          to: attributes["k8s.namespace.name"]
        - type: move
          from: attributes.restart_count
          to: attributes["k8s.pod.restart_count"]

        - type: move
          from: attributes.pod_name
          to: attributes["k8s.pod.name"]
        - type: move
          from: attributes.container_name
          to: attributes["k8s.container.name"]

to:

        - type: move
          from: attributes.namespace
          to: resource["k8s.namespace.name"]
        - type: move
          from: attributes.restart_count
          to: resource["k8s.pod.restart_count"]

        - type: move
          from: attributes.pod_name
          to: resource["k8s.pod.name"]
        - type: move
          from: attributes.container_name
          to: resource["k8s.container.name"]
AndriySidliarskiy commented 1 year ago

@swiatekm-sumo but the main problem is extracting the pod label, and for me this solution is not working. I cannot extract the metadata and put it into the log.

k8sattributes:
      auth_type: "serviceAccount"
      passthrough: false
      filter:
        node_from_env_var: KUBE_NODE_NAME     
      extract:
        metadata:
          - k8s.pod.start_time
          - k8s.deployment.name
        labels:
          - tag_name: pipeline.execution
            key: pipeline.execution
            from: pod
          - tag_name: pipeline.project
            key: pipeline.project
            from: pod
        annotations:
          - tag_name: monitoring # extracts the value of the `monitoring` annotation from the pod and inserts it as a tag with key `monitoring`
            key: monitoring
            from: pod
swiatekm commented 1 year ago

So you do see k8s.pod.start_time and k8s.deployment.name in your resource attributes, but not the tags from Pod labels?

AndriySidliarskiy commented 1 year ago

And after this https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/22036#issuecomment-1560770154 I have k8s.pod.name inside the resource, but the log doesn't have the deployment name and labels that should be extracted by k8sattributes. @swiatekm-sumo

resource.attributes.cx.application.name:
resource.attributes.cx.subsystem.name:
resource.attributes.k8s.container.name:argo-server
resource.attributes.k8s.namespace.name:argowf
resource.attributes.k8s.pod.name:argo-workflow-argo-workflows-server-7cdb9788bb-dmrxc
resource.attributes.k8s.pod.restart_count:0
resource.droppedAttributesCount:0
scope.name:
scope.version:
logRecord.body:time="2023-05-24T09:51:00.040Z" level=info duration="109.532µs" method=GET path=index.html size=473 status=0
logRecord.severityNumber:0
logRecord.attributes.k8s.cluster.name:cs-dev-eks-cluster
logRecord.attributes.log.file.path:/var/log/pods/argowf_argo-workflow-argo-workflows-server-7cdb9788bb-dmrxc_9a03c9b3-9af8-4aa1-bed5-5ee773fc928a/argo-server/0.log
logRecord.attributes.logtag:F
logRecord.attributes.stream:stderr
logRecord.attributes.time:2023-05-24T09:51:00.041367531Z
logRecord.attributes.uid:9a03c9b3-9af8-4aa1-bed5-5ee773fc928a
swiatekm commented 1 year ago

Have you also implemented the changes from https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/22036#issuecomment-1553121377?

AndriySidliarskiy commented 1 year ago

Yes. In my opinion, k8sattributes is not working; in the OpenTelemetry logs it launches but cannot extract metadata.

swiatekm commented 1 year ago

Yes, I can see it's not working, I'm trying to figure out what's wrong with your configuration that's causing it. Can you post your current configuration again? If you're looking at collector logs, can you post those as well?

AndriySidliarskiy commented 1 year ago

@swiatekm-sumo

global:
  domain: "coralogix.com"
  defaultApplicationName: ""
  defaultSubsystemName: ""

mode: daemonset
hostNetwork: false
fullnameOverride: otel-coralogix
clusterRole:
  create: true
  name: "otel-coralogix"
  rules:
    - apiGroups:
      - "*"
      resources:
      - events
      - namespaces
      - namespaces/status
      - nodes
      - nodes/spec
      - pods
      - pods/status
      - replicationcontrollers
      - replicationcontrollers/status
      - resourcequotas
      - services
      - endpoints
      - nodes/proxy
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - apps
      resources:
      - daemonsets
      - deployments
      - replicasets
      - statefulsets
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - extensions
      resources:
      - daemonsets
      - deployments
      - replicasets
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - batch
      resources:
      - jobs
      - cronjobs
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - "*"
      resources:
      - horizontalpodautoscalers
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - "*"
      resources:
      - nodes/stats
      - configmaps
      - events
      - leases
      verbs:
      - get
      - list
      - watch
      - create
      - update
  clusterRoleBinding:
    name: "otel-coralogix"
presets:
  logsCollection:
    enabled: false
    storeCheckpoints: true
  # kubernetesAttributes:
  #   enabled: true
  # hostMetrics:
  #   enabled: true
  # kubeletMetrics:
  #   enabled: true

extraEnvs:
- name: CORALOGIX_PRIVATE_KEY
  value:
- name: KUBE_NODE_NAME
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: spec.nodeName
- name: HOST_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
- name: HOST_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
- name: K8S_NAMESPACE
  valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
- name: CLUSTER_NAME
  value: ""
config:
  extensions:
    zpages:
      endpoint: localhost:55679
    pprof:
      endpoint: localhost:1777
  exporters:
    coralogix:
      timeout: "1m"
      private_key: "${CORALOGIX_PRIVATE_KEY}"
      domain: "{{ .Values.global.domain }}"
      application_name_attributes:
      - "cloud.account.id"
      application_name: "{{.Values.global.defaultApplicationName }}"
      subsystem_name: "{{.Values.global.defaultSubsystemName }}"
  processors:
    # transform:
    #   error_mode: ignore
    #   log_statements:
    #     - context: resource
    #       statements:
    #         - keep_keys(attributes, ["cloud.region", "host.id", "host.type"])
    #     - context: log
    #       statements:
    #         - keep_keys(attributes, ["time", "log.file.name", "k8s.cluster.name", "k8s.namespace.name", "k8s.pod.name", "log.file.path", "k8s.ident", "message", "log", "k8s.pod.restart_count", "k8s.job.name"])
    transform/cw:
      error_mode: ignore
      log_statements:
        - context: resource
          statements:
            - keep_keys(attributes, ["cloud.region", "cloudwatch.log.group.name", "cloudwatch.log.stream"])
    k8sattributes:
      auth_type: "serviceAccount"
      passthrough: false
      filter:
        node_from_env_var: KUBE_NODE_NAME     
      extract:
        metadata:
          - k8s.pod.start_time
          - k8s.pod.name
          - k8s.deployment.name
          - k8s.namespace.name
        labels:
          - tag_name: c2i.pipeline.execution
            key: c2i.pipeline.execution
            from: pod
          - tag_name: c2i.pipeline.project
            key: c2i.pipeline.project
            from: pod
        annotations:
          - tag_name: monitoring # extracts the value of the `monitoring` annotation from the pod and inserts it as a tag with key `monitoring`
            key: monitoring
            from: pod
      pod_association:
        - sources:
            - from: resource_attribute
              name: k8s.pod.name
            - from: resource_attribute
              name: k8s.deployment.name
            - from: resource_attribute
              name: c2i.pipeline.project
    memory_limiter: null # Will get the k8s resource limits
    resourcedetection/env:
      detectors: ["env", "ec2"]
      timeout: 2s
      override: false
    spanmetrics:
      metrics_exporter: coralogix
      dimensions:
        - name: "k8s.pod.name"
        - name: "k8s.cronjob.name"
        - name: "k8s.job.name"
        - name: "k8s.node.name"
        - name: "k8s.namespace.name" 
  receivers:
    awscloudwatch:
      region: eu-west-1
      logs:
        poll_interval: "30s"
        groups:
          autodiscover:
            limit: 100
            prefix: /aws/vendedlogs/states/
            streams:
              prefixes: [states/]
    filelog:
      include: [/var/log/pods/*/*/*.log]
      include_file_name: false
      include_file_path: true
      operators:
        - type: router
          id: get-format
          routes:
            - output: parser-docker
              expr: 'body matches "^\\{"'
            - output: parser-containerd
              expr: 'body matches "^[^ Z]+Z"'
        - type: regex_parser
          id: parser-containerd
          regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<message>.*)$'
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        - type: json_parser
          id: parser-docker
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        - type: regex_parser
          id: extract_metadata_from_filepath
          regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
          parse_from: attributes["log.file.path"]
        - type: move
          from: attributes.namespace
          to: resource["k8s.namespace.name"]
        - type: move
          from: attributes.restart_count
          to: resource["k8s.pod.restart_count"]
        - type: move
          from: attributes.message
          to: body
        - type: move
          from: attributes.pod_name
          to: resource["k8s.pod.name"]
        - type: move
          from: attributes.container_name
          to: resource["k8s.container.name"]
        - type: add
          field: attributes["k8s.cluster.name"]
          value: '${CLUSTER_NAME}'
    filelog/2:
      include: [ /var/log/messages, /var/log/dmesg, /var/log/secure]
      include_file_name: false
      include_file_path: true
      operators:
        - type: router
          id: get-format
          routes:
            - output: parser-containerd
              expr: 'body matches " .* containerd: .*"'
            - output: parser-kubelet
              expr: 'body matches " .* kubelet: .*"'
            - output: parser-syslog
              expr: 'body matches " .* systemd: .*"'
            - output: parser-dhclient
              expr: 'body matches ".* dhclient[4608]: .*"'
        - type: regex_parser
          id: parser-dhclient
          regex: '^(?P<time>[^ ]* {1,2}[^ ]* [^ ]*) (?P<host>[^ ]*) (?P<ident>[^:]*): (?P<message>.*)$'
          output: move
        - type: syslog_parser
          id: parser-syslog
          protocol: rfc3164
        - type: regex_parser
          id: parser-containerd
          regex: '^(?P<time>^[^ ]* {1,2}[^ ]* [^ ]*) (?P<host>[^ ]*) (?P<indent>[a-zA-z0-9_\/\.\-]*): time=\".+\" (?P<level>level=[a-zA-Z]+) (?P<msg>msg=".*")'
          output: move
        - type: regex_parser
          id: parser-kubelet
          regex: '^(?P<time>[^ ]* {1,2}[^ ]* [^ ]*) (?P<host>[^ ]*) (?P<ident>[a-zA-Z0-9_\/.\-]*): (?P<level>[A-Z][a-z]*[0-9]*) (?P<pid>[0-9]+) (?P<source>[^:]*): *(?P<message>.*)$'
          output: move
        - type: file_input
          id: parser-dmesg
          include:
            - /var/log/dmesg
            - /var/log/secure
        - type: move
          from: attributes.message
          to: body
        - type: move
          from: attributes.ident
          to: attributes["k8s.ident"]
        - type: add
          field: attributes["k8s.cluster.name"]
          value: '${CLUSTER_NAME}'
    # k8s_events:
    #   auth_type: "serviceAccount"
    #   namespaces: [default, kube-system, general, monitoring]
    k8s_cluster:
      auth_type: "serviceAccount"
      allocatable_types_to_report: [cpu, memory, storage, ephemeral-storage]
      node_conditions_to_report: [Ready, MemoryPressure]
    otlp:
      protocols:
        grpc:
          endpoint: ${MY_POD_IP}:4317
        http:
          endpoint: ${MY_POD_IP}:4318
  service:
    extensions:
    - zpages
    - pprof
    - health_check
    - memory_ballast
    telemetry:
      metrics:
        address: ${MY_POD_IP}:8888
    pipelines:
      logs:
        exporters:
          - coralogix
        processors:
          - batch
          - resourcedetection/env
          - transform/cw
        receivers:
          - awscloudwatch
      logs/eks:
        exporters:
          - coralogix
        processors:
          - batch
          - resourcedetection/env
          - k8sattributes
        receivers:
          # - k8s_events
          - filelog
          - filelog/2
      metrics:
        exporters:
          - coralogix
        processors:
          - memory_limiter
          - resourcedetection/env
          - batch
        receivers:
          - otlp
          - k8s_cluster
      traces:
        exporters:
          - coralogix
        processors:
          - memory_limiter
          - spanmetrics
          - batch
          - resourcedetection/env
        receivers:
          - otlp
          - zipkin
tolerations: 
  - operator: Exists

extraVolumes:
  - name: varlog
    hostPath:
      path: /var/log
      type: ''
  - name: rootfs
    hostPath:
      path: /
  - name: varlibdocker
    hostPath:
      path: /var/lib/docker
  - name: containerdsock
    hostPath:
      path: /run/containerd/containerd.sock
  - name: sys
    hostPath:
      path: /sys
  - name: devdisk
    hostPath:
      path: /dev/disk/
extraVolumeMounts:
  - name: varlog
    readOnly: true
    mountPath: /var/log
  - name: rootfs
    mountPath: /rootfs
    readOnly: true
  - name: containerdsock
    mountPath: /run/containerd/containerd.sock
    readOnly: true
  - name: sys
    mountPath: /sys
    readOnly: true
  - name: devdisk
    mountPath: /dev/disk
    readOnly: true
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 1
    memory: 2G

ports:
  metrics:
    enabled: true
# podMonitor:
#   enabled: true
# prometheusRule:
#   enabled: true
#   defaultRules:
#     enabled: true
2023-05-24T10:22:51.268Z    info    service/telemetry.go:113    Setting up own telemetry...
2023-05-24T10:22:51.269Z    info    service/telemetry.go:136    Serving Prometheus metrics  {"address": "10.125.37.140:8888", "level": "Basic"}
2023-05-24T10:22:51.269Z    info    processor/processor.go:300  Deprecated component. Will be removed in future releases.   {"kind": "processor", "name": "spanmetrics", "pipeline": "traces"}
2023-05-24T10:22:51.269Z    info    spanmetricsprocessor@v0.77.0/processor.go:139   Building spanmetrics    {"kind": "processor", "name": "spanmetrics", "pipeline": "traces"}
2023-05-24T10:22:51.272Z    info    memorylimiterprocessor@v0.77.0/memorylimiter.go:149 Using percentage memory limiter {"kind": "processor", "name": "memory_limiter", "pipeline": "traces", "total_memory_mib": 1907, "limit_percentage": 80, "spike_limit_percentage": 25}
2023-05-24T10:22:51.272Z    info    memorylimiterprocessor@v0.77.0/memorylimiter.go:113 Memory limiter configured   {"kind": "processor", "name": "memory_limiter", "pipeline": "traces", "limit_mib": 1525, "spike_limit_mib": 476, "check_interval": 5}
2023-05-24T10:22:51.272Z    info    kube/client.go:101  k8s filtering   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "labelSelector": "", "fieldSelector": "spec.nodeName=ip-10-125-46-141.eu-west-1.compute.internal"}
2023-05-24T10:22:51.277Z    info    service/service.go:141  Starting otelcol-contrib... {"Version": "0.77.0", "NumCPU": 96}
2023-05-24T10:22:51.277Z    info    extensions/extensions.go:41 Starting extensions...
2023-05-24T10:22:51.277Z    info    extensions/extensions.go:44 Extension is starting...    {"kind": "extension", "name": "zpages"}
2023-05-24T10:22:51.277Z    info    zpagesextension@v0.77.0/zpagesextension.go:64   Registered zPages span processor on tracer provider {"kind": "extension", "name": "zpages"}
2023-05-24T10:22:51.277Z    info    zpagesextension@v0.77.0/zpagesextension.go:74   Registered Host's zPages    {"kind": "extension", "name": "zpages"}
2023-05-24T10:22:51.277Z    info    zpagesextension@v0.77.0/zpagesextension.go:86   Starting zPages extension   {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2023-05-24T10:22:51.277Z    info    extensions/extensions.go:48 Extension started.  {"kind": "extension", "name": "zpages"}
2023-05-24T10:22:51.277Z    info    extensions/extensions.go:44 Extension is starting...    {"kind": "extension", "name": "pprof"}
2023-05-24T10:22:51.278Z    info    pprofextension@v0.77.0/pprofextension.go:71 Starting net/http/pprof server  {"kind": "extension", "name": "pprof", "config": {"TCPAddr":{"Endpoint":"localhost:1777"},"BlockProfileFraction":0,"MutexProfileFraction":0,"SaveToFile":""}}
2023-05-24T10:22:51.278Z    info    extensions/extensions.go:48 Extension started.  {"kind": "extension", "name": "pprof"}
2023-05-24T10:22:51.278Z    info    extensions/extensions.go:44 Extension is starting...    {"kind": "extension", "name": "health_check"}
2023-05-24T10:22:51.278Z    info    healthcheckextension@v0.77.0/healthcheckextension.go:45 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2023-05-24T10:22:51.278Z    warn    internal/warning.go:51  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-05-24T10:22:51.278Z    info    extensions/extensions.go:48 Extension started.  {"kind": "extension", "name": "health_check"}
2023-05-24T10:22:51.278Z    info    extensions/extensions.go:44 Extension is starting...    {"kind": "extension", "name": "memory_ballast"}
2023-05-24T10:22:51.295Z    info    ballastextension@v0.77.0/memory_ballast.go:52   Setting memory ballast  {"kind": "extension", "name": "memory_ballast", "MiBs": 762}
2023-05-24T10:22:51.297Z    info    extensions/extensions.go:48 Extension started.  {"kind": "extension", "name": "memory_ballast"}
2023-05-24T10:22:51.298Z    info    internal/resourcedetection.go:136   began detecting resource information    {"kind": "processor", "name": "resourcedetection/env", "pipeline": "traces"}
2023-05-24T10:22:53.298Z    info    internal/resourcedetection.go:150   detected resource information   {"kind": "processor", "name": "resourcedetection/env", "pipeline": "traces", "resource": {}}
2023-05-24T10:22:53.299Z    info    adapter/receiver.go:56  Starting stanza receiver    {"kind": "receiver", "name": "filelog/2", "data_type": "logs"}
2023-05-24T10:22:53.299Z    info    spanmetricsprocessor@v0.77.0/processor.go:182   Starting spanmetricsprocessor   {"kind": "processor", "name": "spanmetrics", "pipeline": "traces"}
2023-05-24T10:22:53.299Z    info    spanmetricsprocessor@v0.77.0/processor.go:202   Found exporter  {"kind": "processor", "name": "spanmetrics", "pipeline": "traces", "spanmetrics-exporter": "coralogix"}
2023-05-24T10:22:53.320Z    info    otlpreceiver@v0.77.0/otlp.go:94 Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "10.125.37.140:4317"}
2023-05-24T10:22:53.320Z    info    otlpreceiver@v0.77.0/otlp.go:112    Starting HTTP server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "10.125.37.140:4318"}
2023-05-24T10:22:53.320Z    info    adapter/receiver.go:56  Starting stanza receiver    {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-05-24T10:22:53.320Z    info    k8sclusterreceiver@v0.77.0/receiver.go:60   Starting shared informers and wait for initial cache sync.  {"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
swiatekm commented 1 year ago

Ok, can you try removing the last entry in pod_association.sources?

      pod_association:
        - sources:
            - from: resource_attribute
              name: c2i.pipeline.project

If that doesn't help, we might need to enable debug logging and see what exactly the processor is doing in regards to Pod identifiers.

AndriySidliarskiy commented 1 year ago

@swiatekm-sumo I added this for test purposes and it's also not working.

swiatekm commented 1 year ago

Also, this:

      pod_association:
        - sources:
            - from: resource_attribute
              name: k8s.deployment.name

should have k8s.namespace.name instead of k8s.deployment.name.
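
That is, roughly:

      pod_association:
        - sources:
            - from: resource_attribute
              name: k8s.pod.name
            - from: resource_attribute
              name: k8s.namespace.name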

swiatekm commented 1 year ago

If that doesn't help, please enable debug logging by setting:

service:
  telemetry:
    logs:
      level: DEBUG

and post the collector logs you see. There's probably going to be a lot, so it would help a lot if you only posted logs from the k8sattributes processor.

AndriySidliarskiy commented 1 year ago

@swiatekm-sumo hi. New logs with DEBUG enabled:

2023-05-25T07:43:16.274Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:16.332Z    error   fileconsumer/reader.go:62   Failed to seek  {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/monitoring_job-remover-7c46f7d89d-ssqb8_7c5fb391-88f2-471f-9a79-333534cdb41f/job-remover/747.log", "error": "seek /var/log/pods/monitoring_job-remover-7c46f7d89d-ssqb8_7c5fb391-88f2-471f-9a79-333534cdb41f/job-remover/747.log: file already closed"}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Reader).ReadToEnd
    github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.78.0/fileconsumer/reader.go:62
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).consume.func1
    github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.78.0/fileconsumer/file.go:148
2023-05-25T07:43:16.474Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:16.499Z    debug   fileconsumer/reader.go:161  Problem closing reader  {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/monitoring_job-remover-7c46f7d89d-ssqb8_7c5fb391-88f2-471f-9a79-333534cdb41f/job-remover/747.log", "error": "close /var/log/pods/monitoring_job-remover-7c46f7d89d-ssqb8_7c5fb391-88f2-471f-9a79-333534cdb41f/job-remover/747.log: file already closed"}
2023-05-25T07:43:16.676Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:16.876Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:17.077Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:17.098Z    debug   fileconsumer/file.go:129    Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-05-25T07:43:17.277Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:17.296Z    debug   fileconsumer/file.go:129    Consuming files {"kind": "receiver", "name": "filelog/2", "data_type": "logs", "component": "fileconsumer"}
2023-05-25T07:43:17.478Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:17.478Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:17.498Z    error   fileconsumer/reader.go:62   Failed to seek  {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/monitoring_job-remover-7c46f7d89d-ssqb8_7c5fb391-88f2-471f-9a79-333534cdb41f/job-remover/748.log", "error": "seek /var/log/pods/monitoring_job-remover-7c46f7d89d-ssqb8_7c5fb391-88f2-471f-9a79-333534cdb41f/job-remover/748.log: file already closed"}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Reader).ReadToEnd
    github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.78.0/fileconsumer/reader.go:62
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*detectLostFiles).readLostFiles.func1
    github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.78.0/fileconsumer/roller_other.go:40
2023-05-25T07:43:17.499Z    debug   fileconsumer/reader.go:161  Problem closing reader  {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/monitoring_job-remover-7c46f7d89d-ssqb8_7c5fb391-88f2-471f-9a79-333534cdb41f/job-remover/747.log", "error": "close /var/log/pods/monitoring_job-remover-7c46f7d89d-ssqb8_7c5fb391-88f2-471f-9a79-333534cdb41f/job-remover/747.log: file already closed"}
2023-05-25T07:43:17.678Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:17.679Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:17.879Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:17.879Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:18.080Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
2023-05-25T07:43:18.280Z    debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}
swiatekm commented 1 year ago

Thanks! That confirms my hypothesis that the problem lies in identifying the Pod for the given resource. These log lines:

debug   k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]

mean that we can't find the Pod identifier.

Can you confirm the following facts for me:

  1. When you look at the logs in Coralogix, you see the following attributes:
resource.attributes.k8s.namespace.name:argowf
resource.attributes.k8s.pod.name:argo-workflow-argo-workflows-server-7cdb9788bb-dmrxc
  2. The pod_association section in your k8sattributes processor looks like this:
      pod_association:
        - sources:
            - from: resource_attribute
              name: k8s.pod.name
            - from: resource_attribute
              name: k8s.namespace.name
AndriySidliarskiy commented 1 year ago

I'm using the filelog receiver to extract the namespace name and pod name from the file path, but we can test this with k8s.deployment.name. The configuration now looks like this:

    k8sattributes:
      auth_type: "serviceAccount"
      passthrough: false
      filter:
        node_from_env_var: KUBE_NODE_NAME     
      extract:
        metadata:
          - k8s.pod.name
          - k8s.deployment.name
          - k8s.namespace.name
        labels:
          - tag_name: c2i.pipeline.execution
            key: c2i.pipeline.execution
            from: pod
          - tag_name: c2i.pipeline.project
            key: c2i.pipeline.project
            from: pod
        annotations:
          - tag_name: monitoring # extracts the value of the `monitoring` annotation from the pod and inserts it as a tag with key `monitoring`
            key: monitoring
            from: pod
AndriySidliarskiy commented 1 year ago

@swiatekm-sumo but also when I commented out the part that moves k8s.pod.name etc. from attributes to resource, I get the same errors.

    filelog:
      include: [/var/log/pods/*/*/*.log]
      include_file_name: false
      include_file_path: true
      operators:
        - type: router
          id: get-format
          routes:
            - output: parser-docker
              expr: 'body matches "^\\{"'
            - output: parser-containerd
              expr: 'body matches "^[^ Z]+Z"'
        - type: regex_parser
          id: parser-containerd
          regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<message>.*)$'
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        - type: json_parser
          id: parser-docker
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        - type: regex_parser
          id: extract_metadata_from_filepath
          regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
          parse_from: attributes["log.file.path"]
        # - type: move
        #   from: attributes.namespace
        #   to: resource["k8s.namespace.name"]
        # - type: move
        #   from: attributes.restart_count
        #   to: resource["k8s.pod.restart_count"]
        # - type: move
        #   from: attributes.message
        #   to: body
        # - type: move
        #   from: attributes.pod_name
        #   to: resource["k8s.pod.name"]
        # - type: move
        #   from: attributes.container_name
        #   to: resource["k8s.container.name"]
AndriySidliarskiy commented 1 year ago

And also with this configuration I get the same error:

    k8sattributes:
      auth_type: "serviceAccount"
      passthrough: false
      filter:
        node_from_env_var: KUBE_NODE_NAME     
      extract:
        metadata:
          - k8s.pod.name
          - k8s.deployment.name
          - k8s.namespace.name
        labels:
          - tag_name: c2i.pipeline.execution
            key: c2i.pipeline.execution
            from: pod
          - tag_name: c2i.pipeline.project
            key: c2i.pipeline.project
            from: pod
        annotations:
          - tag_name: monitoring # extracts the value of the `monitoring` annotation from pods and inserts it as a tag with key `monitoring`
            key: monitoring
            from: pod
      pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.name
          - from: resource_attribute
            name: k8s.namespace.name
swiatekm commented 1 year ago

That should work. At the very least, the k8sattributes processor should compute the right identifier. Even with the above config, do you see the same logs?

AndriySidliarskiy commented 1 year ago

yes

AndriySidliarskiy commented 1 year ago

Could you find time to test this in an EKS environment? @swiatekm-sumo

swiatekm commented 1 year ago

I don't think this has anything to do with the specific K8s distribution in play, but I will test your specific config in a KinD cluster.

AndriySidliarskiy commented 1 year ago

@swiatekm-sumo Thanks, but how much time might the testing take?

swiatekm commented 1 year ago

Just to be clear, I'm not going to commit to any timelines here; any assistance offered in this issue is on a best-effort basis.

With that said, I tested the following configurations:

  k8sattributes:
    auth_type: "serviceAccount"
    passthrough: false
    filter:
      node_from_env_var: KUBE_NODE_NAME     
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.namespace.name
      labels:
        - tag_name: c2i.pipeline.execution
          key: c2i.pipeline.execution
          from: pod
        - tag_name: c2i.pipeline.project
          key: c2i.pipeline.project
          from: pod
      annotations:
        - tag_name: monitoring # extracts the value of the `monitoring` annotation from pods and inserts it as a tag with key `monitoring`
          key: monitoring
          from: pod
    pod_association:
    - sources:
        - from: resource_attribute
          name: k8s.pod.name
        - from: resource_attribute
          name: k8s.namespace.name

    - id: extract-metadata-from-filepath
      parse_from: attributes["log.file.path"]
      regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<run_id>\d+)\.log$
      type: regex_parser
    - from: attributes.container_name
      to: resource["k8s.container.name"]
      type: move
    - from: attributes.namespace
      to: resource["k8s.namespace.name"]
      type: move
    - from: attributes.pod_name
      to: resource["k8s.pod.name"]
      type: move

and this worked as expected:

otelcol 2023-05-25T10:25:41.168Z    debug    k8sattributesprocessor@v0.77.0/processor.go:113    evaluating pod identifier    {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/containers", "value": [{"Source":{"From":"resource_attribute","Name":"k8s.pod.name"},"Value":"collection-sumologic-otelcol-logs-collector-htbp7"},{"Source":{"From":"resource_attribute","Name":"k8s.namespace.name"},"Value":"sumologic"},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}

So there must be something in your actual configuration that doesn't match what you've posted here.

AndriySidliarskiy commented 1 year ago

@swiatekm-sumo Where did you launch it? Was it EKS, AKS, local Kubernetes, or something else?

AndriySidliarskiy commented 1 year ago

And could you please clarify how k8sattributes extracts data? Does it call an endpoint, or how does it work?

AndriySidliarskiy commented 1 year ago

And could you please provide the full configuration you used? Thanks a lot.

swiatekm commented 1 year ago

@swiatekm-sumo Where did you launch it? Was it EKS, AKS, local Kubernetes, or something else?

In a local KinD cluster.

And could you please clarify how k8sattributes extracts data? Does it call an endpoint, or how does it work?

Are you asking how it gets metadata from the K8s apiserver? It establishes a Watch for the necessary resources (mostly Pods) and maintains a local cache of them via the standard client-go mechanism of informers.

For the issue you're experiencing, though, the problem isn't that metadata; it's that the processor can't tell which Pod your log records come from. That's what the logs about evaluating the Pod identifier mean.
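
To make that identification step concrete, here is a hedged sketch of a pod_association with a fallback rule (the fallback is illustrative and not part of this issue's config): all sources inside a single `sources` list must resolve for that rule to match, and if the first rule fails, the processor can fall back to the IP of the connection the data arrived over:

    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.name
          - from: resource_attribute
            name: k8s.namespace.name
      - sources:
          - from: connection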

AndriySidliarskiy commented 1 year ago

Yeah, I understand, but it's interesting why this processor cannot identify the Pod.

swiatekm commented 1 year ago

And could you please provide the full configuration you used? Thanks a lot.

Here's a stripped-down manifest where the Pod is identified correctly:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otelcol-logs-collector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: otelcol-logs-collector
  template:
    metadata:
      labels:
        app.kubernetes.io/name: otelcol-logs-collector
    spec:
      securityContext:
        fsGroup: 0
        runAsGroup: 0
        runAsUser: 0
      containers:
      - args:
        - --config=/etc/otelcol/config.yaml
        image: "otel/opentelemetry-collector-contrib:0.77.0"
        name: otelcol
        volumeMounts:
        - mountPath: /etc/otelcol
          name: otelcol-config
        - mountPath: /var/log/pods
          name: varlogpods
          readOnly: true
        env:
        - name: KUBE_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: config.yaml
            path: config.yaml
          name: otelcol-logs-collector
        name: otelcol-config
      - hostPath:
          path: /var/log/pods
          type: ""
        name: varlogpods
---
# Source: sumologic/templates/logs/collector/otelcol/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otelcol-logs-collector
  labels:
    app: otelcol-logs-collector
data:
  config.yaml: |
    exporters:
      logging:
    processors:
      k8sattributes:
        auth_type: serviceAccount
        extract:
          annotations:
          - from: pod
            key: monitoring
            tag_name: monitoring
          labels:
          - from: pod
            key: c2i.pipeline.execution
            tag_name: c2i.pipeline.execution
          - from: pod
            key: c2i.pipeline.project
            tag_name: c2i.pipeline.project
          metadata:
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.deployment.name
          - k8s.namespace.name
        filter:
          node_from_env_var: KUBE_NODE_NAME
        passthrough: false
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.name
          - from: resource_attribute
            name: k8s.namespace.name
    receivers:
      filelog/containers:
        include:
        - /var/log/pods/*/*/*.log
        include_file_name: false
        include_file_path: true
        operators:
        - id: parser-containerd
          output: merge-cri-lines
          parse_to: body
          regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*)( |)(?P<log>.*)$
          timestamp:
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
            parse_from: body.time
          type: regex_parser
        - combine_field: body.log
          combine_with: ""
          id: merge-cri-lines
          is_last_entry: body.logtag == "F"
          overwrite_with: newest
          source_identifier: attributes["log.file.path"]
          type: recombine
        - id: extract-metadata-from-filepath
          parse_from: attributes["log.file.path"]
          parse_to: attributes
          regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<run_id>\d+)\.log$
          type: regex_parser
        - from: attributes.container_name
          to: resource["k8s.container.name"]
          type: move
        - from: attributes.namespace
          to: resource["k8s.namespace.name"]
          type: move
        - from: attributes.pod_name
          to: resource["k8s.pod.name"]
          type: move
        - field: attributes.run_id
          type: remove
        - field: attributes.uid
          type: remove
        - field: attributes["log.file.path"]
          type: remove
        - from: body.log
          to: body
          type: move
    service:
      pipelines:
        logs/containers:
          exporters:
          - logging
          processors:
          - k8sattributes
          receivers:
          - filelog/containers
      telemetry:
        logs:
          level: debug

Note that k8sattributes doesn't add metadata here, as it doesn't have the required RBAC. But it does identify Pods correctly, which you can confirm in the debug logs.
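
For completeness, a hedged sketch of the RBAC that would let the processor attach that metadata. The resource names below are illustrative, the ServiceAccount and namespace must match whatever the collector DaemonSet actually runs under, and the replicasets rule is what makes k8s.deployment.name resolvable:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: otelcol-k8sattributes   # illustrative name
    rules:
      - apiGroups: [""]
        resources: ["pods", "namespaces"]
        verbs: ["get", "watch", "list"]
      - apiGroups: ["apps"]
        resources: ["replicasets"]   # needed to derive k8s.deployment.name
        verbs: ["get", "watch", "list"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: otelcol-k8sattributes   # illustrative name
    subjects:
      - kind: ServiceAccount
        name: default                # replace with the collector's ServiceAccount
        namespace: default           # replace with the collector's namespace
    roleRef:
      kind: ClusterRole
      name: otelcol-k8sattributes
      apiGroup: rbac.authorization.k8s.io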

AndriySidliarskiy commented 1 year ago

So I tried another configuration and it works now, but I still get this error, this time for the metrics pipeline: k8sattributesprocessor@v0.78.0/processor.go:102 evaluating pod identifier {"kind": "processor", "name": "k8sattributes", "pipeline": "logs/eks", "value": [{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""},{"Source":{"From":"","Name":""},"Value":""}]}

swiatekm commented 1 year ago

@AndriySidliarskiy was your original problem fixed, then? If you have a different one, please close this issue and open a new one, with more information pertaining to the new problem with metrics.