open-telemetry / opentelemetry-collector-contrib

Issue with the processor/transform - leaking resources / attributes #34715

Closed evilr00t closed 3 weeks ago

evilr00t commented 3 weeks ago

Component(s)

processor/transform

What happened?

Description

When the transform runs, some resource attributes are set incorrectly: they take their values from a completely different log message.

Steps to Reproduce

I use journald logging for Docker with image tagging:

--log-driver journald --log-opt tag={{.Name}}|{{.ImageName}}

OTEL config:

  transform/logs:
    error_mode: ignore
    log_statements:
    - context: log
      statements:
      - set(severity_number, SEVERITY_NUMBER_DEBUG) where Int(body["PRIORITY"]) ==
        7
      - set(severity_number, SEVERITY_NUMBER_INFO) where Int(body["PRIORITY"]) ==
        6
      - set(severity_number, SEVERITY_NUMBER_INFO2) where Int(body["PRIORITY"]) ==
        5
      - set(severity_number, SEVERITY_NUMBER_WARN) where Int(body["PRIORITY"]) ==
        4
      - set(severity_number, SEVERITY_NUMBER_ERROR) where Int(body["PRIORITY"]) ==
        3
      - set(severity_number, SEVERITY_NUMBER_FATAL) where Int(body["PRIORITY"]) <=
        2
      - set(attributes["priority"], body["PRIORITY"])
      - set(attributes["process.comm"], body["_COMM"])
      - set(attributes["process.exec"], body["_EXE"])
      - set(attributes["process.uid"], body["_UID"])
      - set(attributes["process.gid"], body["_GID"])
      - set(attributes["owner_uid"], body["_SYSTEMD_OWNER_UID"])
      - set(attributes["unit"], body["_SYSTEMD_UNIT"])
      - set(attributes["syslog_identifier"], body["SYSLOG_IDENTIFIER"])
      - set(attributes["syslog_identifier_prefix"], ConvertCase(body["SYSLOG_IDENTIFIER"],
        "lower")) where body["SYSLOG_IDENTIFIER"] != nil
      - replace_pattern(attributes["syslog_identifier_prefix"], "^[^a-zA-Z]*([a-zA-Z]{3,25}).*",
        "$$1") where body["SYSLOG_IDENTIFIER"] != nil
      - set(attributes["unit_prefix"], ConvertCase(body["_SYSTEMD_UNIT"], "lower"))
        where body["_SYSTEMD_UNIT"] != nil
      - replace_pattern(attributes["unit_prefix"], "^[^a-zA-Z]*([a-zA-Z]{3,25}).*",
        "$$1") where body["_SYSTEMD_UNIT"] != nil
      - set(attributes["job"], attributes["syslog_identifier_prefix"])
      - set(attributes["job"], attributes["unit_prefix"]) where attributes["job"]
        == nil and attributes["unit_prefix"] != nil
      - set(resource.attributes["aws_account"],"social360")
      - set(resource.attributes["service.name"], ConvertCase(body["SYSLOG_IDENTIFIER"],
        "lower")) where body["SYSLOG_IDENTIFIER"] != nil
      - replace_pattern(resource.attributes["service.name"], "^([^-]*-[^-]*).*", "$$1")
        where body["SYSLOG_IDENTIFIER"] != nil
      - set(resource.attributes["docker.image"], ConvertCase(body["SYSLOG_IDENTIFIER"],
        "lower")) where body["SYSLOG_IDENTIFIER"] != nil
      - replace_pattern(resource.attributes["docker.image"], ".*\\|(.*)$", "$$1")
        where body["SYSLOG_IDENTIFIER"] != nil
      - set(resource.attributes["container.name"], ConvertCase(body["SYSLOG_IDENTIFIER"],
        "lower")) where body["SYSLOG_IDENTIFIER"] != nil
      - replace_pattern(resource.attributes["container.name"], "^(.*)\\|.*", "$$1")
        where body["SYSLOG_IDENTIFIER"] != nil
      - set(body, body["MESSAGE"])

Expected Result

docker_image and container name should be derived from syslog_identifier, but they have completely different values: "replication" is gone and "api" is used instead, even though that is a separate container and shouldn't appear here.

Actual Result

[Screenshot 2024-08-16 at 10:46:09 AM]

It looks like resource attributes are leaking from one log record to another. This shouldn't happen.

Collector version

0.106.0

Environment information

Environment

root@75:/etc/otelcol-contrib# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:        24.04
Codename:       noble

OpenTelemetry Collector configuration

---
receivers:
  docker_stats:
    container_labels_to_metric_labels:
      org.opencontainers.image.source: org.opencontainers.image.source
      com.docker.compose.project: service.name
    metrics:
      container.uptime:
        enabled: true
      container.restarts:
        enabled: true
  hostmetrics:
    scrapers:
      cpu:
        metrics:
          system.cpu.logical.count:
            enabled: true
          system.cpu.utilization:
            enabled: true
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
          system.memory.limit:
            enabled: true
      filesystem:
        exclude_fs_types:
          fs_types:
          - squashfs
          - vfat
          match_type: strict
        metrics:
          system.filesystem.utilization:
            enabled: true
      network: {}
      load: {}
      disk: {}
      paging: {}
  journald:
    units:
    - ssh
    - systemd
    - docker
    - containerd
    priority: info
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
      - job_name: otel-collector
        scrape_interval: 10s
        static_configs:
        - targets:
          - localhost:8888
service:
  pipelines:
    metrics:
      receivers:
      - docker_stats
      - hostmetrics
      - otlp
      - prometheus
      processors:
      - batch
      - resourcedetection
      exporters:
      - prometheusremotewrite/thanos
    logs:
      receivers:
      - journald
      processors:
      - batch
      - resourcedetection
      - transform/logs
      exporters:
      - otlphttp/loki
    traces:
      receivers:
      - otlp
      processors:
      - batch
      - resourcedetection
      exporters:
      - otlp
  telemetry:
    logs: {}
    metrics:
      level: basic
      address: ":8888"
  extensions:
  - basicauth/jaeger
  - basicauth/loki
  - basicauth/thanos
processors:
  batch: {}
  resourcedetection:
    detectors:
    - system
    - env
    system:
      resource_attributes:
        host.name:
          enabled: true
  transform/logs:
    error_mode: ignore
    log_statements:
    - context: log
      statements:
      - set(severity_number, SEVERITY_NUMBER_DEBUG) where Int(body["PRIORITY"]) ==
        7
      - set(severity_number, SEVERITY_NUMBER_INFO) where Int(body["PRIORITY"]) ==
        6
      - set(severity_number, SEVERITY_NUMBER_INFO2) where Int(body["PRIORITY"]) ==
        5
      - set(severity_number, SEVERITY_NUMBER_WARN) where Int(body["PRIORITY"]) ==
        4
      - set(severity_number, SEVERITY_NUMBER_ERROR) where Int(body["PRIORITY"]) ==
        3
      - set(severity_number, SEVERITY_NUMBER_FATAL) where Int(body["PRIORITY"]) <=
        2
      - set(attributes["priority"], body["PRIORITY"])
      - set(attributes["process.comm"], body["_COMM"])
      - set(attributes["process.exec"], body["_EXE"])
      - set(attributes["process.uid"], body["_UID"])
      - set(attributes["process.gid"], body["_GID"])
      - set(attributes["owner_uid"], body["_SYSTEMD_OWNER_UID"])
      - set(attributes["unit"], body["_SYSTEMD_UNIT"])
      - set(attributes["syslog_identifier"], body["SYSLOG_IDENTIFIER"])
      - set(attributes["syslog_identifier_prefix"], ConvertCase(body["SYSLOG_IDENTIFIER"],
        "lower")) where body["SYSLOG_IDENTIFIER"] != nil
      - replace_pattern(attributes["syslog_identifier_prefix"], "^[^a-zA-Z]*([a-zA-Z]{3,25}).*",
        "$$1") where body["SYSLOG_IDENTIFIER"] != nil
      - set(attributes["unit_prefix"], ConvertCase(body["_SYSTEMD_UNIT"], "lower"))
        where body["_SYSTEMD_UNIT"] != nil
      - replace_pattern(attributes["unit_prefix"], "^[^a-zA-Z]*([a-zA-Z]{3,25}).*",
        "$$1") where body["_SYSTEMD_UNIT"] != nil
      - set(attributes["job"], attributes["syslog_identifier_prefix"])
      - set(attributes["job"], attributes["unit_prefix"]) where attributes["job"]
        == nil and attributes["unit_prefix"] != nil
      - set(resource.attributes["aws_account"],"social360")
      - set(resource.attributes["service.name"], ConvertCase(body["SYSLOG_IDENTIFIER"],
        "lower")) where body["SYSLOG_IDENTIFIER"] != nil
      - replace_pattern(resource.attributes["service.name"], "^([^-]*-[^-]*).*", "$$1")
        where body["SYSLOG_IDENTIFIER"] != nil
      - set(resource.attributes["docker.image"], ConvertCase(body["SYSLOG_IDENTIFIER"],
        "lower")) where body["SYSLOG_IDENTIFIER"] != nil
      - replace_pattern(resource.attributes["docker.image"], ".*\\|(.*)$", "$$1")
        where body["SYSLOG_IDENTIFIER"] != nil
      - set(resource.attributes["container.name"], ConvertCase(body["SYSLOG_IDENTIFIER"],
        "lower")) where body["SYSLOG_IDENTIFIER"] != nil
      - replace_pattern(resource.attributes["container.name"], "^(.*)\\|.*", "$$1")
        where body["SYSLOG_IDENTIFIER"] != nil
      - set(body, body["MESSAGE"])
exporters:
  otlp:
    endpoint: FOOBAR
    headers:
      Content-Type: application/grpc
    auth:
      authenticator: basicauth/jaeger
  otlphttp/loki:
    endpoint: FOOBAR
    auth:
      authenticator: basicauth/loki
  prometheusremotewrite/thanos:
    endpoint: FOOBAR
    auth:
      authenticator: basicauth/thanos
    target_info:
      enabled: false
    add_metric_suffixes: false
    resource_to_telemetry_conversion:
      enabled: true
    external_labels:
      social360: 'true'
extensions:
  basicauth/jaeger:
    client_auth:
  basicauth/loki:
    client_auth:
  basicauth/thanos:
    client_auth:

Log output

No response

Additional context

No response

TylerHelmuth commented 3 weeks ago

@evilr00t this is likely a duplicate of https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32080. Can you try enabling the transform.flatten.logs feature gate and then setting flatten_data: true in the transformprocessor config?

Details here: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor#transformflattenlogs
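
For anyone landing here with the same symptom: as far as I understand it, in the log context resource.attributes points at the resource shared by every log record grouped under it, so a set() driven by one record's body can overwrite the value the other records see. With flatten_data enabled, each record is processed with its own copy of the resource. A minimal sketch of the change, assuming the config path shown in the comment (guessed from the directory in the report, not confirmed) and keeping only one of the original statements for brevity:

```yaml
# Start the collector with the feature gate enabled, for example:
#   otelcol-contrib --config /etc/otelcol-contrib/config.yaml --feature-gates=transform.flatten.logs
# (the config path above is an assumption; use whatever path your service already passes)
processors:
  transform/logs:
    error_mode: ignore
    # Only takes effect when the transform.flatten.logs feature gate is enabled.
    flatten_data: true
    log_statements:
    - context: log
      statements:
      - set(resource.attributes["container.name"], ConvertCase(body["SYSLOG_IDENTIFIER"], "lower")) where body["SYSLOG_IDENTIFIER"] != nil
      # ... remaining statements from the original config, unchanged ...
```

The rest of the pipeline stays as it was; only the flatten_data line and the startup flag are new.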

evilr00t commented 3 weeks ago

@TylerHelmuth Thank you for the info. I've enabled the feature gate and am checking the logs now; I'll let you know soon whether that helped.

EDIT: The logs are consistent now. Thank you once again, @TylerHelmuth!

P.S. I looked for similar issues but didn't know the problem was with set(), so I reported it against processor/transform 👍