open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.9k stars 2.27k forks

podman endpoints cfg issue #34522

Open cforce opened 1 month ago

cforce commented 1 month ago

Component(s)

receiver/podman

What happened?

Description

The endpoint used differs from the one configured. An extra "/" is injected for unknown reasons:

"dial unix /run//podman/podman.sock: "

but configured is

"endpoint: unix://run/podman/podman.sock"

Maybe the docs are buggy and we need to escape ":" or "/"?

see https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/podmanreceiver
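For reference, the two endpoint spellings in question; the three-slash form is a guess at a workaround based on the error message, not a confirmed fix:

```yaml
# As configured (two slashes after the scheme, path treated as relative):
podman_stats:
  endpoint: unix://run/podman/podman.sock

# Absolute-path form (three slashes), which may avoid the path rewrite:
podman_stats:
  endpoint: unix:///run/podman/podman.sock
```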

Collector version

0.106.1

Environment information

Environment

OpenTelemetry Collector configuration

extensions:
  zpages:
    endpoint: "127.0.0.1:55679"

  health_check:
    endpoint: "127.0.0.1:8081"

  pprof:
    endpoint: "127.0.0.1:1777"
    block_profile_fraction: 3
    mutex_profile_fraction: 5

receivers:
  prometheus/otelcol:
    config:
      scrape_configs:
        - job_name: 'otelcol'
          scrape_interval: 10s
          static_configs:
            - targets: ['localhost:8888']
  podman_stats:
    endpoint: unix://run/podman/podman.sock
    timeout: 10s
    collection_interval: 30s    
  hostmetrics:
    collection_interval: 30s
    normalizeProcessCPUUtilization: true
    scrapers:
      cpu:
        metrics:
          system.cpu.frequency:
            enabled: true
          system.cpu.logical.count:
            enabled: true
          system.cpu.physical.count:
            enabled: true
          system.cpu.utilization:
            enabled: true
      load:
      paging:
        metrics:
          system.paging.utilization:
            enabled: true
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true
      network:
        metrics:
          system.network.conntrack.count:
            enabled: true
          system.network.conntrack.max:
            enabled: true
      memory:
        metrics:
          system.linux.memory.available:
            enabled: true
          system.memory.limit:
            enabled: true
          system.memory.utilization:
            enabled: true
      processes:
      process:
        metrics:
          process.threads:
            enabled: true
          process.signals_pending:
            enabled: true
          process.paging.faults:
            enabled: true
          process.memory.utilization:
            enabled: true
          process.open_file_descriptors:
            enabled: true
          process.handles:
            enabled: true
          process.disk.operations:
            enabled: true
          process.context_switches:
            enabled: true  
          process.cpu.utilization:
            enabled: true
        mute_process_name_error: true
        mute_process_exe_error: true
        mute_process_io_error: true
        mute_process_user_error: true
        mute_process_cgroup_error: true
    resource_attributes:
      process.cgroup: true
  hostmetrics/disk:
    collection_interval: 3m
    scrapers:
      disk: 
  otlp:
    protocols:
      grpc:
        endpoint: "${env:HOST_IP}:4317"
        #endpoint: "127.0.0.1:4317"

processors:
  resourcedetection/env:
    detectors: [env, system]
    timeout: 15s
    override: true
  batch:
    # Datadog APM Intake limit is 3.2MB. Let's make sure the batches do not go over that.
    send_batch_max_size: 8192 # (default = 8192): maximum batch size of spans to be sent to the backend
    send_batch_size: 512 # (default = 512): maximum number of spans to process in a batch
    timeout: 10s # (default = 5s): maximum time to wait until the batch is sent
  memory_limiter:
    check_interval: 5s
    limit_mib: 150
  attributes:
    actions:
      - key: tags
        value:
          - 'env:dev'
        action: upsert
  resource:
    attributes:
      - key: env
        value: 'dev'
        action: insert
      - key: geo
        action: insert
      - key: region
        action: insert
exporters:
  # logging:
  #   verbosity: detailed
  otlphttp:
    endpoint: http://127.0.0.1:9081/otlp-http

service:
  telemetry:
    metrics:
      address: 'localhost:8888'
    logs:
      level: 'info'
    traces:
      propagators:
        - "b3"
        - "tracecontext"
  extensions: [zpages, health_check, pprof]
  pipelines:
    metrics:
      receivers: [otlp, podman_stats, prometheus/otelcol]
      processors: [memory_limiter, batch, attributes, resource, resourcedetection/env]
      exporters: [otlphttp]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes, resource]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes, resource]
      exporters: [otlphttp]

Log output

2024-08-08T14:11:28.915Z    info    service@v0.106.1/service.go:117 Setting up own telemetry...
2024-08-08T14:11:28.916Z    info    service@v0.106.1/service.go:120 OpenCensus bridge is disabled for Collector telemetry and will be removed in a future version, use --feature-gates=-service.disableOpenCensusBridge to re-enable
2024-08-08T14:11:28.916Z    info    service@v0.106.1/telemetry.go:96    Serving metrics {"address": "localhost:8888", "metrics level": "Normal"}
2024-08-08T14:11:28.917Z    info    memorylimiter/memorylimiter.go:75   Memory limiter configured   {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "limit_mib": 150, "spike_limit_mib": 30, "check_interval": 5}
2024-08-08T14:11:28.920Z    info    service@v0.106.1/service.go:199 Starting micotelcollector...    {"Version": "0.106.1", "NumCPU": 2}
2024-08-08T14:11:28.920Z    info    extensions/extensions.go:36 Starting extensions...
2024-08-08T14:11:28.920Z    info    extensions/extensions.go:39 Extension is starting...    {"kind": "extension", "name": "health_check"}
2024-08-08T14:11:28.920Z    info    healthcheckextension@v0.106.1/healthcheckextension.go:32    Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"127.0.0.1:8081","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-08-08T14:11:28.921Z    info    extensions/extensions.go:56 Extension started.  {"kind": "extension", "name": "health_check"}
2024-08-08T14:11:28.921Z    info    extensions/extensions.go:39 Extension is starting...    {"kind": "extension", "name": "zpages"}
2024-08-08T14:11:28.921Z    info    zpagesextension@v0.106.1/zpagesextension.go:54  Registered zPages span processor on tracer provider {"kind": "extension", "name": "zpages"}
2024-08-08T14:11:28.921Z    info    zpagesextension@v0.106.1/zpagesextension.go:64  Registered Host's zPages    {"kind": "extension", "name": "zpages"}
2024-08-08T14:11:28.921Z    info    zpagesextension@v0.106.1/zpagesextension.go:76  Starting zPages extension   {"kind": "extension", "name": "zpages", "config": {"Endpoint":"127.0.0.1:55679","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0}}
2024-08-08T14:11:28.921Z    info    extensions/extensions.go:56 Extension started.  {"kind": "extension", "name": "zpages"}
2024-08-08T14:11:28.921Z    info    extensions/extensions.go:39 Extension is starting...    {"kind": "extension", "name": "pprof"}
2024-08-08T14:11:28.921Z    info    pprofextension@v0.106.1/pprofextension.go:60    Starting net/http/pprof server  {"kind": "extension", "name": "pprof", "config": {"TCPAddr":{"Endpoint":"127.0.0.1:1777","DialerConfig":{"Timeout":0}},"BlockProfileFraction":3,"MutexProfileFraction":5,"SaveToFile":""}}
2024-08-08T14:11:28.921Z    info    extensions/extensions.go:56 Extension started.  {"kind": "extension", "name": "pprof"}
2024-08-08T14:11:28.923Z    info    internal/resourcedetection.go:125   began detecting resource information    {"kind": "processor", "name": "resourcedetection/env", "pipeline": "metrics"}
2024-08-08T14:11:28.923Z    info    internal/resourcedetection.go:139   detected resource information   {"kind": "processor", "name": "resourcedetection/env", "pipeline": "metrics", "resource": {"host.name":"runner-ykxhnyexq-project-45956638-concurrent-0","os.type":"linux"}}
2024-08-08T14:11:28.923Z    info    otlpreceiver@v0.106.1/otlp.go:102   Starting GRPC server    {"kind": "receiver", "name": "otlp/richos", "data_type": "traces", "endpoint": "localhost:4317"}
2024-08-08T14:11:28.924Z    info    prometheusreceiver@v0.106.1/metrics_receiver.go:307 Starting discovery manager  {"kind": "receiver", "name": "prometheus/otelcol", "data_type": "metrics"}
2024-08-08T14:11:28.925Z    info    prometheusreceiver@v0.106.1/metrics_receiver.go:285 Scrape job added    {"kind": "receiver", "name": "prometheus/otelcol", "data_type": "metrics", "jobName": "otelcol"}
2024-08-08T14:11:28.925Z    info    prometheusreceiver@v0.106.1/metrics_receiver.go:376 Starting scrape manager {"kind": "receiver", "name": "prometheus/otelcol", "data_type": "metrics"}
2024-08-08T14:11:28.925Z    error   graph/graph.go:432  Failed to start component   {"error": "Get \"http://d/v3.3.1/libpod/containers/json?filters=%7B%22status%22%3A%5B%22running%22%5D%7D\": dial unix /run//podman/podman.sock: connect: no such file or directory", "type": "Receiver", "id": "podman_stats"}
2024-08-08T14:11:28.926Z    info    service@v0.106.1/service.go:262 Starting shutdown...
2024-08-08T14:11:28.926Z    info    healthcheck/handler.go:132  Health Check state change   {"kind": "extension", "name": "health_check", "status": "unavailable"}
2024-08-08T14:11:28.927Z    info    extensions/extensions.go:63 Stopping extensions...
2024-08-08T14:11:28.927Z    info    zpagesextension@v0.106.1/zpagesextension.go:105 Unregistered zPages span processor on tracer provider   {"kind": "extension", "name": "zpages"}
2024-08-08T14:11:28.927Z    info    service@v0.106.1/service.go:276 Shutdown complete.
Error: cannot start pipelines: Get "http://d/v3.3.1/libpod/containers/json?filters=%7B%22status%22%3A%5B%22running%22%5D%7D": dial unix /run//podman/podman.sock: connect: no such file or directory
2024/08/08 14:11:28 collector server run finished with error: cannot start pipelines: Get "http://d/v3.3.1/libpod/containers/json?filters=%7B%22status%22%3A%5B%22running%22%5D%7D": dial unix /run//podman/podman.sock: connect: no such file or directory

Additional context

No response

github-actions[bot] commented 1 month ago

Pinging code owners:

rogercoll commented 1 month ago

Thanks for raising this.

The connection strategy was mostly copied from https://github.com/containers/podman/blob/main/pkg/bindings/connection.go#L90. And you are correct: the implementation adds an extra "/" for unix sockets with non-absolute paths. I reckon both forms should work in most situations, but relying on unix:/// is safer when specifying absolute paths, to avoid any possible misinterpretation by the system or application.
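For illustration, here is a minimal Go sketch of that join behavior. This is a hypothetical reconstruction, not the actual receiver code: for a `unix://` endpoint without three slashes, `net/url` parses the first path element as the URL host, and rejoining host and path with a separator produces the doubled slash seen in the error.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// socketPath sketches the described fallback: a "unix://" endpoint
// without three slashes has its first path element ("run") parsed as
// the URL host, and folding host and path back together with "/"
// yields the doubled slash.
func socketPath(endpoint string) string {
	u, err := url.Parse(endpoint)
	if err != nil {
		return ""
	}
	if u.Scheme == "unix" && !strings.HasPrefix(endpoint, "unix:///") {
		// "/" + "run" + "/" + "/podman/podman.sock" -> "/run//podman/podman.sock"
		return "/" + strings.Join([]string{u.Host, u.Path}, "/")
	}
	return u.Path
}

func main() {
	fmt.Println(socketPath("unix://run/podman/podman.sock"))  // /run//podman/podman.sock
	fmt.Println(socketPath("unix:///run/podman/podman.sock")) // /run/podman/podman.sock
}
```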

Does your socket exist in /run/podman/podman.sock?

cforce commented 1 month ago

Yes, it all looks good to me. I added the debug commands below in the same shell where otelcol is executed later:

echo "podman info:"
podman info

echo "Debugging info"
echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR"
ls -al  $XDG_RUNTIME_DIR/*.sock || true
echo "/run/user/podman/"
ls -al  /run/user/podman/*.sock || true

and the output:

podman info:
host:
  arch: amd64
  buildahVersion: 1.28.2
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon_2.1.6+ds1-1_amd64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.6, commit: unknown'
  cpuUtilization:
    idlePercent: 62.46
    systemPercent: 14.2
    userPercent: 23.34
  cpus: 2
  distribution:
    codename: bookworm
    distribution: debian
    version: "12"
  eventLogger: file
  hostname: runner-jlguopmm-project-45956638-concurrent-0
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.154+
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 5200777216
  memTotal: 8341037056
  networkBackend: cni
  ociRuntime:
    name: crun
    package: crun_1.8.1-1+deb12u1_amd64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.1
      commit: f8a096be060b22ccd3d5f3ebe44108517fbf6c30
      rundir: /run/user/podman/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: unix:///run/user/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 2147479552
  swapTotal: 2147479552
  uptime: 0h 2m 0.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 27226542080
  graphRootUsed: 9394491392
  graphStatus:
    Backing Filesystem: overlayfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /builds/Mercedes-Intelligent-Cloud/mic-monlog/micotelcollector
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.19.8
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1
Debugging info
XDG_RUNTIME_DIR=/run/user/podman
srw------- 1 root root 0 Aug  8 17:11 /run/user/podman/podman.sock
/run/user/podman/
srw------- 1 root root 0 Aug  8 17:11 /run/user/podman/podman.sock
/run/user/root/

Ok --Path was wrong.
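Given the `remoteSocket.path` reported by `podman info` above (`unix:///run/user/podman/podman.sock`), the matching receiver configuration would presumably be:

```yaml
podman_stats:
  endpoint: unix:///run/user/podman/podman.sock
```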

rogercoll commented 1 month ago

Ok --Path was wrong.

Great we found out the root problem, do you think we can close the issue?

cforce commented 1 month ago

I still think the message, and the wrong path composed because of some strange fallback default, are completely misleading and should be improved.

rogercoll commented 1 month ago

Although the receiver is just propagating the error retrieved from the libpod package, I agree that the error message could be improved. Regarding the path fallback strategy, I would prefer to rely on what the containers/podman package does.

@cforce Would you be interested in opening a PR to improve the error message?