open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

otel collector panics with a specific `exporters: prometheus` endpoint #16455

Closed. Tsovak closed this 1 year ago.

Tsovak commented 1 year ago

Component(s)

exporter/prometheus

What happened?

Description

The otel collector panics with a specific `exporters: prometheus` endpoint.

Steps to Reproduce

docker compose

version: "3.7"
services:

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: [ "--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "1888:1888"   # pprof extension
      - "8888:8888"   # Prometheus metrics exposed by the collector
      - "8889:8889"   # Prometheus exporter metrics
      - "13133:13133" # health_check extension
      - "4317:4317"   # OTLP gRPC receiver
      - "55679" # zpages extension
      - "9464:9464"
    depends_on:
      - jaeger-all-in-one
      - prometheus
    networks: [ otel ]

  jaeger-all-in-one:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "16686:16686"
      - "14268"
      - "14250:14250"
      - "6831:6831"
    #      - "4317:4317"   # OTLP gRPC receiver
    networks: [ otel ]

  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_AUTH_DISABLE_LOGIN_FORM=true
    ports:
      - "3000:3000"
    networks: [ otel ]

  # Prometheus
  prometheus:
    container_name: prometheus
    image: prom/prometheus:latest
    volumes:
      - ./prometheus-config.yaml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks: [ otel ]

networks:
  otel:

docker compose up

prometheus-config.yaml

scrape_configs:
  - job_name: 'otel-collector'
    scrape_interval: 10s
    static_configs:
      - targets: [ 'otel-collector:8889' ]
      - targets: [ 'otel-collector:8888' ]

  - job_name: otel
    static_configs:
      - targets:
          - 'otel-collector:9464'

Image details

✗ docker pull otel/opentelemetry-collector-contrib:latest
latest: Pulling from otel/opentelemetry-collector-contrib
Digest: sha256:cc130d2f52444a67f5b5942dd840bc507dfd63714cb6bf0f1b706fd81b01b341
Status: Image is up to date for otel/opentelemetry-collector-contrib:latest
docker.io/otel/opentelemetry-collector-contrib:latest

and

✗ docker inspect otel/opentelemetry-collector-contrib:latest
[
    {
        "Id": "sha256:d31416e658b67007f0c92ae35aebf835e383e0abe179df082c882867e57e7daf",
        "RepoTags": [
            "otel/opentelemetry-collector-contrib:latest"
        ],
        "RepoDigests": [
            "otel/opentelemetry-collector-contrib@sha256:cc130d2f52444a67f5b5942dd840bc507dfd63714cb6bf0f1b706fd81b01b341"
        ],
        "Parent": "",
        "Comment": "buildkit.dockerfile.v0",
        "Created": "2022-11-10T19:32:01.249751666Z",
        "Container": "",
        "ContainerConfig": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": null,
            "Cmd": null,
            "Image": "",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": null
        },
        "DockerVersion": "",
        "Author": "",
        "Config": {
            "Hostname": "",
            "Domainname": "",
            "User": "10001",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "4317/tcp": {},
                "55678/tcp": {},
                "55679/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "--config",
                "/etc/otelcol-contrib/config.yaml"
            ],
            "ArgsEscaped": true,
            "Image": "",
            "Volumes": null,
            "WorkingDir": "/",
            "Entrypoint": [
                "/otelcol-contrib"
            ],
            "OnBuild": null,
            "Labels": {
                "org.opencontainers.image.created": "2022-11-10T18:27:14Z",
                "org.opencontainers.image.name": "opentelemetry-collector-releases",
                "org.opencontainers.image.revision": "198d380138e3c73a6968f004dc66b88ad3e35a81",
                "org.opencontainers.image.source": "https://github.com/open-telemetry/opentelemetry-collector-releases",
                "org.opencontainers.image.version": "0.64.1"
            }
        },
        "Architecture": "arm64",
        "Os": "linux",
        "Size": 177293345,
        "VirtualSize": 177293345,
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/478f07795298a9c4c6a6d39fd6387aac661e167d71e4e0013b73ebd541f433cc/diff:/var/lib/docker/overlay2/3508639ad5c2083832c7c39d3aa5e5d432083368b24af56c158bb586cac3bf78/diff",
                "MergedDir": "/var/lib/docker/overlay2/17cfa63012b44bb3321cd7fb48672131fdfb538b2af6ae87b4987a154ffb4903/merged",
                "UpperDir": "/var/lib/docker/overlay2/17cfa63012b44bb3321cd7fb48672131fdfb538b2af6ae87b4987a154ffb4903/diff",
                "WorkDir": "/var/lib/docker/overlay2/17cfa63012b44bb3321cd7fb48672131fdfb538b2af6ae87b4987a154ffb4903/work"
            },
            "Name": "overlay2"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:f4e8db23f04e5929a3f801e0e95a63de9d5a33d58a016962643b612b9dc105d0",
                "sha256:9ccaea5d1109b670913be0ac585f8f855af0604026a95172b41c8dec82ee34b1",
                "sha256:18dd31e67cf038699df03abe524e9b1c8d602a7905690ab4f6f5b258eb7a603d"
            ]
        },
        "Metadata": {
            "LastTagTime": "0001-01-01T00:00:00Z"
        }
    }
]

Expected Result

Log an error message and exit.

Actual Result

The application panics without any error message.

Collector version

sha256:cc130d2f52444a67f5b5942dd840bc507dfd63714cb6bf0f1b706fd81b01b341

Environment information

Environment

OS: macOS Monterey 12.4 (21F79), MacBook Pro (16-inch, 2021), Apple M1 Pro

✗ docker info
Client:
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.9.1)
  compose: Docker Compose (Docker Inc., v2.12.1)
  dev: Docker Dev Environments (Docker Inc., v0.0.3)
  extension: Manages Docker extensions (Docker Inc., v0.2.13)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
  scan: Docker Scan (Docker Inc., v0.21.0)

Server:
 Containers: 10
  Running: 0
  Paused: 0
  Stopped: 10
 Images: 74
 Server Version: 20.10.20
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.15.49-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 3.84GiB
 Name: docker-desktop
 ID: TSDV:SDM5:OOTG:5HPG:QV5I:QDN3:7XKA:5MU2:XEDM:6STP:EN5A:DMUA
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 46
  Goroutines: 48
  System Time: 2022-11-23T14:33:14.643109466Z
  EventsListeners: 3
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  iotlab.skoltech.ru:8926
  10.16.68.26:8926
  192.168.15.59:5000
  docker-registry.iot.10.30.16.181.xip.io
  hubproxy.docker.internal:5000
  iotlab.skoltech.ru:5000
  127.0.0.0/8
 Live Restore Enabled: false

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:

  prometheus:
    config:
      scrape_configs:
        - job_name: "otel-collector"
          scrape_interval: 2s
          honor_labels: true
          static_configs:
            - targets: [ 'otel-collector:8888' ]

exporters:
  prometheus:
    endpoint: "prometheus:9090"
    namespace: "default"

  logging:

  jaeger:
    endpoint: jaeger-all-in-one:14250
    tls:
      insecure: true

processors:
  batch:

extensions:
  health_check:

service:
  extensions: [ health_check ]
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ logging, jaeger ]
    metrics:
      receivers: [ otlp, prometheus ]
      processors: [ batch ]
      exporters: [ prometheus]

  telemetry:
    logs:
      level: "debug"
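
Note: the prometheus exporter's endpoint is the address the collector itself listens on to expose metrics for Prometheus to scrape; it is not the address of a Prometheus server. The config above points it at the prometheus container, so the collector most likely fails to bind that address at startup, which would trigger the immediate shutdown visible in the log below. A minimal sketch of a conventional exporter block (using 0.0.0.0:8889 is an assumption that matches the "8889:8889" port mapping in the compose file):

exporters:
  prometheus:
    # Listen address served by the collector itself; Prometheus scrapes it.
    # Binding "prometheus:9090" would require the collector to listen on
    # another container's address, which cannot succeed.
    endpoint: "0.0.0.0:8889"
    namespace: "default"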

Log output

2022-11-23T14:22:10.027Z    info    service/telemetry.go:110    Setting up own telemetry...
2022-11-23T14:22:10.028Z    info    service/telemetry.go:140    Serving Prometheus metrics  {"address": ":8888", "level": "basic"}
2022-11-23T14:22:10.028Z    info    components/components.go:30 In development component. May change in the future. {"kind": "exporter", "data_type": "traces", "name": "logging", "stability": "in development"}
2022-11-23T14:22:10.028Z    debug   components/components.go:28 Beta component. May change in the future.   {"kind": "exporter", "data_type": "traces", "name": "jaeger", "stability": "beta"}
2022-11-23T14:22:10.028Z    debug   components/components.go:28 Stable component.   {"kind": "processor", "name": "batch", "pipeline": "traces", "stability": "stable"}
2022-11-23T14:22:10.028Z    debug   components/components.go:28 Beta component. May change in the future.   {"kind": "exporter", "data_type": "metrics", "name": "prometheus", "stability": "beta"}
2022-11-23T14:22:10.028Z    debug   components/components.go:28 Stable component.   {"kind": "processor", "name": "batch", "pipeline": "metrics", "stability": "stable"}
2022-11-23T14:22:10.028Z    debug   components/components.go:28 Stable component.   {"kind": "receiver", "name": "otlp", "pipeline": "traces", "stability": "stable"}
2022-11-23T14:22:10.028Z    debug   components/components.go:28 Stable component.   {"kind": "receiver", "name": "otlp", "pipeline": "metrics", "stability": "stable"}
2022-11-23T14:22:10.028Z    debug   components/components.go:28 Beta component. May change in the future.   {"kind": "receiver", "name": "prometheus", "pipeline": "metrics", "stability": "beta"}
2022-11-23T14:22:10.028Z    info    service/service.go:89   Starting otelcol-contrib... {"Version": "0.64.1", "NumCPU": 4}
2022-11-23T14:22:10.028Z    info    extensions/extensions.go:41 Starting extensions...
2022-11-23T14:22:10.028Z    info    extensions/extensions.go:44 Extension is starting...    {"kind": "extension", "name": "health_check"}
2022-11-23T14:22:10.028Z    info    healthcheckextension@v0.64.0/healthcheckextension.go:44 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"ExtensionConfig":null,"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"Path":"/","CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2022-11-23T14:22:10.029Z    warn    internal/warning.go:51  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2022-11-23T14:22:10.029Z    info    extensions/extensions.go:48 Extension started.  {"kind": "extension", "name": "health_check"}
2022-11-23T14:22:10.029Z    info    pipelines/pipelines.go:74   Starting exporters...
2022-11-23T14:22:10.029Z    info    pipelines/pipelines.go:78   Exporter is starting... {"kind": "exporter", "data_type": "traces", "name": "logging"}
2022-11-23T14:22:10.029Z    info    pipelines/pipelines.go:82   Exporter started.   {"kind": "exporter", "data_type": "traces", "name": "logging"}
2022-11-23T14:22:10.029Z    info    pipelines/pipelines.go:78   Exporter is starting... {"kind": "exporter", "data_type": "traces", "name": "jaeger"}
2022-11-23T14:22:10.029Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1] Channel created {"grpc_log": true}
2022-11-23T14:22:10.029Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1] original dial target is: "jaeger-all-in-one:14250"  {"grpc_log": true}
2022-11-23T14:22:10.029Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1] parsed dial target is: {Scheme:jaeger-all-in-one Authority: Endpoint:14250 URL:{Scheme:jaeger-all-in-one Opaque:14250 User: Host: Path: RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}} {"grpc_log": true}
2022-11-23T14:22:10.029Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1] fallback to scheme "passthrough"    {"grpc_log": true}
2022-11-23T14:22:10.029Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1] parsed dial target is: {Scheme:passthrough Authority: Endpoint:jaeger-all-in-one:14250 URL:{Scheme:passthrough Opaque: User: Host: Path:/jaeger-all-in-one:14250 RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}    {"grpc_log": true}
2022-11-23T14:22:10.029Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1] Channel authority set to "jaeger-all-in-one:14250"  {"grpc_log": true}
2022-11-23T14:22:10.029Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1] Resolver state updated: {
  "Addresses": [
    {
      "Addr": "jaeger-all-in-one:14250",
      "ServerName": "",
      "Attributes": null,
      "BalancerAttributes": null,
      "Type": 0,
      "Metadata": null
    }
  ],
  "ServiceConfig": null,
  "Attributes": null
} (resolver returned new addresses) {"grpc_log": true}
2022-11-23T14:22:10.029Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1] Channel switches to new LB policy "pick_first"  {"grpc_log": true}
2022-11-23T14:22:10.029Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1 SubChannel #2] Subchannel created    {"grpc_log": true}
2022-11-23T14:22:10.029Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING  {"grpc_log": true}
2022-11-23T14:22:10.030Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1 SubChannel #2] Subchannel picks a new address "jaeger-all-in-one:14250" to connect   {"grpc_log": true}
2022-11-23T14:22:10.030Z    info    pipelines/pipelines.go:82   Exporter started.   {"kind": "exporter", "data_type": "traces", "name": "jaeger"}
2022-11-23T14:22:10.030Z    info    pipelines/pipelines.go:78   Exporter is starting... {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
2022-11-23T14:22:10.029Z    info    jaegerexporter@v0.64.0/exporter.go:180  State of the connection with the Jaeger Collector backend   {"kind": "exporter", "data_type": "traces", "name": "jaeger", "state": "IDLE"}
2022-11-23T14:22:10.030Z    info    zapgrpc/zapgrpc.go:174  [core] pickfirstBalancer: UpdateSubConnState: 0x4000de6440, {CONNECTING <nil>}  {"grpc_log": true}
2022-11-23T14:22:10.030Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1] Channel Connectivity change to CONNECTING   {"grpc_log": true}
2022-11-23T14:22:10.033Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to READY   {"grpc_log": true}
2022-11-23T14:22:10.033Z    info    zapgrpc/zapgrpc.go:174  [core] pickfirstBalancer: UpdateSubConnState: 0x4000de6440, {READY <nil>}   {"grpc_log": true}
2022-11-23T14:22:10.033Z    info    zapgrpc/zapgrpc.go:174  [core] [Channel #1] Channel Connectivity change to READY    {"grpc_log": true}
2022-11-23T14:22:10.067Z    info    service/service.go:115  Starting shutdown...
2022-11-23T14:22:10.067Z    info    healthcheck/handler.go:129  Health Check state change   {"kind": "extension", "name": "health_check", "status": "unavailable"}
2022-11-23T14:22:10.067Z    info    pipelines/pipelines.go:118  Stopping receivers...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x3dd15f0]

goroutine 1 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver.(*pReceiver).Shutdown(0x4000b3a630, {0x0, 0x0})
    github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.64.0/metrics_receiver.go:308 +0x20
go.opentelemetry.io/collector/service/internal/pipelines.(*Pipelines).ShutdownAll(0x4000c8d220, {0x6ac31a0, 0x40000b6018})
    go.opentelemetry.io/collector@v0.64.1/service/internal/pipelines/pipelines.go:121 +0x38c
go.opentelemetry.io/collector/service.(*service).Shutdown(0x4000582e00, {0x6ac31a0, 0x40000b6018})
    go.opentelemetry.io/collector@v0.64.1/service/service.go:121 +0xd0
go.opentelemetry.io/collector/service.(*Collector).shutdownServiceAndTelemetry(0x400104fa70, {0x6ac31a0?, 0x40000b6018?})
    go.opentelemetry.io/collector@v0.64.1/service/collector.go:234 +0x30
go.opentelemetry.io/collector/service.(*Collector).setupConfigurationComponents(0x400104fa70, {0x6ac31a0, 0x40000b6018})
    go.opentelemetry.io/collector@v0.64.1/service/collector.go:155 +0x1fc
go.opentelemetry.io/collector/service.(*Collector).Run(0x400104fa70, {0x6ac31a0, 0x40000b6018})
    go.opentelemetry.io/collector@v0.64.1/service/collector.go:164 +0x30
go.opentelemetry.io/collector/service.NewCommand.func1(0x400077f500, {0x5e1f2b5?, 0x2?, 0x2?})
    go.opentelemetry.io/collector@v0.64.1/service/command.go:53 +0x3b8
github.com/spf13/cobra.(*Command).execute(0x400077f500, {0x40000b01f0, 0x2, 0x2})
    github.com/spf13/cobra@v1.6.1/command.go:916 +0x5e0
github.com/spf13/cobra.(*Command).ExecuteC(0x400077f500)
    github.com/spf13/cobra@v1.6.1/command.go:1044 +0x368
github.com/spf13/cobra.(*Command).Execute(...)
    github.com/spf13/cobra@v1.6.1/command.go:968
main.runInteractive({{0x4000a4baa0, 0x4000a7ac90, 0x4000a4bec0, 0x4000a4b740}, {{0x5e4795e, 0xf}, {0x5ec343b, 0x1f}, {0x5e18d58, 0x6}}, ...})
    github.com/open-telemetry/opentelemetry-collector-releases/contrib/main.go:32 +0x40
main.run(...)
    github.com/open-telemetry/opentelemetry-collector-releases/contrib/main_others.go:11
main.main()
    github.com/open-telemetry/opentelemetry-collector-releases/contrib/main.go:25 +0x124

Additional context

No response

github-actions[bot] commented 1 year ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Aneurysm9 commented 1 year ago

This is in the Prometheus receiver, not the exporter, as evidenced by the top of the included stack trace. It will be fixed by https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/16211; a minimal sketch of the failure mode follows the quoted trace below.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x3dd15f0]

goroutine 1 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver.(*pReceiver).Shutdown(0x4000b3a630, {0x0, 0x0})
    github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.64.0/metrics_receiver.go:308 +0x20
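
To make the failure mode concrete, here is a minimal Go sketch (illustrative only, not the actual prometheusreceiver source): Shutdown dereferences state that is only assigned in Start, and because the collector shuts down every component even when another component in the pipeline failed to start, a receiver whose Start never ran hits a nil field.

package main

import (
	"context"
	"fmt"
)

// receiver mimics the shape of the bug: cancelFunc is only assigned in
// Start, so a Shutdown that arrives without a prior Start finds nil.
type receiver struct {
	cancelFunc context.CancelFunc
}

func (r *receiver) Start(ctx context.Context) error {
	var scrapeCtx context.Context
	scrapeCtx, r.cancelFunc = context.WithCancel(ctx)
	_ = scrapeCtx // the real receiver would hand this to its scrape loop
	return nil
}

// Shutdown, as in the buggy version: panics with the nil pointer
// dereference from the trace above when Start never ran.
func (r *receiver) Shutdown(context.Context) error {
	r.cancelFunc()
	return nil
}

// shutdownSafe guards the nil field, the kind of defensive check the
// linked PR presumably applies (an assumption about the shape of the fix).
func (r *receiver) shutdownSafe(context.Context) error {
	if r.cancelFunc != nil {
		r.cancelFunc()
	}
	return nil
}

func main() {
	r := &receiver{} // Start is intentionally never called
	if err := r.shutdownSafe(context.Background()); err == nil {
		fmt.Println("shutdown tolerated without a prior Start")
	}
	// r.Shutdown(context.Background()) would panic here instead.
}
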
github-actions[bot] commented 1 year ago

Pinging code owners: @Aneurysm9 @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself.

patil-kshitij commented 1 year ago

@Tsovak

Can you please post the Prometheus config YAML you used?

Tsovak commented 1 year ago

@Tsovak

Can you please post the Prometheus config YAML you used?

I have added prometheus-config.yaml to the issue body.

Aneurysm9 commented 1 year ago

This has been fixed via https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/16470.