open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

panic: runtime error: invalid memory address or nil pointer dereference ON github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver/internal.(*MetricsAdjusterPdata).adjustMetricHistogram #7387

Closed: templarfelix closed this issue 2 years ago

templarfelix commented 2 years ago

Describe the bug: Running with many Prometheus scrape targets causes memory leaks.

Steps to reproduce: Run on Kubernetes (k8s) with many Prometheus scrape targets.

What version did you use? Version: v0.42.0

What config did you use? Config:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry
  namespace: opentelemetry
spec:
  mode: daemonset
  image: otel/opentelemetry-collector-contrib:0.42.0
  imagePullPolicy: IfNotPresent
  upgradeStrategy: none

  serviceAccount: opentelemetry

  #resources:
  #  limits:
  #    memory: '2Gi'
  #    cpu: '1'
  #  requests:
  #    memory: '1Gi'
  #    cpu: '500m'

  securityContext:
    runAsUser: 0
    runAsGroup: 0

  volumes:
    - name: varlog
      hostPath:
        path: /var/log
    - name: varlibdockercontainers
      hostPath:
        path: /var/lib/docker/containers

  volumeMounts:
    - mountPath: /var/log
      name: varlog
      readOnly: true
    - mountPath: /var/lib/docker/containers
      name: varlibdockercontainers
      readOnly: true

  envFrom:
    - secretRef:
        name: opentelemetry-secrets
    - configMapRef:
        name: opentelemetry-configs

  env:
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: OTEL_RESOURCE_ATTRIBUTES
      value: "k8s.pod.ip=$(POD_IP)"

  config: |

    receivers:
      zipkin:
      opencensus:
      otlp:
        protocols:
          grpc:
          http:
      filelog:
        include:
          - /var/log/pods/*/*/*.log
        exclude:
          - /var/log/pods/*/otel-collector/*.log
        start_at: beginning
        include_file_path: true
        include_file_name: true
        operators:    
          - type: json_parser
            timestamp:
              parse_from: time
              layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          - id: filename
            resource:
              service.name: EXPR($$attributes["file.path"])
            type: metadata
          - id: extract_metadata_from_filepath
            parse_from: $$attributes["file.path"]
            regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<run_id>\d+)\.log$'
            type: regex_parser
          - resource:
              stream: 'EXPR($.stream)'
              container_name: 'EXPR($.container_name)'
              namespace: 'EXPR($.namespace)'
              pod_name: 'EXPR($.pod_name)'
              run_id: 'EXPR($.run_id)'
              uid: 'EXPR($.uid)'
            type: metadata

          # Clean up log body
          - type: restructure
            id: clean-up-log-body
            ops:
              - move:
                  from: log
                  to: $
      prometheus:
        buffer_period: 30
        buffer_count: 500
        use_start_time_metric: true
        config:
          global:
            scrape_interval: 1m
            scrape_timeout: 10s
          scrape_configs:

            - job_name: 'otel-collector'

              static_configs:
                - targets: ['0.0.0.0:8888']

            - job_name: apps
              kubernetes_sd_configs:
              - role: pod
                selectors:
                - role: pod
                  # only scrape data from pods running on the same node as collector
                  field: "spec.nodeName=$NODE_NAME"
              relabel_configs:
              # scrape pods annotated with "prometheus.io/scrape: true"
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                regex: "true"
                action: keep
              # read the port from "prometheus.io/port: <port>" annotation and update scraping address accordingly
              - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
                action: replace
                target_label: __address__
                regex: ([^:]+)(?::\d+)?;(\d+)
                # escaped $1:$2
                replacement: $$1:$$2
              - source_labels: [__meta_kubernetes_namespace]
                action: replace
                target_label: kubernetes_namespace
              - source_labels: [__meta_kubernetes_pod_name]
                action: replace
                target_label: kubernetes_pod_name

            - job_name: 'istiod'
              kubernetes_sd_configs:
              - role: endpoints
                namespaces:
                  names:
                  - istio-system
              relabel_configs:
              - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
                action: keep
                regex: istiod;http-monitoring

            - job_name: 'envoy-stats'
              metrics_path: /stats/prometheus
              kubernetes_sd_configs:
              - role: pod
              relabel_configs:
              - source_labels: [__meta_kubernetes_pod_container_port_name]
                action: keep
                regex: '.*-envoy-prom'

            - job_name: "kubernetes-apiservers"

              kubernetes_sd_configs:
                - role: endpoints

              # Default to scraping over https. If required, just disable this or change to
              # `http`.
              scheme: https

              # This TLS & authorization config is used to connect to the actual scrape
              # endpoints for cluster components. This is separate to discovery auth
              # configuration because discovery & scraping are two separate concerns in
              # Prometheus. The discovery auth config is automatic if Prometheus runs inside
              # the cluster. Otherwise, more config options have to be provided within the
              # <kubernetes_sd_config>.
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                # If your node certificates are self-signed or use a different CA to the
                # master CA, then disable certificate verification below. Note that
                # certificate verification is an integral part of a secure infrastructure
                # so this should only be disabled in a controlled environment. You can
                # disable certificate verification by uncommenting the line below.
                #
                # insecure_skip_verify: true
              authorization:
                credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token

              # Keep only the default/kubernetes service endpoints for the https port. This
              # will add targets for each API server which Kubernetes adds an endpoint to
              # the default/kubernetes service.
              relabel_configs:
                - source_labels:
                    [
                      __meta_kubernetes_namespace,
                      __meta_kubernetes_service_name,
                      __meta_kubernetes_endpoint_port_name,
                    ]
                  action: keep
                  regex: default;kubernetes;https

            # Scrape config for nodes (kubelet).
            #
            # Rather than connecting directly to the node, the scrape is proxied though the
            # Kubernetes apiserver.  This means it will work if Prometheus is running out of
            # cluster, or can't connect to nodes for some other reason (e.g. because of
            # firewalling).
            - job_name: "kubernetes-nodes"

              # Default to scraping over https. If required, just disable this or change to
              # `http`.
              scheme: https

              # This TLS & authorization config is used to connect to the actual scrape
              # endpoints for cluster components. This is separate to discovery auth
              # configuration because discovery & scraping are two separate concerns in
              # Prometheus. The discovery auth config is automatic if Prometheus runs inside
              # the cluster. Otherwise, more config options have to be provided within the
              # <kubernetes_sd_config>.
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                # If your node certificates are self-signed or use a different CA to the
                # master CA, then disable certificate verification below. Note that
                # certificate verification is an integral part of a secure infrastructure
                # so this should only be disabled in a controlled environment. You can
                # disable certificate verification by uncommenting the line below.
                #
                # insecure_skip_verify: true
              authorization:
                credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token

              kubernetes_sd_configs:
                - role: node

              relabel_configs:
                - action: labelmap
                  regex: __meta_kubernetes_node_label_(.+)

            # Scrape config for Kubelet cAdvisor.
            #
            # This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
            # (those whose names begin with 'container_') have been removed from the
            # Kubelet metrics endpoint.  This job scrapes the cAdvisor endpoint to
            # retrieve those metrics.
            #
            # In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
            # HTTP endpoint; use the "/metrics" endpoint on the 4194 port of nodes. In
            # that case (and ensure cAdvisor's HTTP server hasn't been disabled with the
            # --cadvisor-port=0 Kubelet flag).
            #
            # This job is not necessary and should be removed in Kubernetes 1.6 and
            # earlier versions, or it will cause the metrics to be scraped twice.
            - job_name: "kubernetes-cadvisor"

              # Default to scraping over https. If required, just disable this or change to
              # `http`.
              scheme: https

              # Starting Kubernetes 1.7.3 the cAdvisor metrics are under /metrics/cadvisor.
              # Kubernetes CIS Benchmark recommends against enabling the insecure HTTP
              # servers of Kubernetes, therefore the cAdvisor metrics on the secure handler
              # are used.
              metrics_path: /metrics/cadvisor

              # This TLS & authorization config is used to connect to the actual scrape
              # endpoints for cluster components. This is separate to discovery auth
              # configuration because discovery & scraping are two separate concerns in
              # Prometheus. The discovery auth config is automatic if Prometheus runs inside
              # the cluster. Otherwise, more config options have to be provided within the
              # <kubernetes_sd_config>.
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                # If your node certificates are self-signed or use a different CA to the
                # master CA, then disable certificate verification below. Note that
                # certificate verification is an integral part of a secure infrastructure
                # so this should only be disabled in a controlled environment. You can
                # disable certificate verification by uncommenting the line below.
                #
                # insecure_skip_verify: true
              authorization:
                credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token

              kubernetes_sd_configs:
                - role: node

              relabel_configs:
                - action: labelmap
                  regex: __meta_kubernetes_node_label_(.+)

            # Example scrape config for service endpoints.
            #
            # The relabeling allows the actual service scrape endpoint to be configured
            # for all or only some endpoints.
            - job_name: "kubernetes-service-endpoints"

              kubernetes_sd_configs:
                - role: endpoints

              relabel_configs:
                # Example relabel to scrape only endpoints that have
                # "example.io/should_be_scraped = true" annotation.
                #  - source_labels: [__meta_kubernetes_service_annotation_example_io_should_be_scraped]
                #    action: keep
                #    regex: true
                #
                # Example relabel to customize metric path based on endpoints
                # "example.io/metric_path = <metric path>" annotation.
                #  - source_labels: [__meta_kubernetes_service_annotation_example_io_metric_path]
                #    action: replace
                #    target_label: __metrics_path__
                #    regex: (.+)
                #
                # Example relabel to scrape only single, desired port for the service based
                # on endpoints "example.io/scrape_port = <port>" annotation.
                #  - source_labels: [__address__, __meta_kubernetes_service_annotation_example_io_scrape_port]
                #    action: replace
                #    regex: ([^:]+)(?::\d+)?;(\d+)
                #    replacement: $1:$2
                #    target_label: __address__
                # Example relabel to configure scrape scheme for all service scrape
                # targets based on endpoints "example.io/scrape_scheme = <scheme>" annotation.
                #  - source_labels: [__meta_kubernetes_service_annotation_example_io_scrape_scheme]
                #    action: replace
                #    target_label: __scheme__
                #    regex: (https?)
                - action: labelmap
                  regex: __meta_kubernetes_service_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  action: replace
                  target_label: namespace
                - source_labels: [__meta_kubernetes_service_name]
                  action: replace
                  target_label: service

            # Example scrape config for probing services via the Blackbox Exporter.
            #
            # The relabeling allows the actual service scrape endpoint to be configured
            # for all or only some services.
            - job_name: "kubernetes-services"

              metrics_path: /probe
              params:
                module: [http_2xx]

              kubernetes_sd_configs:
                - role: service

              relabel_configs:
                # Example relabel to probe only some services that have "example.io/should_be_probed = true" annotation
                #  - source_labels: [__meta_kubernetes_service_annotation_example_io_should_be_probed]
                #    action: keep
                #    regex: true
                - source_labels: [__address__]
                  target_label: __param_target
                - target_label: __address__
                  replacement: blackbox-exporter.example.com:9115
                - source_labels: [__param_target]
                  target_label: instance
                - action: labelmap
                  regex: __meta_kubernetes_service_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  target_label: namespace
                - source_labels: [__meta_kubernetes_service_name]
                  target_label: service

            # Example scrape config for probing ingresses via the Blackbox Exporter.
            #
            # The relabeling allows the actual ingress scrape endpoint to be configured
            # for all or only some services.
            - job_name: "kubernetes-ingresses"

              metrics_path: /probe
              params:
                module: [http_2xx]

              kubernetes_sd_configs:
                - role: ingress

              relabel_configs:
                # Example relabel to probe only some ingresses that have "example.io/should_be_probed = true" annotation
                #  - source_labels: [__meta_kubernetes_ingress_annotation_example_io_should_be_probed]
                #    action: keep
                #    regex: true
                - source_labels:
                    [
                      __meta_kubernetes_ingress_scheme,
                      __address__,
                      __meta_kubernetes_ingress_path,
                    ]
                  regex: (.+);(.+);(.+)
                  replacement: ${1}://${2}${3}
                  target_label: __param_target
                - target_label: __address__
                  replacement: blackbox-exporter.example.com:9115
                - source_labels: [__param_target]
                  target_label: instance
                - action: labelmap
                  regex: __meta_kubernetes_ingress_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  target_label: namespace
                - source_labels: [__meta_kubernetes_ingress_name]
                  target_label: ingress

            # Example scrape config for pods
            #
            # The relabeling allows the actual pod scrape to be configured
            # for all the declared ports (or port-free target if none is declared)
            # or only some ports.
            - job_name: "kubernetes-pods"

              kubernetes_sd_configs:
                - role: pod

              relabel_configs:
                # Example relabel to scrape only pods that have
                # "example.io/should_be_scraped = true" annotation.
                #  - source_labels: [__meta_kubernetes_pod_annotation_example_io_should_be_scraped]
                #    action: keep
                #    regex: true
                #
                # Example relabel to customize metric path based on pod
                # "example.io/metric_path = <metric path>" annotation.
                #  - source_labels: [__meta_kubernetes_pod_annotation_example_io_metric_path]
                #    action: replace
                #    target_label: __metrics_path__
                #    regex: (.+)
                #
                # Example relabel to scrape only single, desired port for the pod
                # based on pod "example.io/scrape_port = <port>" annotation.
                #  - source_labels: [__address__, __meta_kubernetes_pod_annotation_example_io_scrape_port]
                #    action: replace
                #    regex: ([^:]+)(?::\d+)?;(\d+)
                #    replacement: $1:$2
                #    target_label: __address__
                - action: labelmap
                  regex: __meta_kubernetes_pod_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  action: replace
                  target_label: namespace
                - source_labels: [__meta_kubernetes_pod_name]
                  action: replace
                  target_label: pod

    processors:
      resource:
        attributes:
          - key: env
            value: "$ENVIRONMENT"
            action: upsert
      resourcedetection:
        detectors: 
          - env 
          - eks
          #- ec2
          #- ecs
          #- gce
          #- gke
        timeout: 5s
        override: false
      k8sattributes:
        passthrough: true
      batch:
        send_batch_size: 1024
        timeout: 10s
      memory_limiter:
        check_interval: 1s
        limit_percentage: 50
        spike_limit_percentage: 30
      filter/metrics:
        metrics:
          exclude:
            match_type: strict
            metric_names: 
              - istio_request_duration_milliseconds_bucket
              - istio_response_bytes_bucket
              - istio_request_bytes_bucket
              - rest_client_request_duration_seconds_bucket
              - storage_operation_duration_seconds_bucket
    extensions:
      health_check:
      pprof:
      zpages:
      memory_ballast:
        size_in_percentage: 20
    exporters:
      logging:
        loglevel: info
      otlp/grafana:
        endpoint: tempo-us-central1.grafana.net:443
        headers:
          authorization: "$exporter_otlp_grafana_headers_authorization"
      prometheusremotewrite/grafana:
        endpoint: "https://prometheus-prod-10-prod-us-central-0.grafana.net/api/prom/push"
        resource_to_telemetry_conversion:
          enabled: true
        headers:
          authorization: "$exporter_prometheusremotewrite_grafana_headers_authorization"
        retry_on_failure:
            enabled: true
            initial_interval: 10s
            max_interval: 60s
            max_elapsed_time: 10m
        write_buffer_size: 524288
        remote_write_queue:
            queue_size: 2000
            num_consumers: 10
      loki/grafana:
        endpoint: https://logs-prod-us-central1.grafana.net/api/prom/push
        headers:
          authorization: "$exporter_loki_grafana_headers_authorization"
        tenant_id: "xxx"
        timeout: 5s
        tls:
          insecure: true
        read_buffer_size: 1024
        write_buffer_size: 2048
        sending_queue:
          enabled: true
          num_consumers: 2
          queue_size: 500
        retry_on_failure:
          enabled: true
          initial_interval: 10s
          max_interval: 60s
          max_elapsed_time: 10m
        labels:
          resource:
            stream: "stream"
            container_name: "container_name"
            namespace: "namespace"
            pod_name: "pod_name"
            uid: "uid"
            run_id: "run_id"
            env: "env"
            cloud.region": "cloud_region"
            cloud.platform: "cloud_platform"
            k8s.cluster.name: "k8s_cluster_name"
            cloud.provider: "cloud_provider"
          attributes:
            env: "$ENVIRONMENT"
      otlp/honeycomb:
        endpoint: "api.honeycomb.io:443"
        headers:
          "x-honeycomb-team": "$exporter_otlp_honeycomb_headers_x_honeycomb_team"
          "x-honeycomb-dataset": "$ENVIRONMENT"
    service:
      telemetry:
        logs:
          level: "error"
      extensions: 
        - pprof 
        - zpages 
        - health_check
        - memory_ballast
      pipelines:
        logs:
          receivers:
            #- filelog
            - otlp
          processors:
            - memory_limiter
            - resourcedetection 
            - resource
            - batch 
          exporters:
            #- logging
            - loki/grafana
        traces:
          receivers:
            - zipkin
            - opencensus
            - otlp
          processors:
            - memory_limiter
            - resourcedetection
            - k8sattributes
            - resource
            - batch
          exporters:
            #- logging
            - otlp/honeycomb
        metrics:
          receivers:
            - prometheus
            - otlp
          processors:
            - filter/metrics
            - memory_limiter
            - resourcedetection
            - k8sattributes
            - resource
            - batch
          exporters:
            #- logging
            - prometheusremotewrite/grafana

Environment
OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

Additional context

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x336eb9e]

goroutine 456 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver/internal.(*MetricsAdjusterPdata).adjustMetricHistogram(0xc0ca7c1a80, 0xc2b447d7e8)
    github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.42.0/internal/otlp_metrics_adjuster.go:365 +0x15e
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver/internal.(*MetricsAdjusterPdata).adjustMetricPoints(0x420a420, 0x1)
    github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.42.0/internal/otlp_metrics_adjuster.go:280 +0x286
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver/internal.(*MetricsAdjusterPdata).adjustMetric(0x4032000000000000, 0x1f)
    github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.42.0/internal/otlp_metrics_adjuster.go:269 +0xe5
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver/internal.(*MetricsAdjusterPdata).AdjustMetrics(0xc0ca7c1a80, 0xc0ca7c1a68)
    github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.42.0/internal/otlp_metrics_adjuster.go:255 +0xff
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver/internal.(*transaction).Commit(0xc247f052c0)
    github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.42.0/internal/transaction.go:209 +0x24e
github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1()
    github.com/prometheus/prometheus@v1.8.2-0.20210621150501-ff58416a0b02/scrape/scrape.go:1195 +0x45
github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport(0xc0d27ba240, 0x73334e0, 0x3851f109, {0xed9824d4b, 0x73334e0, 0x73334e0}, {0x40d1f4, 0x3be6f40, 0x73334e0}, 0x0)
    github.com/prometheus/prometheus@v1.8.2-0.20210621150501-ff58416a0b02/scrape/scrape.go:1262 +0xf18
github.com/prometheus/prometheus/scrape.(*scrapeLoop).run(0xc0d27ba240, 0xdf8475800, 0x843726, 0xc0d4dea630)
    github.com/prometheus/prometheus@v1.8.2-0.20210621150501-ff58416a0b02/scrape/scrape.go:1148 +0x370
created by github.com/prometheus/prometheus/scrape.(*scrapePool).sync
    github.com/prometheus/prometheus@v1.8.2-0.20210621150501-ff58416a0b02/scrape/scrape.go:564 +0x9af
Aneurysm9 commented 2 years ago

Thanks for the report. Are you able to reproduce this consistently? If so, can you check whether you can still reproduce it after adding --feature-gates=+receiver.prometheus.OTLPDirect to the collector's CLI flags? That will enable a different metrics appender that is becoming the default in the upcoming release. I'm interested in whether this also happens with that pipeline.
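For anyone applying this via the operator-managed CR shown above, the flag can be passed through the CR's args field. A minimal sketch, assuming the operator version in use supports spec.args:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry
  namespace: opentelemetry
spec:
  mode: daemonset
  image: otel/opentelemetry-collector-contrib:0.42.0
  # Extra CLI flags passed to the collector binary. Note the leading "+",
  # which enables the feature gate; a leading "-" disables it.
  args:
    feature-gates: "+receiver.prometheus.OTLPDirect"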

It looks like the first scrape for this metric had a stale value set, which put the start-time adjustment logic in a bad state. I think we can add some checks to compensate for this, but I'd like to understand better how this state arose and whether it is unique to the outgoing pipeline or also an issue with the new implementation.
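To make the suspected failure mode concrete, here is a minimal, self-contained Go sketch. The types and names are hypothetical stand-ins, not the receiver's actual internals: if the first scrape of a series carried only a stale marker, the cached reference point stays nil, an unguarded dereference panics, and the adjustment has to be skipped instead.

package main

import "fmt"

// Hypothetical, simplified stand-ins for the adjuster's cached per-series
// state; the real logic lives in internal/otlp_metrics_adjuster.go.
type histogramPoint struct {
	count uint64
	sum   float64
}

type timeseriesInfo struct {
	// initial is the first observed point, used as the start-time reference.
	// A first scrape that only carried a Prometheus stale marker can leave
	// this nil, which matches the nil pointer dereference in the trace above.
	initial *histogramPoint
}

// adjustHistogram returns the cumulative delta since the reference point.
// The nil guard is the kind of check proposed above to avoid the panic.
func adjustHistogram(tsi *timeseriesInfo, cur histogramPoint) (histogramPoint, bool) {
	if tsi == nil || tsi.initial == nil {
		return histogramPoint{}, false // no usable start-time reference yet
	}
	return histogramPoint{
		count: cur.count - tsi.initial.count,
		sum:   cur.sum - tsi.initial.sum,
	}, true
}

func main() {
	tsi := &timeseriesInfo{} // first scrape was stale: initial never set
	if _, ok := adjustHistogram(tsi, histogramPoint{count: 10, sum: 1.5}); !ok {
		fmt.Println("skipped adjustment: start time unknown for this series")
	}
}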

templarfelix commented 2 years ago

@Aneurysm9 I tried that and received the same error:

  otc-container:
    Container ID:  docker://11f907302e89f04b0e33731d94bec87fc2d3369d8179d4faf0c9b6ded7575629
    Image:         otel/opentelemetry-collector-contrib:0.42.0
    Image ID:      docker-pullable://otel/opentelemetry-collector-contrib@sha256:d52ea80e39430e778705a3a7a2b115c4ef812073e704422fe974bcc2f9d2c60a
    Port:          <none>
    Host Port:     <none>
    Args:
      --feature-gates=-receiver.prometheus.OTLPDirect
      --metrics-level=detailed
      --config=/conf/collector.yaml
Aneurysm9 commented 2 years ago

The argument is backward. It must be --feature-gates=+receiver.prometheus.OTLPDirect. Using --feature-gates=-receiver.prometheus.OTLPDirect, as you have there, is the default state in v0.42.0 and changes nothing.

templarfelix commented 2 years ago

Same problem:

panic: runtime error: index out of range [-1]

goroutine 121 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter.addSingleHistogramDataPoint({0xc000c72360}, {0x1bf}, {0x0}, {0x0, 0x0}, 0xc2254139a0, 0x8)
    github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter@v0.42.0/helper.go:397 +0x8bd

Containers:
  otc-container:
    Container ID:  docker://21b690856d51bf8f15a276c90cb8222f149e2b4b3dc771dedf40ae4746c02d26
    Image:         otel/opentelemetry-collector-contrib:0.42.0
    Image ID:      docker-pullable://otel/opentelemetry-collector-contrib@sha256:d52ea80e39430e778705a3a7a2b115c4ef812073e704422fe974bcc2f9d2c60a
    Port:          <none>
    Host Port:     <none>
    Args:
      --feature-gates=+receiver.prometheus.OTLPDirect
      --metrics-level=detailed
      --config=/conf/collector.yaml
Aneurysm9 commented 2 years ago

That is a different problem. This time it is in the PRW exporter, not the receiver. Here it seems that the stale marker has been correctly handled by the receiver but the exporter is having an issue reconstructing the le=+Inf bucket. I should be able to get a fix for that prepared fairly quickly.
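A rough illustration of that exporter-side failure, with a hypothetical helper rather than the real helper.go code: if a histogram point arrives with no bucket counts (as a stale-marker point can), deriving the implicit le="+Inf" bucket from the last slice element indexes position -1.

package main

import "fmt"

// infBucket mimics deriving the cumulative le="+Inf" count from a histogram
// data point's bucket counts. Without the guard, an empty slice indexes
// bucketCounts[-1] and panics exactly like the trace above.
func infBucket(bucketCounts []uint64) (uint64, bool) {
	if len(bucketCounts) == 0 {
		return 0, false // guard that avoids "index out of range [-1]"
	}
	return bucketCounts[len(bucketCounts)-1], true
}

func main() {
	if _, ok := infBucket(nil); !ok {
		fmt.Println("no buckets on point; skipping the le=\"+Inf\" sample")
	}
}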

Aneurysm9 commented 2 years ago

Looks like there's already a PR looking to fix the PRW exporter issue.

templarfelix commented 2 years ago

Thanks for the help, @Aneurysm9

brianpham commented 2 years ago

Do you know how long before this is in the next release? I am encountering this on one of our deployments.

Aneurysm9 commented 2 years ago

> Do you know how long before this is in the next release? I am encountering this on one of our deployments.

We attempt to ship a release every two weeks. I would expect that this will be included in the v0.44.0 release that should happen next week.

brianpham commented 2 years ago

Thanks @Aneurysm9 😄