open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.91k stars 2.27k forks source link

Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "data_type": "metrics" #32976

Closed ashishstationcasinos closed 4 months ago

ashishstationcasinos commented 4 months ago

Component(s)

receiver/prometheus

What happened?

Description

Installed OpenTelemetry Collector on AKS Kubernetes Cluster using Helm Chart . And Provided the Prometheus Config for MongoDB Atlas to pull Metrics by OpenTelemetry collector. Helm Chart Values.yaml

# Default values for opentelemetry-collector.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

nameOverride: ""
fullnameOverride: ""

# Valid values are "daemonset", "deployment", and "statefulset".
mode: "deployment"

# Specify which namespace should be used to deploy the resources into
namespaceOverride: ""

# Handles basic configuration of components that
# also require k8s modifications to work correctly.
# .Values.config can be used to modify/add to a preset
# component configuration, but CANNOT be used to remove
# preset configuration. If you require removal of any
# sections of a preset configuration, you cannot use
# the preset. Instead, configure the component manually in
# .Values.config and use the other fields supplied in the
# values.yaml to configure k8s as necessary.
presets:
  # Configures the collector to collect logs.
  # Adds the filelog receiver to the logs pipeline
  # and adds the necessary volumes and volume mounts.
  # Best used with mode = daemonset.
  # See https://opentelemetry.io/docs/kubernetes/collector/components/#filelog-receiver for details on the receiver.
  logsCollection:
    enabled: false
    includeCollectorLogs: false
    # Enabling this writes checkpoints in /var/lib/otelcol/ host directory.
    # Note this changes collector's user to root, so that it can write to host directory.
    storeCheckpoints: false
    # The maximum bytes size of the recombined field.
    # Once the size exceeds the limit, all received entries of the source will be combined and flushed.
    maxRecombineLogSize: 102400
  # Configures the collector to collect host metrics.
  # Adds the hostmetrics receiver to the metrics pipeline
  # and adds the necessary volumes and volume mounts.
  # Best used with mode = daemonset.
  # See https://opentelemetry.io/docs/kubernetes/collector/components/#host-metrics-receiver for details on the receiver.
  hostMetrics:
    enabled: false
  # Configures the Kubernetes Processor to add Kubernetes metadata.
  # Adds the k8sattributes processor to all the pipelines
  # and adds the necessary rules to ClusteRole.
  # Best used with mode = daemonset.
  # See https://opentelemetry.io/docs/kubernetes/collector/components/#kubernetes-attributes-processor for details on the receiver.
  kubernetesAttributes:
    enabled: false
    # When enabled the processor will extra all labels for an associated pod and add them as resource attributes.
    # The label's exact name will be the key.
    extractAllPodLabels: false
    # When enabled the processor will extra all annotations for an associated pod and add them as resource attributes.
    # The annotation's exact name will be the key.
    extractAllPodAnnotations: false
  # Configures the collector to collect node, pod, and container metrics from the API server on a kubelet..
  # Adds the kubeletstats receiver to the metrics pipeline
  # and adds the necessary rules to ClusteRole.
  # Best used with mode = daemonset.
  # See https://opentelemetry.io/docs/kubernetes/collector/components/#kubeletstats-receiver for details on the receiver.
  kubeletMetrics:
    enabled: false
  # Configures the collector to collect kubernetes events.
  # Adds the k8sobject receiver to the logs pipeline
  # and collects kubernetes events by default.
  # Best used with mode = deployment or statefulset.
  # See https://opentelemetry.io/docs/kubernetes/collector/components/#kubernetes-objects-receiver for details on the receiver.
  kubernetesEvents:
    enabled: false
  # Configures the Kubernetes Cluster Receiver to collect cluster-level metrics.
  # Adds the k8s_cluster receiver to the metrics pipeline
  # and adds the necessary rules to ClusteRole.
  # Best used with mode = deployment or statefulset.
  # See https://opentelemetry.io/docs/kubernetes/collector/components/#kubernetes-cluster-receiver for details on the receiver.
  clusterMetrics:
    enabled: false

configMap:
  # Specifies whether a configMap should be created (true by default)
  create: true
  # Specifies an existing ConfigMap to be mounted to the pod
  # The ConfigMap MUST include the collector configuration via a key named 'relay' or the collector will not start.
  existingName: ""

# Base collector configuration.
# Supports templating. To escape existing instances of {{ }}, use {{` <original content> `}}.
# For example, {{ REDACTED_EMAIL }} becomes {{` {{ REDACTED_EMAIL }} `}}.
config:
  exporters:
    debug: {}
  receivers:
    prometheus:
      config:
       scrape_configs:
         - job_name: "Dev-mongo-metrics"
           scrape_interval: 10s
           metrics_path: /metrics
           scheme : https
           basic_auth:
             username: prometheus
             password: xxxxxxxxx
           http_sd_configs:
           - url: https://cloud.mongodb.com/prometheus/v1.0/groups/xxxxxxxxxxxxx/discovery
             refresh_interval: 60s
             basic_auth:
               username: prometheus
               password: xxxxxxxx

  service:
    pipelines:
      metrics:
        exporters:
          - debug
        receivers:
          - prometheus
image:
  # If you want to use the core image `otel/opentelemetry-collector`, you also need to change `command.name` value to `otelcol`.
  repository: "otel/opentelemetry-collector-contrib"
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: "0.99.0"
  # When digest is set to a non-empty value, images will be pulled by digest (regardless of tag value).
  digest: ""
imagePullSecrets: []

# OpenTelemetry Collector executable
command:
  name: "otelcol-contrib"
  extraArgs: []

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: ""

clusterRole:
  # Specifies whether a clusterRole should be created
  # Some presets also trigger the creation of a cluster role and cluster role binding.
  # If using one of those presets, this field is no-op.
  create: false
  # Annotations to add to the clusterRole
  # Can be used in combination with presets that create a cluster role.
  annotations: {}
  # The name of the clusterRole to use.
  # If not set a name is generated using the fullname template
  # Can be used in combination with presets that create a cluster role.
  name: ""
  # A set of rules as documented here : https://kubernetes.io/docs/reference/access-authn-authz/rbac/
  # Can be used in combination with presets that create a cluster role to add additional rules.
  rules: []
  # - apiGroups:
  #   - ''
  #   resources:
  #   - 'pods'
  #   - 'nodes'
  #   verbs:
  #   - 'get'
  #   - 'list'
  #   - 'watch'

  clusterRoleBinding:
    # Annotations to add to the clusterRoleBinding
    # Can be used in combination with presets that create a cluster role binding.
    annotations: {}
    # The name of the clusterRoleBinding to use.
    # If not set a name is generated using the fullname template
    # Can be used in combination with presets that create a cluster role binding.
    name: ""

podSecurityContext: {}
securityContext: {}

nodeSelector: {}
tolerations: []
affinity: {}
topologySpreadConstraints: []

# Allows for pod scheduler prioritisation
priorityClassName: ""

extraEnvs: []
extraEnvsFrom: []
extraVolumes: []
extraVolumeMounts: []

# Configuration for ports
# nodePort is also allowed
ports:
  otlp:
    enabled: true
    containerPort: 4317
    servicePort: 4317
    hostPort: 4317
    protocol: TCP
    # nodePort: 30317
    appProtocol: grpc
  otlp-http:
    enabled: true
    containerPort: 4318
    servicePort: 4318
    hostPort: 4318
    protocol: TCP
  jaeger-compact:
    enabled: true
    containerPort: 6831
    servicePort: 6831
    hostPort: 6831
    protocol: UDP
  jaeger-thrift:
    enabled: true
    containerPort: 14268
    servicePort: 14268
    hostPort: 14268
    protocol: TCP
  jaeger-grpc:
    enabled: true
    containerPort: 14250
    servicePort: 14250
    hostPort: 14250
    protocol: TCP
  zipkin:
    enabled: true
    containerPort: 9411
    servicePort: 9411
    hostPort: 9411
    protocol: TCP
  metrics:
    # The metrics port is disabled by default. However you need to enable the port
    # in order to use the ServiceMonitor (serviceMonitor.enabled) or PodMonitor (podMonitor.enabled).
    enabled: false
    containerPort: 8888
    servicePort: 8888
    protocol: TCP

# Resource limits & requests. Update according to your own use case as these values might be too low for a typical deployment.
resources: {}
# resources:
#   limits:
#     cpu: 250m
#     memory: 512Mi

podAnnotations: {}

podLabels: {}

# Common labels to add to all otel-collector resources. Evaluated as a template.
additionalLabels: {}
#  app.kubernetes.io/part-of: my-app

# Host networking requested for this pod. Use the host's network namespace.
hostNetwork: false

# Adding entries to Pod /etc/hosts with HostAliases
# https://kubernetes.io/docs/tasks/network/customize-hosts-file-for-pods/
hostAliases: []
  # - ip: "1.2.3.4"
  #   hostnames:
  #     - "my.host.com"

# Pod DNS policy ClusterFirst, ClusterFirstWithHostNet, None, Default, None
dnsPolicy: ""

# Custom DNS config. Required when DNS policy is None.
dnsConfig: {}

# only used with deployment mode
replicaCount: 1

# only used with deployment mode
revisionHistoryLimit: 10

annotations: {}

# List of extra sidecars to add.
# This also supports template content, which will eventually be converted to yaml.
extraContainers: []
# extraContainers:
#   - name: test
#     command:
#       - cp
#     args:
#       - /bin/sleep
#       - /test/sleep
#     image: busybox:latest
#     volumeMounts:
#       - name: test
#         mountPath: /test

# List of init container specs, e.g. for copying a binary to be executed as a lifecycle hook.
# This also supports template content, which will eventually be converted to yaml.
# Another usage of init containers is e.g. initializing filesystem permissions to the OTLP Collector user `10001` in case you are using persistence and the volume is producing a permission denied error for the OTLP Collector container.
initContainers: []
# initContainers:
#   - name: test
#     image: busybox:latest
#     command:
#       - cp
#     args:
#       - /bin/sleep
#       - /test/sleep
#     volumeMounts:
#       - name: test
#         mountPath: /test
#  - name: init-fs
#    image: busybox:latest
#    command:
#      - sh
#      - '-c'
#      - 'chown -R 10001: /var/lib/storage/otc' # use the path given as per `extensions.file_storage.directory` & `extraVolumeMounts[x].mountPath`
#    volumeMounts:
#      - name: opentelemetry-collector-data # use the name of the volume used for persistence
#        mountPath: /var/lib/storage/otc # use the path given as per `extensions.file_storage.directory` & `extraVolumeMounts[x].mountPath`

# Pod lifecycle policies.
lifecycleHooks: {}
# lifecycleHooks:
#   preStop:
#     exec:
#       command:
#       - /test/sleep
#       - "5"

# liveness probe configuration
# Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
##
livenessProbe:
  # Number of seconds after the container has started before startup, liveness or readiness probes are initiated.
  # initialDelaySeconds: 1
  # How often in seconds to perform the probe.
  # periodSeconds: 10
  # Number of seconds after which the probe times out.
  # timeoutSeconds: 1
  # Minimum consecutive failures for the probe to be considered failed after having succeeded.
  # failureThreshold: 1
  # Duration in seconds the pod needs to terminate gracefully upon probe failure.
  # terminationGracePeriodSeconds: 10
  httpGet:
    port: 13133
    path: /

# readiness probe configuration
# Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
##
readinessProbe:
  # Number of seconds after the container has started before startup, liveness or readiness probes are initiated.
  # initialDelaySeconds: 1
  # How often (in seconds) to perform the probe.
  # periodSeconds: 10
  # Number of seconds after which the probe times out.
  # timeoutSeconds: 1
  # Minimum consecutive successes for the probe to be considered successful after having failed.
  # successThreshold: 1
  # Minimum consecutive failures for the probe to be considered failed after having succeeded.
  # failureThreshold: 1
  httpGet:
    port: 13133
    path: /

service:
  # Enable the creation of a Service.
  # By default, it's enabled on mode != daemonset.
  # However, to enable it on mode = daemonset, its creation must be explicitly enabled
  # enabled: true

  type: ClusterIP
  # type: LoadBalancer
  # loadBalancerIP: 1.2.3.4
  # loadBalancerSourceRanges: []

  # By default, Service of type 'LoadBalancer' will be created setting 'externalTrafficPolicy: Cluster'
  # unless other value is explicitly set.
  # Possible values are Cluster or Local (https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip)
  # externalTrafficPolicy: Cluster

  annotations: {}

  # By default, Service will be created setting 'internalTrafficPolicy: Local' on mode = daemonset
  # unless other value is explicitly set.
  # Setting 'internalTrafficPolicy: Cluster' on a daemonset is not recommended
  # internalTrafficPolicy: Cluster

ingress:
  enabled: false
  annotations:
    kubernetes.io/ingress.class: nginx
  hosts:
     - host: xxxxxxxxxxxxxxxxxxxxxx
       paths:
         - path: /
           pathType: Prefix
           port: 4318
  tls:
     - secretName: xxxxxxxxx
       hosts:
         - xxxxxxxxxxxxxxxxxx

  # Additional ingresses - only created if ingress.enabled is true
  # Useful for when differently annotated ingress services are required
  # Each additional ingress needs key "name" set to something unique
  additionalIngresses: []
  # - name: cloudwatch
  #   ingressClassName: nginx
  #   annotations: {}
  #   hosts:
  #     - host: collector.example.com
  #       paths:
  #         - path: /
  #           pathType: Prefix
  #           port: 4318
  #   tls:
  #     - secretName: collector-tls
  #       hosts:
  #         - collector.example.com

podMonitor:
  # The pod monitor by default scrapes the metrics port.
  # The metrics port needs to be enabled as well.
  enabled: false
  metricsEndpoints:
    - port: metrics
      # interval: 15s

  # additional labels for the PodMonitor
  extraLabels: {}
  #   release: kube-prometheus-stack

serviceMonitor:
  # The service monitor by default scrapes the metrics port.
  # The metrics port needs to be enabled as well.
  enabled: false
  metricsEndpoints:
    - port: metrics
      # interval: 15s

  # additional labels for the ServiceMonitor
  extraLabels: {}
  #  release: kube-prometheus-stack

# PodDisruptionBudget is used only if deployment enabled
podDisruptionBudget:
  enabled: false
#   minAvailable: 2
#   maxUnavailable: 1

# autoscaling is used only if mode is "deployment" or "statefulset"
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 10
  behavior: {}
  targetCPUUtilizationPercentage: 80
  # targetMemoryUtilizationPercentage: 80

rollout:
  rollingUpdate: {}
  # When 'mode: daemonset', maxSurge cannot be used when hostPort is set for any of the ports
  # maxSurge: 25%
  # maxUnavailable: 0
  strategy: RollingUpdate

prometheusRule:
  enabled: false
  groups: []
  # Create default rules for monitoring the collector
  defaultRules:
    enabled: false

  # additional labels for the PrometheusRule
  extraLabels: {}

statefulset:
  # volumeClaimTemplates for a statefulset
  volumeClaimTemplates: []
  podManagementPolicy: "Parallel"
  # Controls if and how PVCs created by the StatefulSet are deleted. Available in Kubernetes 1.23+.
  persistentVolumeClaimRetentionPolicy:
    enabled: false
    whenDeleted: Retain
    whenScaled: Retain

networkPolicy:
  enabled: false

  # Annotations to add to the NetworkPolicy
  annotations: {}

  # Configure the 'from' clause of the NetworkPolicy.
  # By default this will restrict traffic to ports enabled for the Collector. If
  # you wish to further restrict traffic to other hosts or specific namespaces,
  # see the standard NetworkPolicy 'spec.ingress.from' definition for more info:
  # https://kubernetes.io/docs/reference/kubernetes-api/policy-resources/network-policy-v1/
  allowIngressFrom: []
  # # Allow traffic from any pod in any namespace, but not external hosts
  # - namespaceSelector: {}
  # # Allow external access from a specific cidr block
  # - ipBlock:
  #     cidr: 192.168.1.64/32
  # # Allow access from pods in specific namespaces
  # - namespaceSelector:
  #     matchExpressions:
  #       - key: kubernetes.io/metadata.name
  #         operator: In
  #         values:
  #           - "cats"
  #           - "dogs"

  # Add additional ingress rules to specific ports
  # Useful to allow external hosts/services to access specific ports
  # An example is allowing an external prometheus server to scrape metrics
  #
  # See the standard NetworkPolicy 'spec.ingress' definition for more info:
  # https://kubernetes.io/docs/reference/kubernetes-api/policy-resources/network-policy-v1/
  extraIngressRules: []
  # - ports:
  #   - port: metrics
  #     protocol: TCP
  #   from:
  #     - ipBlock:
  #         cidr: 192.168.1.64/32

  # Restrict egress traffic from the OpenTelemetry collector pod
  # See the standard NetworkPolicy 'spec.egress' definition for more info:
  # https://kubernetes.io/docs/reference/kubernetes-api/policy-resources/network-policy-v1/
  egressRules: []
  #  - to:
  #      - namespaceSelector: {}
  #      - ipBlock:
  #          cidr: 192.168.10.10/24
  #    ports:
  #      - port: 1234
  #        protocol: TCP

# When enabled, the chart will set the GOMEMLIMIT env var to 80% of the configured
# resources.limits.memory and remove the memory ballast extension.
# If no resources.limits.memory are defined enabling does nothing.
# In a future release this setting will be enabled by default.
# See https://github.com/open-telemetry/opentelemetry-helm-charts/issues/891
# for more details.
useGOMEMLIMIT: true

Steps to Reproduce

Deploy OpenTelemetry collector with Above Values.yaml using helm chart image: otel/opentelemetry-collector-contrib:0.99.0

Expected Result

 It should able to scarpe the Metrics of Prometheus

Actual Result

2024-05-10T02:41:59.733Z        warn    internal/transaction.go:129     Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1715308909732, "target_labels": "{__name__=\"up\", cl_name=\"xxxxxxxx\", group_id=\"xxxxxxxx\", instance=\"xxxxxxxxxxxx:27018\", job=\"Dev-mongo-metrics\", org_id=\"xxxxxxxxxx\"}"}

Collector version

0.99.0

Environment information

Environment

AKS cluster with Kubernetes version 1.26.10 OpenTelemetry collector 0.99.0

OpenTelemetry Collector configuration

config:
  exporters:
    debug: {}
  receivers:
    prometheus:
      config:
       scrape_configs:
         - job_name: "Dev-mongo-metrics"
           scrape_interval: 10s
           metrics_path: /metrics
           scheme : https
           basic_auth:
             username: prometheus
             password: xxxxxx
           http_sd_configs:
           - url: https://cloud.mongodb.com/prometheus/v1.0/groups/xxxxxxxx/discovery
             refresh_interval: 60s
             basic_auth:
               username: prometheus
               password: xxxxxxxx

  service:
    pipelines:
      metrics:
        exporters:
          - debug
        receivers:
          - prometheus

Log output

2024-05-10T02:41:58.675Z        info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 5, "data points": 5}
2024-05-10T02:41:59.733Z        warn    internal/transaction.go:129     Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1715308909732, "target_labels": "{__name__=\"up\", cl_name=\"xcs\", group_id=\"xxxxxxx\", instance=\"xxxxxxxxxxxxxxxxxmongodb.net:27018\", job=\"Station Casinos Dev-mongo-metrics\", org_id=\"xxxxxxxxxxx64d0b4c\"}"}
2024-05-10T02:41:59.879Z        info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 5, "data points": 5}

Additional context

No response

github-actions[bot] commented 4 months ago

Pinging code owners:

dashpole commented 4 months ago

Can you enable debug logging in the collector? I think adding this should work. It will make the log include the reason why the scrape failed.

service:
    telemetry:
        logs:
            level: DEBUG
ashishstationcasinos commented 4 months ago

Enabled the log leve to Debug.

It displayed the Actual Reason of Firewall issue.