grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Promtail 2.7.4 uses more memory than 2.7.3 #8690

Open · Joibel opened this issue 1 year ago

Joibel commented 1 year ago

Describe the bug
Promtail 2.7.4 uses more memory than 2.7.3.

To Reproduce
Steps to reproduce the behavior:
1. Install Promtail 2.7.3 from the Helm chart, with memory limits set low but adequate, on a cluster that is actively producing logs and feeding them to Loki.
2. Confirm that it runs without OOM kills.
3. Upgrade to 2.7.4.
4. Observe pods being OOM-killed.

Expected behavior
Promtail continues to use a similar amount of memory as before.
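For reference, downgrading to a known-good version while this is investigated can be done by pinning the chart's image tag, which otherwise defaults to the chart's appVersion. A minimal sketch (untested; applied with a normal `helm upgrade -f`):

```yaml
# Sketch: pin Promtail back to 2.7.3 until the memory regression is understood.
# `image.tag: null` in the chart means "use the chart's appVersion", so setting
# it explicitly overrides the 2.7.4 default.
promtail:
  image:
    tag: "2.7.3"
```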

Environment:

Screenshots, Promtail config, or terminal output
Helm pod resource configuration:

   resources:
     limits:
       cpu: 200m
       memory: 200Mi
     requests:
       cpu: 100m
       memory: 200Mi

I imagine the Go bump from 1.19.5 to 1.20.1 is to blame.
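If the Go runtime bump is indeed the cause, one possible mitigation (a sketch, not verified against this regression) is to set a soft heap limit below the pod's memory limit via the chart's `extraEnv`, using the `GOMEMLIMIT` variable the Go runtime honors since 1.19:

```yaml
# Sketch: ask the Go GC to stay under ~170MiB, leaving headroom below the
# 200Mi pod limit so the runtime collects more aggressively instead of the
# kernel OOM-killing the pod. The 170MiB figure is an assumed value to tune.
promtail:
  extraEnv:
    - name: GOMEMLIMIT
      value: "170MiB"
```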

Joibel commented 1 year ago

I can do a redacted values.yaml if needed but we aren't doing anything spectacular in it.

DylanGuedes commented 1 year ago

Could you share your Promtail configuration? Maybe it's a specific scraper, etc.

DylanGuedes commented 1 year ago

BTW, the number of Promtail changes in v2.7.4 is pretty low, and all of them are fixes, so I'm surprised by this. You're probably right that the Go bump is to blame.

Joibel commented 1 year ago

Here is the full values.yaml, deployed to two clusters, both OOMing. A third cluster exhibited the same problem with a slightly different config. They run differing workloads. I have reverted them all to 2.7.3, as they are all live.

promtail:
  # -- Overrides the chart's name
  nameOverride: null

  # -- Overrides the chart's computed fullname
  fullnameOverride: null

  daemonset:
    # -- Deploys Promtail as a DaemonSet
    enabled: true

  deployment:
    # -- Deploys Promtail as a Deployment
    enabled: false
    replicaCount: 1
    autoscaling:
      # -- Creates a HorizontalPodAutoscaler for the deployment
      enabled: false
      minReplicas: 1
      maxReplicas: 10
      targetCPUUtilizationPercentage: 80
      targetMemoryUtilizationPercentage:

  ingress:
    enabled: false
    # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
    # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
    # ingressClassName: nginx
    # Values can be templated
    annotations: {}
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"
    labels: {}
    path: /

    # pathType is only for k8s >= 1.18
    pathType: Prefix

    hosts:
      - chart-example.local
    ## Extra paths to prepend to every host configuration. This is useful when working with annotation based services.
    extraPaths: []
    # - path: /*
    #   backend:
    #     serviceName: ssl-redirect
    #     servicePort: use-annotation
    ## Or for k8s > 1.19
    # - path: /*
    #   pathType: Prefix
    #   backend:
    #     service:
    #       name: ssl-redirect
    #       port:
    #         name: use-annotation

    tls: []
    #  - secretName: chart-example-tls
    #    hosts:
    #      - chart-example.local
  secret:
    # -- Labels for the Secret
    labels: {}
    # -- Annotations for the Secret
    annotations: {}

  configmap:
    # -- If enabled, promtail config will be created as a ConfigMap instead of a secret
    enabled: false

  initContainer: []
    # # -- Specifies whether the init container for setting inotify max user instances is to be enabled
    # - name: init
    #   # -- Docker registry, image and tag for the init container image
    #   image: docker.io/busybox:1.33
    #   # -- Docker image pull policy for the init container image
    #   imagePullPolicy: IfNotPresent
    #   # -- The inotify max user instances to configure
    #   command:
    #     - sh
    #     - -c
    #     - sysctl -w fs.inotify.max_user_instances=128
    #   securityContext:
    #     privileged: true

  image:
    # -- The Docker registry
    registry: docker.sendilab.net/proxy
    # -- Docker image repository
    repository: grafana/promtail
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: null
    # -- Docker image pull policy
    pullPolicy: Always

  # -- Image pull secrets for Docker images
  imagePullSecrets: []

  # -- Annotations for the DaemonSet
  annotations: {}

  # -- The update strategy for the DaemonSet
  updateStrategy: {}

  # -- Pod labels
  podLabels: {}

  # -- Pod annotations
  podAnnotations: {}
  #  prometheus.io/scrape: "true"
  #  prometheus.io/port: "http-metrics"

  # -- The name of the PriorityClass
  priorityClassName: null

  # -- Liveness probe
  livenessProbe: {}

  # -- Readiness probe
  # @default -- See `values.yaml`
  readinessProbe:
    failureThreshold: 5
    httpGet:
      path: "{{ printf `%s/ready` .Values.httpPathPrefix }}"
      port: http-metrics
    initialDelaySeconds: 10
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1

  # -- Resource requests and limits
  resources:
    limits:
      cpu: 200m
      memory: 200Mi
    requests:
      cpu: 100m
      memory: 200Mi

  # -- The security context for pods
  podSecurityContext:
    runAsUser: 0
    runAsGroup: 0

  # -- The security context for containers
  containerSecurityContext:
    readOnlyRootFilesystem: true
    capabilities:
      drop:
        - ALL
    allowPrivilegeEscalation: false

  rbac:
    # -- Specifies whether RBAC resources are to be created
    create: true
    # -- Specifies whether a PodSecurityPolicy is to be created
    pspEnabled: false

  # -- The name of the Namespace to deploy
  # If not set, `.Release.Namespace` is used
  namespace: null

  serviceAccount:
    # -- Specifies whether a ServiceAccount should be created
    create: true
    # -- The name of the ServiceAccount to use.
    # If not set and `create` is true, a name is generated using the fullname template
    name: null
    # -- Image pull secrets for the service account
    imagePullSecrets: []
    # -- Annotations for the service account
    annotations: {}

  # -- Node selector for pods
  nodeSelector: {}

  # -- Affinity configuration for pods
  affinity: {}

  # -- Tolerations for pods. By default, pods will be scheduled on master/control-plane nodes.
  tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: NoSchedule
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule

  # -- Default volumes that are mounted into pods. In most cases, these should not be changed.
  # Use `extraVolumes`/`extraVolumeMounts` for additional custom volumes.
  # @default -- See `values.yaml`
  defaultVolumes:
    - name: run
      hostPath:
        path: /run/promtail
    - name: containers
      hostPath:
        path: /var/lib/docker/containers
    - name: pods
      hostPath:
        path: /var/log/pods

  # -- Default volume mounts. Corresponds to `volumes`.
  # @default -- See `values.yaml`
  defaultVolumeMounts:
    - name: run
      mountPath: /run/promtail
    - name: containers
      mountPath: /var/lib/docker/containers
      readOnly: true
    - name: pods
      mountPath: /var/log/pods
      readOnly: true

  # Extra volumes to be added in addition to those specified under `defaultVolumes`.
  extraVolumes: []

  # Extra volume mounts together. Corresponds to `extraVolumes`.
  extraVolumeMounts: []

  # Extra args for the Promtail container.
  extraArgs: []
  # -- Example:
  # -- extraArgs:
  # --   - -client.external-labels=hostname=$(HOSTNAME)

  # -- Extra environment variables
  extraEnv: []

  # -- Extra environment variables from secrets or configmaps
  extraEnvFrom: []

  # -- Configure enableServiceLinks in pod
  enableServiceLinks: true

  # ServiceMonitor configuration
  serviceMonitor:
    # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
    enabled: true
    # -- Alternative namespace for ServiceMonitor resources
    namespace: null
    # -- Namespace selector for ServiceMonitor resources
    namespaceSelector: {}
    # -- ServiceMonitor annotations
    annotations: {}
    # -- Additional ServiceMonitor labels
    labels: {}
    # -- ServiceMonitor scrape interval
    interval: null
    # -- ServiceMonitor scrape timeout in Go duration format (e.g. 15s)
    scrapeTimeout: null
    # -- ServiceMonitor relabel configs to apply to samples before scraping
    # https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#relabelconfig
    # (defines `relabel_configs`)
    relabelings: []
    # -- ServiceMonitor relabel configs to apply to samples as the last
    # step before ingestion
    # https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#relabelconfig
    # (defines `metric_relabel_configs`)
    metricRelabelings: []
    # -- ServiceMonitor will add labels from the service to the Prometheus metric
    # https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#servicemonitorspec
    targetLabels: []
    # -- ServiceMonitor will use http by default, but you can pick https as well
    scheme: http
    # -- ServiceMonitor will use these tlsConfig settings to make the health check requests
    tlsConfig: null
    # -- Prometheus rules will be deployed for alerting purposes
    prometheusRule:
      enabled: false
      additionalLabels: {}
      # namespace:
      rules: []
      #  - alert: PromtailRequestErrors
      #    expr: 100 * sum(rate(promtail_request_duration_seconds_count{status_code=~"5..|failed"}[1m])) by (namespace, job, route, instance) / sum(rate(promtail_request_duration_seconds_count[1m])) by (namespace, job, route, instance) > 10
      #    for: 5m
      #    labels:
      #      severity: critical
      #    annotations:
      #      description: |
      #        The {{ $labels.job }} {{ $labels.route }} is experiencing
      #        {{ printf \"%.2f\" $value }} errors.
      #        VALUE = {{ $value }}
      #        LABELS = {{ $labels }}
      #      summary: Promtail request errors (instance {{ $labels.instance }})
      #  - alert: PromtailRequestLatency
      #    expr: histogram_quantile(0.99, sum(rate(promtail_request_duration_seconds_bucket[5m])) by (le)) > 1
      #    for: 5m
      #    labels:
      #      severity: critical
      #    annotations:
      #      summary: Promtail request latency (instance {{ $labels.instance }})
      #      description: |
      #        The {{ $labels.job }} {{ $labels.route }} is experiencing
      #        {{ printf \"%.2f\" $value }}s 99th percentile latency.
      #        VALUE = {{ $value }}
      #        LABELS = {{ $labels }}

  # Extra containers created as part of a Promtail Deployment resource
  # - spec for Container:
  #   https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core
  #
  # Note that the key is used as the `name` field, i.e. below will create a
  # container named `promtail-proxy`.
  extraContainers: {}
    # promtail-proxy:
    #   image: nginx
    #   ...

  # -- Configure additional ports and services. For each configured port, a corresponding service is created.
  # See values.yaml for details
  extraPorts: {}
  #  syslog:
  #    name: tcp-syslog
  #    containerPort: 1514
  #    protocol: TCP
  #    service:
  #      type: ClusterIP
  #      clusterIP: null
  #      port: 1514
  #      externalIPs: []
  #      nodePort: null
  #      annotations: {}
  #      labels: {}
  #      loadBalancerIP: null
  #      loadBalancerSourceRanges: []
  #      externalTrafficPolicy: null

  # -- PodSecurityPolicy configuration.
  # @default -- See `values.yaml`
  podSecurityPolicy:
    privileged: true
    allowPrivilegeEscalation: true
    volumes:
      - 'secret'
      - 'hostPath'
      - 'downwardAPI'
    hostNetwork: false
    hostIPC: false
    hostPID: false
    runAsUser:
      rule: 'RunAsAny'
    seLinux:
      rule: 'RunAsAny'
    supplementalGroups:
      rule: 'RunAsAny'
    fsGroup:
      rule: 'RunAsAny'
    readOnlyRootFilesystem: true
    requiredDropCapabilities:
      - ALL

  # -- Section for crafting Promtail's config file. The only directly relevant value is `config.file`
  # which is a templated string that references the other values and snippets below this key.
  # @default -- See `values.yaml`
  config:
    # -- The log level of the Promtail server
    # Must be referenced in `config.file` to configure `server.log_level`
    # See default config in `values.yaml`
    logLevel: info
    # -- The port of the Promtail server
    # Must be referenced in `config.file` to configure `server.http_listen_port`
    # See default config in `values.yaml`
    serverPort: 3101
    # -- The config of clients of the Promtail server
    # Must be referenced in `config.file` to configure `clients`
    # @default -- See `values.yaml`
    clients:
      - url: https://loki.sendilab.net/loki/api/v1/push
    # -- A section of reusable snippets that can be referenced in `config.file`.
    # Custom snippets may be added in order to reduce redundancy.
    # This is especially helpful when multiple `kubernetes_sd_configs` are used, which usually have large parts in common.
    # @default -- See `values.yaml`
    snippets:
      pipelineStages:
        - cri: {}
        - replace:
            expression: '(access_token=[a-z0-9]*)'
            replace: 'access_token=[redacted]'
        - replace:
            expression: '(acccess_key=[a-z0-9]*)'
            replace: 'acccess_key=[redacted]'
        - replace:
            expression: '(api_key=[a-z0-9]*)'
            replace: 'api_key=[redacted]'
        - replace:
            expression: '(user_email=[\w\.=-]+%40[\w\.-]+\.[\w]{2,64})'
            replace: 'user_email=[redacted]'
        - replace:
            expression: '(password=\\"[^&" ]*)'
            replace: 'password=\"[redacted]\'
      common:
        - action: replace
          source_labels:
            - __meta_kubernetes_pod_node_name
          target_label: node_name
        - action: replace
          source_labels:
            - __meta_kubernetes_namespace
          target_label: namespace
        - action: replace
          replacement: $1
          separator: /
          source_labels:
            - namespace
            - app
          target_label: job
        - action: replace
          source_labels:
            - __meta_kubernetes_pod_name
          target_label: pod
        - action: replace
          source_labels:
            - __meta_kubernetes_pod_container_name
          target_label: container
        - action: replace
          replacement: /var/log/pods/*$1/*.log
          separator: /
          source_labels:
            - __meta_kubernetes_pod_uid
            - __meta_kubernetes_pod_container_name
          target_label: __path__
        - action: replace
          replacement: /var/log/pods/*$1/*.log
          regex: true/(.*)
          separator: /
          source_labels:
            - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
            - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
            - __meta_kubernetes_pod_container_name
          target_label: __path__

      # If set to true, adds an additional label for the scrape job.
      # This helps debug the Promtail config.
      addScrapeJobLabel: false

      # -- You can put here any keys that will be directly added to the config file's 'limits_config' block.
      # @default -- empty
      extraLimitsConfig: ""

      # -- You can put here any keys that will be directly added to the config file's 'server' block.
      # @default -- empty
      extraServerConfigs: ""

      # -- You can put here any additional scrape configs you want to add to the config file.
      # @default -- empty
      extraScrapeConfigs: |
        - job_name: kubernetes-pods-testenv
          pipeline_stages:
            {{- toYaml .Values.config.snippets.pipelineStages | nindent 4 }}
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_label_testenv
              target_label: testenv
            - action: drop
              regex: ''
              source_labels:
                - testenv
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_label_app_kubernetes_io_component
              target_label: component
            {{- if .Values.config.snippets.addScrapeJobLabel }}
            - action: replace
              replacement: kubernetes-pods-testenv
              target_label: scrape_job
            {{- end }}
            {{- toYaml .Values.config.snippets.common | nindent 4 }}

      # -- You can put here any additional relabel_configs for the "kubernetes-pods" job
      extraRelabelConfigs: []

      scrapeConfigs: |
        # See also https://github.com/grafana/loki/blob/master/production/ksonnet/promtail/scrape_config.libsonnet for reference
        - job_name: kubernetes-pods
          pipeline_stages:
            {{- toYaml .Values.config.snippets.pipelineStages | nindent 4 }}
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels:
                - __meta_kubernetes_pod_controller_name
              regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
              action: replace
              target_label: __tmp_controller_name
            - source_labels:
                - __meta_kubernetes_pod_label_app_kubernetes_io_name
                - __meta_kubernetes_pod_label_app
                - __tmp_controller_name
                - __meta_kubernetes_pod_name
              regex: ^;*([^;]+)(;.*)?$
              action: replace
              target_label: app
            - source_labels:
                - __meta_kubernetes_pod_label_app_kubernetes_io_instance
                - __meta_kubernetes_pod_label_release
              regex: ^;*([^;]+)(;.*)?$
              action: replace
              target_label: instance
            - source_labels:
                - __meta_kubernetes_pod_label_app_kubernetes_io_component
                - __meta_kubernetes_pod_label_component
              regex: ^;*([^;]+)(;.*)?$
              action: replace
              target_label: component
            {{- if .Values.config.snippets.addScrapeJobLabel }}
            - replacement: kubernetes-pods
              target_label: scrape_job
            {{- end }}
            {{- toYaml .Values.config.snippets.common | nindent 4 }}
            {{- with .Values.config.snippets.extraRelabelConfigs }}
            {{- toYaml . | nindent 4 }}
            {{- end }}

    # -- Config file contents for Promtail.
    # Must be configured as string.
    # It is templated so it can be assembled from reusable snippets in order to avoid redundancy.
    # @default -- See `values.yaml`
    file: |
      server:
        log_level: {{ .Values.config.logLevel }}
        http_listen_port: {{ .Values.config.serverPort }}
        {{- with .Values.httpPathPrefix }}
        http_path_prefix: {{ . }}
        {{- end }}
        {{- tpl .Values.config.snippets.extraServerConfigs . | nindent 2 }}

      clients:
        - url: https://loki.sendilab.net/loki/api/v1/push
          basic_auth:
            username: <vault:devops/data/loki/basicAuth~username>
            password: <vault:devops/data/loki/basicAuth~password>

      positions:
        filename: /run/promtail/positions.yaml

      scrape_configs:
        {{- tpl .Values.config.snippets.scrapeConfigs . | nindent 2 }}
        {{- tpl .Values.config.snippets.extraScrapeConfigs . | nindent 2 }}
      extraLimitsConfig: ""

      # -- You can put here any keys that will be directly added to the config file's 'server' block.
      # @default -- empty
      extraServerConfigs: ""

      # -- You can put here any additional scrape configs you want to add to the config file.
      # @default -- empty
      extraScrapeConfigs: |
        - job_name: kubernetes-pods-testenv
          pipeline_stages:
            {{- toYaml .Values.config.snippets.pipelineStages | nindent 4 }}
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_label_testenv
              target_label: testenv
            - action: drop
              regex: ''
              source_labels:
                - testenv
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_label_app_kubernetes_io_component
              target_label: component
            {{- if .Values.config.snippets.addScrapeJobLabel }}
            - action: replace
              replacement: kubernetes-pods-testenv
              target_label: scrape_job
            {{- end }}
            {{- toYaml .Values.config.snippets.common | nindent 4 }}

      # -- You can put here any additional relabel_configs to "kubernetes-pods" job
      extraRelabelConfigs: []

      scrapeConfigs: |
        # See also https://github.com/grafana/loki/blob/master/production/ksonnet/promtail/scrape_config.libsonnet for reference
        - job_name: kubernetes-pods
          pipeline_stages:
            {{- toYaml .Values.config.snippets.pipelineStages | nindent 4 }}
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels:
                - __meta_kubernetes_pod_controller_name
              regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
              action: replace
              target_label: __tmp_controller_name
            - source_labels:
                - __meta_kubernetes_pod_label_app_kubernetes_io_name
                - __meta_kubernetes_pod_label_app
                - __tmp_controller_name
                - __meta_kubernetes_pod_name
              regex: ^;*([^;]+)(;.*)?$
              action: replace
              target_label: app
            - source_labels:
                - __meta_kubernetes_pod_label_app_kubernetes_io_instance
                - __meta_kubernetes_pod_label_release
              regex: ^;*([^;]+)(;.*)?$
              action: replace
              target_label: instance
            - source_labels:
                - __meta_kubernetes_pod_label_app_kubernetes_io_component
                - __meta_kubernetes_pod_label_component
              regex: ^;*([^;]+)(;.*)?$
              action: replace
              target_label: component
            {{- if .Values.config.snippets.addScrapeJobLabel }}
            - replacement: kubernetes-pods
              target_label: scrape_job
            {{- end }}
            {{- toYaml .Values.config.snippets.common | nindent 4 }}
            {{- with .Values.config.snippets.extraRelabelConfigs }}
            {{- toYaml . | nindent 4 }}
            {{- end }}

    # -- Config file contents for Promtail.
    # Must be configured as string.
    # It is templated so it can be assembled from reusable snippets in order to avoid redundancy.
    # @default -- See `values.yaml`
    file: |
      server:
        log_level: {{ .Values.config.logLevel }}
        http_listen_port: {{ .Values.config.serverPort }}
        {{- with .Values.httpPathPrefix }}
        http_path_prefix: {{ . }}
        {{- end }}
        {{- tpl .Values.config.snippets.extraServerConfigs . | nindent 2 }}

      clients:
        - url: https://loki.sendilab.net/loki/api/v1/push
          basic_auth:
            username: <vault:devops/data/loki/basicAuth~username>
            password: <vault:devops/data/loki/basicAuth~password>

      positions:
        filename: /run/promtail/positions.yaml

      scrape_configs:
        {{- tpl .Values.config.snippets.scrapeConfigs . | nindent 2 }}
        {{- tpl .Values.config.snippets.extraScrapeConfigs . | nindent 2 }}

      limits_config:
        {{- tpl .Values.config.snippets.extraLimitsConfig . | nindent 2 }}

  networkPolicy:
    # -- Specifies whether Network Policies should be created
    enabled: false
    metrics:
      # -- Specifies the Pods which are allowed to access the metrics port.
      # As this is cross-namespace communication, you also need the namespaceSelector.
      podSelector: {}
      # -- Specifies the namespaces which are allowed to access the metrics port
      namespaceSelector: {}
      # -- Specifies specific network CIDRs which are allowed to access the metrics port.
      # In case you use namespaceSelector, you also have to specify your kubelet networks here.
      # The metrics ports are also used for probes.
      cidrs: []
    k8sApi:
      # -- Specify the k8s API endpoint port
      port: 8443
      # -- Specifies specific network CIDRs you want to limit access to
      cidrs: []

  # -- Base path to serve all API routes from
  httpPathPrefix: ""

  sidecar:
    configReloader:
      enabled: false
      image:
        # -- The Docker registry for sidecar config-reloader
        registry: docker.io
        # -- Docker image repository for sidecar config-reloader
        repository: jimmidyson/configmap-reload
        # -- Docker image tag for sidecar config-reloader
        tag: v0.8.0
        # -- Docker image pull policy for sidecar config-reloader
        pullPolicy: IfNotPresent
      # -- Extra args for the config-reloader container.
      extraArgs: []
      # -- Extra environment variables for sidecar config-reloader
      extraEnv: []
      # -- Extra environment variables from secrets or configmaps for sidecar config-reloader
      extraEnvFrom: []
      # -- The security context for containers for sidecar config-reloader
      containerSecurityContext:
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
        allowPrivilegeEscalation: false
      # -- Readiness probe for sidecar config-reloader
      readinessProbe: {}
      # -- Liveness probe for sidecar config-reloader
      livenessProbe: {}
      # -- Resource requests and limits for sidecar config-reloader
      resources: {}
      #  limits:
      #    cpu: 200m
      #    memory: 128Mi
      #  requests:
      #    cpu: 100m
      #    memory: 128Mi
      config:
        # -- The port of the config-reloader server
        serverPort: 9533
      serviceMonitor:
        enabled: true

  # -- Extra K8s manifests to deploy
  extraObjects: []
    # - apiVersion: "kubernetes-client.io/v1"
    #   kind: ExternalSecret
    #   metadata:
    #     name: promtail-secrets
    #   spec:
    #     backendType: gcpSecretsManager
    #     data:
    #       - key: promtail-oauth2-creds
    #         name: client_secret
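As an aside, the redaction `replace` stages and the `^;*([^;]+)(;.*)?$` relabel pattern in the values above boil down to ordinary regex substitution and matching. A rough Python sketch of the intent (the sample log line and label values are made up; Promtail's real pipeline and relabel engines have more semantics than this):

```python
import re

# The replace stages substitute the matched token with a fixed placeholder.
line = "GET /api/v1/items?access_token=abc123def&page=2"
redacted = re.sub(r"(access_token=[a-z0-9]*)", "access_token=[redacted]", line)
print(redacted)  # GET /api/v1/items?access_token=[redacted]&page=2

# The relabel regex picks the first non-empty value from several source
# labels joined by ';' (Prometheus' default separator). Here the first two
# labels are empty, so the controller name "web" wins.
first_nonempty = re.compile(r"^;*([^;]+)(;.*)?$")
m = first_nonempty.match(";;web;web-abc")
print(m.group(1))  # web
```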
tico24 commented 1 year ago

I have a similar issue; maybe this insight helps diagnose the problem.

We have noticed that memory usage on promtail startup is much higher than it was before. However, with high enough resource limits, if you let it settle (in my case, for about 10 minutes), it goes back to a healthier memory level.

DylanGuedes commented 1 year ago

Indeed, I have been monitoring our Promtail infra for the last few weeks, and although I see memory rise, it all gets cleaned up periodically, like a GC run. So maybe we can change something in Promtail to get better memory usage, but it doesn't seem like a leak per se.
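If the Go 1.20 runtime really is the culprit, one speculative knob (not verified against this issue) is Go's soft memory limit, `GOMEMLIMIT`, available since Go 1.19: it makes the GC run more aggressively as the process approaches the limit. A sketch via the chart's `extraEnv`, where the 150MiB value is an arbitrary assumption chosen below the 200Mi container limit:

```yaml
# Speculative: ask the Go runtime to GC harder before hitting the cgroup limit.
# 150MiB is an arbitrary example, set below the pod's 200Mi memory limit.
extraEnv:
  - name: GOMEMLIMIT
    value: "150MiB"
```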

Joibel commented 1 year ago

I don't think it's a leak necessarily; it just exceeds our resource limits where the old one didn't, and ends up in a never-ending OOM loop. Sometimes the pods survived the initial spike and then lived for around 30 minutes before I reverted to the old one.

I could push the limit higher and live with it.

DylanGuedes commented 1 year ago

I don't think it's a leak necessarily; it just exceeds our resource limits where the old one didn't, and ends up in a never-ending OOM loop. Sometimes the pods survived the initial spike and then lived for around 30 minutes before I reverted to the old one.

I could push the limit higher and live with it.

Got it, thanks. I think there's a feature that you can use to work around that, I'll search for it and share with you today.

DylanGuedes commented 1 year ago

Found it: https://github.com/grafana/loki/pull/7101 (courtesy of the great @liguozhong :smile: )

In the limits block of Promtail, could you give max_streams a try and share feedback? One thing I noticed is that it isn't documented.
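Something like the following in the chart values should wire it through, via the `extraLimitsConfig` passthrough already present above (a sketch only; since the option is undocumented, the key name is taken from that PR and `500` is an arbitrary example):

```yaml
config:
  snippets:
    # Passed verbatim into the rendered config file's limits_config block.
    extraLimitsConfig: |
      # Sketch: cap the number of active streams per Promtail instance;
      # the value 500 is an arbitrary example, not a recommendation.
      max_streams: 500
```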