grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

[Helm] Helm test requires self monitoring to be enabled #7625

Open jasperjonker opened 1 year ago

jasperjonker commented 1 year ago

**Describe the bug**
I cannot render a Helm template of Loki with chart version > 3.2.2. Since this is how ArgoCD deploys applications, I cannot deploy Loki with chart version > 3.2.2 using Helm. E.g.:

Chart.yaml

```yaml
apiVersion: v2
name: loki
version: 3.3.2
dependencies:
  - name: loki
    version: 3.3.2
    repository: https://grafana.github.io/helm-charts
```

values.yaml

```yaml
loki:
  loki:
    auth_enabled: false

    schemaConfig:
      configs:
      - from: 2020-10-24
        store: boltdb-shipper
        object_store: gcs
        schema: v12
        index:
          prefix: index_
          period: 24h

    storage_config:
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/boltdb-cache
        cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: gcs
      gcs:
        bucket_name: loki

    storage:
      bucketNames:
        chunks: loki_chunks
        ruler: loki_ruler
        admin: loki_admin
      type: gcs

    memcached:
      chunk_cache:
        enabled: true
        host: "memcached-loki.loki"
        service: memcache
        batch_size: 1024
        parallelism: 100
      results_cache:
        enabled: true
        host: "memcached-loki.loki"
        service: memcache
        timeout: "500ms"
        default_validity: "12h"

    rulerConfig:
      storage:
        type: local
        local:
          directory: "/tmp/rules"
      rule_path: /tmp/scratch
      alertmanager_url: http://prometheus-infra-alertmanager.prometheus:80
      ring:
        kvstore:
          store: inmemory
      enable_api: true
      enable_alertmanager_v2: true

    # ---------------------
    # This section below is added because loki sometimes throws an error "too many outstanding requests", see https://github.com/grafana/loki/issues/4613
    # This should solve that
    query_scheduler:
      max_outstanding_requests_per_tenant: 2048

    limits_config:
      max_query_series: 5000

  rules:
    additionalGroups:
    - name: additional-loki-rules
      rules:
        - record: job:loki_request_duration_seconds_bucket:sum_rate
          expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job)
        - record: job_route:loki_request_duration_seconds_bucket:sum_rate
          expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route)
        - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
          expr: sum(rate(container_cpu_usage_seconds_total[1m])) by (node, namespace, pod, container)

  selfMonitoring:
    enabled: false

  ingress:
    # We use the Gateway
    enabled: false

  read:
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 5

    persistence:
      storageClass: premium-rwo

  write:
    nodeSelector:
      iam.gke.io/gke-metadata-server-enabled: "true"

    persistence:
      storageClass: premium-rwo

  monitoring:
    selfMonitoring:
      enabled: false

  gateway:
    enabled: true
    autoscaling:
      enabled: true
      maxReplicas: 5
    ingress:
      enabled: true
      hosts:
        - host: "loki.xxx.com"
          paths:
            - path: /
              pathType: ImplementationSpecific
      tls:
        - hosts:
            - loki.xxx.com
          secretName: tls-loki
      ingressClassName: nginx
```

**To Reproduce**
Steps to reproduce the behavior:

  1. Place the `Chart.yaml` and `values.yaml` in a folder.
  2. Run `helm dependency build && helm template --debug . -f values.yaml > all.yaml && rm -rf Chart.lock charts`
  3. If the version in `Chart.yaml` is > 3.2.2, it fails with:

```
Update Complete. ⎈Happy Helming!⎈
Saving 1 charts
Downloading loki from repo https://grafana.github.io/helm-charts
Deleting outdated charts
install.go:173: [debug] Original chart version: ""
install.go:190: [debug] CHART PATH: /home/xxx//loki

Error: template: loki/charts/loki/templates/validate.yaml:12:4: executing "loki/charts/loki/templates/validate.yaml" at <fail "Helm test requires self monitoring to be enabled">: error calling fail: Helm test requires self monitoring to be enabled
helm.go:81: [debug] template: loki/charts/loki/templates/validate.yaml:12:4: executing "loki/charts/loki/templates/validate.yaml" at <fail "Helm test requires self monitoring to be enabled">: error calling fail: Helm test requires self monitoring to be enabled
```
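The error comes from a `fail` call in the chart's `templates/validate.yaml`. `fail` aborts rendering, so `helm template` (and therefore ArgoCD) produces no manifests at all. A minimal sketch of the kind of guard involved, assuming it compares `test.enabled` with `monitoring.selfMonitoring.enabled` (the exact condition in chart 3.3.2 may differ):

```yaml
# templates/validate.yaml (sketch; the chart's real file may differ)
# Rendering aborts when the Helm test is enabled but self-monitoring is disabled.
{{- if and .Values.test.enabled (not .Values.monitoring.selfMonitoring.enabled) }}
{{- fail "Helm test requires self monitoring to be enabled" }}
{{- end }}
```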



**Expected behavior**
When the version is `3.2.2` or below, it creates a file called `all.yaml` containing the full Loki manifest, which can be deployed with `kubectl apply -f all.yaml`.

**Environment:**
 - Infrastructure: Kubernetes
 - Deployment tool: Helm

AurimasNav commented 1 year ago

I'm not sure what helm test does (or where to read about it), but if you are disabling selfMonitoring, maybe you should also disable tests?

```yaml
test:
  enabled: false
```

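In the original report Loki is consumed as a dependency of an umbrella chart, so this key has to sit under the `loki:` dependency key, next to the monitoring setting. A minimal sketch, assuming the chart 3.3.x layout where `test` and `monitoring` are top-level keys of the Loki chart's values:

```yaml
# values.yaml of the umbrella chart; everything under `loki:` is passed to the dependency
loki:
  test:
    enabled: false          # skip the Helm test that depends on self-monitoring
  monitoring:
    selfMonitoring:
      enabled: false        # same setting as in the original report
```
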
slushysnowman commented 1 year ago

We're hitting this as well. The 'solution' is to disable `test`, as @AurimasNav says, but it feels a bit wrong.

If the test relies on:

```yaml
selfMonitoring:
  enabled: true
```

Then shouldn't setting that value to false also disable that specific test?
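A hypothetical sketch of that suggestion, not the chart's actual code: render the test Pod only when both flags are on, instead of failing validation. It assumes the test manifest lives in `templates/tests/test-canary.yaml` and uses the same `test.enabled` and `monitoring.selfMonitoring.enabled` keys.

```yaml
# templates/tests/test-canary.yaml (hypothetical change)
# Skip the test Pod instead of failing when self-monitoring is disabled.
{{- if and .Values.test.enabled .Values.monitoring.selfMonitoring.enabled }}
# ... existing test Pod manifest, unchanged ...
{{- end }}
```

With this gating, the `fail` guard in `validate.yaml` would no longer be needed for this combination of values.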

rufreakde commented 10 months ago

> I'm not sure what helm test does (or where to read about it), but if you are disabling selfMonitoring, maybe you should also disable tests?
>
> ```yaml
> test:
>   enabled: false
> ```

Disabling validation checks should not be the solution here. The Loki chart maintainers would need to make self-monitoring more configurable...

I mean, why is the chart delivering Prometheus CRDs... srsly

dlahn commented 7 months ago

Any update here?

slyt commented 5 months ago

I ran into this same issue with Loki Helm chart 5.5.2 (Loki version 2.8.2).

The CRDs from the Loki Helm chart conflict with the CRDs installed by kube-prometheus-stack, causing a race condition when both are applied at the same time.

I've disabled the Loki CRDs by setting `monitoring.selfMonitoring.grafanaAgent.installOperator: false`, but with `selfMonitoring.enabled: true` (the default) the chart fails to apply because these CRDs are required:

 - `monitoring.grafana.com/v1alpha1/PodLogs`
 - `monitoring.grafana.com/v1alpha1/GrafanaAgent`
 - `monitoring.grafana.com/v1alpha1/LogsInstance`

Since Prometheus can monitor Loki, I figured it is safe to set `selfMonitoring.enabled: false`, but now I receive the error others have mentioned (`loki/templates/validate.yaml:6:4`): `Helm test requires self monitoring to be enabled`. I get this error with the most recent Loki chart version, 5.47.2.

Edit: It looks like the only Helm test implemented is based on the Loki canary, which is part of self-monitoring: https://github.com/grafana/loki/blob/main/production/helm/loki/templates/tests/test-canary.yaml
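Combining the settings from this comment with the `test.enabled: false` workaround from earlier in the thread gives roughly the following values for a standalone install of chart 5.x. This is a sketch assuming the 5.x key layout; the Helm test is skipped entirely, since (per the edit above) it relies on the canary from the self-monitoring stack.

```yaml
monitoring:
  selfMonitoring:
    enabled: false            # let kube-prometheus-stack monitor Loki instead
    grafanaAgent:
      installOperator: false  # per this comment, stops the chart from installing the Grafana Agent operator CRDs
test:
  enabled: false              # the only Helm test needs self-monitoring
```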