grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Improve helm chart documentation for storage in 4.5+ #8524

Status: Open · sourcehawk opened this issue 1 year ago

sourcehawk commented 1 year ago

I am a bit lost regarding the storage configuration for the latest grafana/loki Helm chart. The default values make it look like a simple task, but that is not what I am experiencing: just configuring an S3 bucket using the default values has been a guessing game...

# -- Storage config. Providing this will automatically populate all necessary storage configs in the templated config.
storage:
  bucketNames:
    chunks: chunks
    ruler: ruler
    admin: admin
  type: s3
  s3:
    s3: null
    endpoint: null
    region: null
    secretAccessKey: null
    accessKeyId: null
    s3ForcePathStyle: false
    insecure: false
    http_config: {}

Which of these values actually need to be set, and to what?

gkaskonas commented 1 year ago

This is my current configuration for S3. You can (and should) use the same bucket for chunks, ruler and admin when using boltdb-shipper or TSDB:

        storage: {
          bucketNames: {
            chunks: bucket.bucketName,
            ruler: bucket.bucketName,
            admin: bucket.bucketName,
          },
          type: "s3",
          s3: {
            region: "us-east-1",
          },
        },
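The same bucket can back all three roles with boltdb-shipper or TSDB. One way to avoid copy-paste mistakes is to generate the values from a single name. The Python sketch below assumes the chart's loki.storage layout shown in the question. The bucket name and region are placeholders. The output is JSON, which Helm should accept as a values file because JSON is valid YAML.

```python
import json


def single_bucket_storage(bucket: str, region: str) -> dict:
    """Build a loki.storage section that points chunks, ruler and admin at one bucket."""
    return {
        "loki": {
            "storage": {
                "type": "s3",
                # One bucket name reused for every role the chart asks for.
                "bucketNames": {role: bucket for role in ("chunks", "ruler", "admin")},
                "s3": {"region": region},
            }
        }
    }


if __name__ == "__main__":
    values = single_bucket_storage("my-loki-bucket", "us-east-1")  # placeholder names
    # JSON is a subset of YAML, so this output can be saved as e.g. values.json.
    print(json.dumps(values, indent=2))
```

The output could then be passed with something like helm install loki grafana/loki -f values.json.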

insecure: false just forces HTTPS, as far as I know.

To make sure your objects are encrypted, enable encryption on the bucket itself.

sourcehawk commented 1 year ago

Why is it that whenever I ask about the values.yaml file, no one actually replies with a values.yaml example?

Anyhow, after two days of going bald trying to figure this out, here is how others can use the Helm chart the way it was meant to be used. I found the value for storage.s3.s3 by pure luck.

loki:
    # Set to false if you don't intend to set 'X-Scope-OrgID' header for the loki datasource
    auth_enabled: true
    # You don't need this if you are not using the auth header 
    querier:
      # I am using one tenant id per cluster
      # This config is for cluster level logging
      # Another centralized logging cluster will use all tenant ids in header and multi `true`
      # See https://grafana.com/docs/loki/latest/operations/multi-tenancy/
      multi_tenant_queries_enabled: false
    # S3 storage configuration
    storage:
      type: s3
      bucketNames: 
        chunks: "<loki-bucketname>"
        ruler: "<loki-bucketname>"
        admin: "<loki-bucketname>"
      s3:
        # Not really sure why this is required at all but it works only if this is provided
        s3: "s3://<loki-bucketname>"
        # Endpoints: https://docs.aws.amazon.com/general/latest/gr/s3.html
        endpoint: "s3.eu-west-1.amazonaws.com"
        # AWS region of bucket
        region: "eu-west-1"
        # Secret access key for a user with bucket permissions
        secretAccessKey: ""
        # Access key for a user with bucket permissions 
        accessKeyId: ""
        # Set to false; several posts report problems with it set to true
        s3ForcePathStyle: false
        # We want to use the HTTPS only endpoint so set to false
        insecure: false
        # Our bucket is SSE-S3 encrypted
        sse_encryption: true
        # Timeouts etc
        http_config: {}
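The storage.s3.s3 value is a URL, not a plain bucket name. As far as I can tell, Loki's storage docs describe it roughly as s3://<access_key>:<secret_key>@<region>/<bucket>. In that form the host part may be just a region, and the path is the bucket name. The Python sketch below only illustrates how such a URL breaks down using the standard library's urllib.parse. It is not Loki's own parser, and the helper name describe_s3_url is hypothetical.

```python
from urllib.parse import unquote, urlparse


def describe_s3_url(url: str) -> dict:
    """Split an s3:// URL of the form s3://[key:secret@]host/bucket into parts.

    Illustration only: mirrors the documented layout of Loki's s3 URL,
    not Loki's actual parsing code.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URL, got {url!r}")
    return {
        # Credentials, if present, have to be URL-escaped inside the URL.
        "access_key": unquote(parsed.username) if parsed.username else None,
        "secret_key": unquote(parsed.password) if parsed.password else None,
        # Host is a region (endpoint then derived) or an explicit endpoint.
        "host": parsed.hostname,
        # Path without the leading slash is the bucket name.
        "bucket": parsed.path.lstrip("/") or None,
    }
```

A URL like s3://<loki-bucketname>, as in the config above, is parsed by a generic URL parser with the bucket name in the host position and an empty path. This sketch cannot tell you how Loki then reconciles that with bucketNames and endpoint.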

Once the configuration actually worked, the Loki datasource was reachable at http://loki-gateway. I added it to Grafana with a ConfigMap that looks like this. Note that this requires the Grafana datasource sidecar to be enabled and watching for the label grafana_datasource: "1".

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasource-loki
  labels:
    grafana_datasource: "1"
data:
  datasource.yaml: |-
    apiVersion: 1
    datasources:
      - name: loki
        type: loki
        url: "http://loki-gateway"
        # You only need this if you have auth enabled
        jsonData:
          httpHeaderName1: 'X-Scope-OrgID'
        secureJsonData:
          httpHeaderValue1: '<THE_TENANT_ID_FOR_CLUSTER>'
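The httpHeaderName1/httpHeaderValue1 pair makes Grafana send an X-Scope-OrgID header with each request to Loki. To check the tenant setup outside Grafana, a request built like the Python sketch below should behave the same way. The gateway URL and tenant are placeholders. /loki/api/v1/labels is used only as a cheap endpoint to test against, and labels_request is a hypothetical helper.

```python
import urllib.request


def labels_request(base_url: str, tenant_id: str) -> urllib.request.Request:
    """Build a GET request for Loki's label names API, scoped to one tenant."""
    req = urllib.request.Request(f"{base_url.rstrip('/')}/loki/api/v1/labels")
    # Same header Grafana sends via httpHeaderName1/httpHeaderValue1.
    req.add_header("X-Scope-OrgID", tenant_id)
    return req
```

From inside the cluster, urllib.request.urlopen(labels_request("http://loki-gateway", "<tenant>"), timeout=10) would send it.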

As for grafana/promtail, this values.yaml configuration did the trick:

config:
  clients:
    - url: http://loki-gateway/loki/api/v1/push
      # You only need this if you have auth enabled
      tenant_id: "<THE_TENANT_ID_FOR_CLUSTER>"
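Promtail sends its tenant_id as the X-Scope-OrgID header on push requests. Building an equivalent push by hand can confirm the gateway path and tenant independently of Promtail. The Python sketch below uses placeholder URL, tenant and labels, and push_request is a hypothetical helper. The body follows Loki's JSON push format, with timestamps given as nanosecond strings.

```python
import json
import time
import urllib.request


def push_request(base_url: str, tenant_id: str, labels: dict, lines: list) -> urllib.request.Request:
    """Build a POST to Loki's push API carrying one stream of log lines."""
    now_ns = time.time_ns()
    payload = {
        "streams": [
            {
                "stream": labels,
                # Each entry is [<unix epoch in nanoseconds, as a string>, <log line>];
                # adding i keeps timestamps distinct and in order.
                "values": [[str(now_ns + i), line] for i, line in enumerate(lines)],
            }
        ]
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/loki/api/v1/push",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "X-Scope-OrgID": tenant_id},
        method="POST",
    )
```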
sourcehawk commented 1 year ago

I've since realized that, for some reason, I can only get it to work with the regular endpoint, i.e. s3.eu-west-1.amazonaws.com, and not with the HTTPS-only endpoint s3-accesspoint.eu-west-1.amazonaws.com.

seanmorton commented 1 year ago

Agreed that the S3 documentation is lacking, and there appears to be a conflict between the s3 and endpoint properties of the S3 config. Following the advice of https://github.com/grafana/loki/issues/7279#issuecomment-1291488556 was a big part of fixing my problem.

kurtlieu commented 1 year ago

I agree that the documentation can be confusing. My understanding is that the chunks S3 bucket stores both the index and the chunk data.

Currently, I don't know what the ruler or admin buckets are used for, or why we need to set them. I haven't seen anything written to those buckets yet in my deployment.

Also, when I deploy the grafana/loki Helm chart, it sets the default number of backend, read, and write pod replicas to 3. It will create additional PVCs for the backend and write pods. Why do I need those additional EBS volumes?

omers commented 1 year ago

+1

Gkirito commented 1 year ago

+1

Angel0r commented 9 months ago

+1

adamcharnock commented 8 months ago

@sourcehawk's example was very useful, thank you! Here is what I ended up with, as a series of Ansible tasks deploying the Helm charts (the values.yaml content is under values: in each task, which should be fairly obvious). Notes:

I hope this helps some people!

---
- name: Add grafana chart repo
  kubernetes.core.helm_repository:
    name: grafana
    repo_url: "https://grafana.github.io/helm-charts"

- name: Deploy loki
  vars:
    resources:
      limits:
        cpu: 500m
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 64Mi
  kubernetes.core.helm:
    name: loki
    release_namespace: grafana-system
    create_namespace: yes
    chart_ref: grafana/loki
    # https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml
    # https://github.com/grafana/loki/issues/8524
    values:
      loki:
        storage:
          bucketNames:
            chunks: "{{ b2_loki_bucket }}"
            ruler: "{{ b2_loki_bucket }}"
            admin: "{{ b2_loki_bucket }}"
          type: s3
          s3:
            s3: "s3://{{ b2_loki_bucket }}"
            endpoint: s3.eu-central-003.backblazeb2.com
            region: eu-central-003
            accessKeyId: "{{ b2_loki_key_id }}"
            secretAccessKey: "{{ b2_loki_secret }}"
            s3ForcePathStyle: true
            insecure: false

        # https://github.com/grafana/loki/issues/4613#issuecomment-1855200860
        limits_config:
          split_queries_by_interval: "1h"
        query_scheduler:
          max_outstanding_requests_per_tenant: 2048

      gateway:
        nginxConfig:
          resolver: coredns.kube-system.svc.cluster.local
        resources: "{{ resources }}"

      monitoring:
        lokiCanary:
          resources:
            limits:
              cpu: 10m
              memory: 64Mi
            requests: {}

        selfMonitoring:
          grafanaAgent:
            resources: "{{ resources }}"

      backend:
        persistence:
          size: "{{ 10 * disk_multiplier|float }}Gi"
        resources: "{{ resources }}"
      read:
        persistence:
          size: "{{ 10 * disk_multiplier|float }}Gi"
        resources: "{{ resources }}"
      singleBinary:
        persistence:
          size: "{{ 10 * disk_multiplier|float }}Gi"
        resources: "{{ resources }}"
      write:
        persistence:
          size: "{{ 10 * disk_multiplier|float }}Gi"
        resources: "{{ resources }}"

- name: Deploy promtail (collects logs and sends them to loki)
  kubernetes.core.helm:
    name: promtail
    release_namespace: grafana-system
    create_namespace: yes
    chart_ref: grafana/promtail
    # https://github.com/grafana/helm-charts/blob/main/charts/promtail/values.yaml
    # https://github.com/grafana/loki/issues/8524
    values:
      config:
        clients:
          - url: http://loki-gateway/loki/api/v1/push
            tenant_id: system

        snippets:
          # https://grafana.com/blog/2022/03/21/how-relabeling-in-prometheus-works/#the-base-relabel_config-block
          extraRelabelConfigs:
          - regex: "(?P<tenant>bb|aa)-.*"
            source_labels:
              - "namespace"
            action: replace
            target_label: tenant

          pipelineStages:
            - cri: {}
            - match:
                selector: '{tenant=~".+"}'
                stages:
                  - tenant:
                      label: "tenant"
            - output:
                source: message

      resources:
        limits:
          cpu: 500m
          memory: 256Mi
        requests:
          cpu: 50m
          memory: 128Mi

# TODO: ADD CHART VERSIONS TO ALL OF THESE

- name: Deploy grafana
  kubernetes.core.helm:
    name: grafana
    release_namespace: grafana-system
    create_namespace: yes
    chart_ref: grafana/grafana
    # https://github.com/grafana/helm-charts/blob/main/charts/grafana/values.yaml
    # https://github.com/grafana/loki/issues/8524
    values:
      adminUser: "{{ grafana_admin_username }}"
      adminPassword: "{{ grafana_admin_password }}"

      resources:
        limits:
          cpu: 1
          memory: 1Gi
        requests:
          cpu: 100m
          memory: 128Mi

      persistence:
        size: "{{ 10 * disk_multiplier|float }}Gi"

      datasources:
        datasources.yaml:
          apiVersion: 1
          datasources:
          - name: "Logs (System)"
            type: loki
            url: "http://loki-gateway.grafana-system.svc.cluster.local"
            jsonData:
              httpHeaderName1: 'X-Scope-OrgID'
            secureJsonData:
              httpHeaderValue1: 'system'
          - name: "Logs (AA)"
            type: loki
            url: "http://loki-gateway.grafana-system.svc.cluster.local"
            jsonData:
              httpHeaderName1: 'X-Scope-OrgID'
            secureJsonData:
              httpHeaderValue1: 'aa'
          - name: "Logs (BB)"
            type: loki
            url: "http://loki-gateway.grafana-system.svc.cluster.local"
            jsonData:
              httpHeaderName1: 'X-Scope-OrgID'
            secureJsonData:
              httpHeaderValue1: 'bb'
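The Promtail values above assign tenants in two steps. First, the relabel rule copies aa or bb from namespaces named aa-<something> or bb-<something> into a tenant label. Second, the tenant pipeline stage uses that label as the tenant. Everything else keeps the client's tenant_id (system). The Python sketch below is a rough model of that decision. It assumes Prometheus-style relabeling semantics: the regex must match the whole source value, and the default replacement is $1. The function names are hypothetical.

```python
import re
from typing import Optional

# Same pattern as the extraRelabelConfigs rule; relabeling anchors it at both ends.
TENANT_RE = re.compile(r"(?P<tenant>bb|aa)-.*")
DEFAULT_TENANT = "system"  # tenant_id from the promtail client config


def tenant_label(namespace: Optional[str]) -> Optional[str]:
    """Model the 'replace' relabel action: set tenant=$1 only on a full match."""
    m = TENANT_RE.fullmatch(namespace or "")
    return m.group(1) if m else None


def effective_tenant(namespace: Optional[str]) -> str:
    """Model the match + tenant stages: use the tenant label if set, else the default."""
    return tenant_label(namespace) or DEFAULT_TENANT
```

This also shows why the match stage selects {tenant=~".+"}. Entries whose namespace doesn't match never get a tenant label, so they stay on the default tenant.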
pbsladek commented 7 months ago

loki:
  storage:
    bucketNames:
      chunks: "{{ b2_loki_bucket }}"
      ruler: "{{ b2_loki_bucket }}"
      admin: "{{ b2_loki_bucket }}"
    type: s3
    s3:
      s3: "s3://{{ b2_loki_bucket }}"
      endpoint: s3.eu-central-003.backblazeb2.com
      region: eu-central-003
      accessKeyId: "{{ b2_loki_key_id }}"
      secretAccessKey: "{{ b2_loki_secret }}"
      s3ForcePathStyle: true
      insecure: false

This example was very helpful. Our initial setup was missing the s3: "s3://{{ b2_loki_bucket }}" line. Writes to the bucket worked fine, but we were unable to query logs older than 2-3 hours.

The endpoint had been configured as endpoint: s3.amazonaws.com/xxxx-loki, which put everything under an extra path instead of the root of the S3 bucket. Also watch out for the mix of snake_case and camelCase keys in the Helm chart; it is quite confusing.

The format that works now is (Loki v2.9.6, chart v5.47.2):

 loki:
    storage:
      bucketNames:
        chunks: "{{ loki_bucket }}"
        ruler: "{{ loki_bucket }}"
        admin: "{{ loki_bucket }}"
      type: s3
      s3:
        s3: "s3://xxxxx-loki"
        endpoint: s3.us-east-1.amazonaws.com
        region: us-east-1
        accessKeyId: "{{ loki_key_id }}"
        secretAccessKey: "{{ loki_secret }}"
        s3ForcePathStyle: true
        insecure: false
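s3ForcePathStyle switches between the two S3 addressing styles. Path-style puts the bucket in the URL path (https://<endpoint>/<bucket>/<key>). Virtual-hosted style puts it in the hostname (https://<bucket>.<endpoint>/<key>). The Python sketch below builds both forms with a hypothetical helper; it is not Loki's client code. It also treats insecure: true as "use plain HTTP". In this naive model, a path baked into endpoint (as in the s3.amazonaws.com/xxxx-loki case above) ends up in front of every object key. That is consistent with the "extra path" behaviour described, although real S3 clients may handle it differently.

```python
def object_url(endpoint: str, bucket: str, key: str,
               force_path_style: bool, insecure: bool = False) -> str:
    """Build the URL an S3 client would address for one object (simplified model)."""
    scheme = "http" if insecure else "https"  # insecure: true disables HTTPS
    host, _, prefix = endpoint.partition("/")  # anything after the host is a path prefix
    base_path = f"/{prefix.strip('/')}" if prefix else ""
    if force_path_style:
        # Path-style: bucket is the first path segment after any prefix.
        return f"{scheme}://{host}{base_path}/{bucket}/{key}"
    # Virtual-hosted style: bucket becomes a subdomain of the endpoint host.
    return f"{scheme}://{bucket}.{host}{base_path}/{key}"
```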
JuroOravec commented 7 months ago

@pbsladek to configure the extra path, did you set it for the s3 value, or the endpoint? Or both?


Update: to set up a bucket subpath (path prefix), I had to append it to the s3.endpoint field. In my case s3.s3 made no difference, whether it was set or commented out. So my config looked like:

loki:
  storage:
    bucketNames:
      chunks: "{{ loki_bucket }}"
      ruler: "{{ loki_bucket }}"
      admin: "{{ loki_bucket }}"
    type: s3
    s3:
      # s3: "s3://xxxxx-loki"
      endpoint: s3.us-east-1.amazonaws.com/path/prefix
      region: us-east-1
      accessKeyId: "{{ loki_key_id }}"
      secretAccessKey: "{{ loki_secret }}"
      s3ForcePathStyle: true
      insecure: false

However, when I did so, I got an error about an invalid s3.region. It told me the region needed to be us-east-1, even though it already WAS set to that region.

I also tried debugging the S3 endpoint / bucket / path combination with s3-proxy. s3-proxy lets you set up webhooks that are triggered on requests, and the webhook event contains the S3 path that was requested. I set it up to send the webhook events to an http-echo service so that the S3 path was printed to the logs.

However, it seems s3-proxy doesn't handle multipart uploads, or there was some kind of error along those lines.

So in my case I didn't manage to set the bucket subpath, and instead decided to just go with another bucket.

tkcontiant commented 5 months ago

Hello Folks,

Welcome to the party 😄

I am using the latest Helm chart with deploymentMode: SimpleScalable, and they have added some information to the values file.

To me, this reads as follows: Loki needs buckets for chunks and the ruler, while the third (admin) bucket is only required if you are using Grafana Enterprise Logs (GEL).

storage:
    # Loki requires a bucket for chunks and the ruler. GEL requires a third bucket for the admin API.
    # Please provide these values if you are using object storage.
    # bucketNames:
    #   chunks: FIXME
    #   ruler: FIXME
    #   admin: FIXME
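The chart comment above says Loki needs a chunks bucket and a ruler bucket, and only GEL needs the third (admin) bucket. A small pre-flight check along those lines could look like the Python sketch below. The function name and the treatment of leftover FIXME placeholders are illustrative assumptions, not part of the chart.

```python
from typing import Dict, List


def missing_buckets(bucket_names: Dict[str, str], enterprise: bool = False) -> List[str]:
    """Return the bucketNames keys that still need a real value.

    Per the chart comment: chunks and ruler are always required; admin only for GEL.
    """
    required = ["chunks", "ruler"] + (["admin"] if enterprise else [])
    return [
        role
        for role in required
        # Absent, empty and leftover FIXME placeholders all count as unset.
        if not bucket_names.get(role) or bucket_names[role] == "FIXME"
    ]
```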

I think the second section (the s3 block) is semi-optional: you only need to configure it if, for example, the region cannot be resolved automatically or your S3 endpoint is custom. Otherwise, don't touch it.

In my case, I am using a VPC S3 endpoint with network routing, so I don't touch the endpoint and have configured only the region.

However, I still have some doubts about whether this is correct, and about which S3 bucket I should provide in s3.s3: "", since we are already configuring three buckets...

s3:
  s3: null
  endpoint: null
  region: null
  secretAccessKey: null
  accessKeyId: null
  signatureVersion: null
  s3ForcePathStyle: false
  insecure: false