grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Simple Scalable loki pods are in CrashLoopBackOff. #12972

Open sunidhi271 opened 6 months ago

sunidhi271 commented 6 months ago

**Describe the bug**
Simple Scalable loki pods are in CrashLoopBackOff.

**To Reproduce**
Steps to reproduce the behavior:

1. helm repo add grafana https://grafana.github.io/helm-charts
2. helm repo update
3. Create a proxy Helm chart: helm create loki
4. Update the Chart.yaml with the following:

       apiVersion: v2
       name: loki-promtail
       description: A Helm chart for Kubernetes
       type: application
       version: 0.1.0
       appVersion: "1.16.0"

       dependencies:
         - name: loki
           version: 3.0.0
           repository: https://grafana.github.io/helm-charts

5. Pull in the loki chart dependency: helm dependency update .
6. Update the values file (cat values.yaml) with the below values:

    
## Default value file
loki:
  global:
    image:
      registry: registry.xyz.com
    fullnameOverride: loki
    imagePullSecrets: [dacsecret]
  test:
    enabled: false
  gateway:
    enabled: false
  lokiCanary:
    enabled: false
  monitoring:
    selfMonitoring:
      grafanaAgent:
        installOperator: false

  # SimpleScalable Mode related values
  deploymentMode: SimpleScalable
  sidecar:
    image:
      repository: registry.xyz.com/public/kiwigrid/k8s-sidecar
      tag: 1.24.3
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
      requests:
        cpu: 50m
        memory: 50Mi
    rules:
      enabled: true
      label: loki_rule
      labelValue: ""
      folder: /rules
  memcached:
    image:
      repository: registry.xyz.com/public/memcached
      # default tag: 1.6.23-alpine
      tag: 1.6.25
  memcachedExporter:
    image:
      repository: registry.xyz.com/public/prom/memcached-exporter
      tag: v0.14.2
  minio:
    enabled: true
    image:
      repository: registry.xyz.com/public/minio
    mcImage:
      repository: registry.xyz.com/public/quay.io/minio/mc
  backend:
    replicas: 3
    autoscaling:
      enabled: false
      minReplicas: 3
      maxReplicas: 6
    persistence:
      volumeClaimsEnabled: true
      # -- Parameters used for the data volume when volumeClaimsEnabled is false
      dataVolumeParameters:
        emptyDir: {}
      # -- Enable StatefulSetAutoDeletePVC feature
      enableStatefulSetAutoDeletePVC: false
      size: 10Gi
      storageClass: "rook-block"
      # -- Selector for persistent disk
      selector: null
    resources:
      limits:
        memory: 50Gi
      requests:
        memory: 1Gi
  read:
    replicas: 3
    autoscaling:
      enabled: false
      minReplicas: 3
      maxReplicas: 6
      targetCPUUtilizationPercentage: 60
    persistence:
      volumeClaimsEnabled: true
      size: 10Gi
      storageClass: rook-block
    resources:
      limits:
        memory: 50Gi
      requests:
        memory: 1Gi
  write:
    replicas: 3
    autoscaling:
      enabled: false
      minReplicas: 3
      maxReplicas: 6
      targetCPUUtilizationPercentage: 60
    persistence:
      volumeClaimsEnabled: true
      # -- Parameters used for the data volume when volumeClaimsEnabled is false
      dataVolumeParameters:
        emptyDir: {}
      enableStatefulSetAutoDeletePVC: false
      size: 10Gi
      storageClass: "rook-block"
      selector: null
    resources:
      limits:
        memory: 50Gi
      requests:
        memory: 1Gi

  tableManager:
    enabled: false
  extraVolumes:

  structuredConfig:

  config: |
    auth_enabled: false
    limits_config:
      ingestion_rate_strategy: local
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      ingestion_rate_mb: 400
      ingestion_burst_size_mb: 600
      max_global_streams_per_user: 10000
      max_query_length: 72h
      max_query_parallelism: 64
      cardinality_limit: 200000
      split_queries_by_interval: 30m
  schemaConfig:
    configs:
      - from: 2024-04-01
        object_store: s3
        store: tsdb
        schema: v13
        index:
          prefix: index_
          period: 24h
  auth_enabled: false
  commonConfig:
    replication_factor: 1

7. Render the templates and deploy them to the Kubernetes cluster:

       helm template --name-template=loki --namespace=loki . > out.yaml
       kubectl apply -f out.yaml
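A note on how the proxy chart above consumes these values: Helm forwards to a subchart only the values nested under that dependency's name (plus `global:`, which is shared with all charts); top-level keys that do not match a dependency name are not passed down. A minimal sketch of the required shape, reusing names from the steps above:

```yaml
# values.yaml of the proxy chart "loki-promtail" (sketch, not the full file)
loki:                      # must match the dependency name in Chart.yaml
  gateway:
    enabled: false         # seen as .Values.gateway.enabled inside the loki subchart
global:
  image:
    registry: registry.xyz.com   # global values are visible to all subcharts
# A stray top-level key like the following would NOT reach the loki subchart:
# test:
#   enabled: false
```

This is why indentation mistakes in an umbrella values file tend to fail silently: the misplaced keys simply never reach the subchart, and its defaults apply instead.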

**Expected behavior**
All pods should be in the Running state and the MinIO PVC should have been mounted. But below is the actual status:
![image](https://github.com/grafana/loki/assets/86234399/b1f75796-8758-476b-be00-b2a45c02aebd)

![image](https://github.com/grafana/loki/assets/86234399/bad651c4-d284-49d1-8280-3a83685290d5)

**Environment:**
 - Infrastructure: Kubernetes v1.21.9
 - Deployment mode: Manual

**Screenshots, Promtail config, or terminal output**
If applicable, add any output to help explain your problem.
![image](https://github.com/grafana/loki/assets/86234399/ec8fa7d8-e0c1-4a0e-a061-ddd3cf38ba9f)
sunidhi271 commented 6 months ago

@elchenberg @adthonb @JStickler @TsengSR @alexandreMegel Could any of you help here?

sunidhi271 commented 6 months ago

With the following values file, the pods at least reach the Running state:

### Promtail Values ###
promtail:
  fullnameOverride: promtail
  image:
    registry: registry.dac.nokia.com
    repository: public/grafana/promtail
  config:
    clients:
      - url: http://loki-gateway/loki/api/v1/push
  extraPorts:
    syslog:
      name: promtail
      annotations: {}
      labels: {}
      containerPort: 8514
      protocol: TCP
      service:
          type: ClusterIP
          clusterIP: null
          port: 8514
          externalIPs: []
          nodePort: null
          loadBalancerIP: null
          loadBalancerSourceRanges: []
          externalTrafficPolicy: null

#########Loki Values #########
##Default value file
loki:
  global:
    image:
      registry: registry.xyz.com
    fullnameOverride: loki
    imagePullSecrets: [dacsecret]
  test:
    enabled: false
  gateway:
    enabled: true
    image:
      registry: registry.xyz.com
      repository: public/nginxinc/nginx-unprivileged
      tag: 1.24-alpine
  lokiCanary:
    enabled: false
  chunksCache:
    enabled: false
  resultsCache:
    enabled: false

# SimpleScalable Mode related values
  deploymentMode: SimpleScalable
  sidecar:
    image:
      repository: registry.xyz.com/public/kiwigrid/k8s-sidecar
      tag: 1.24.3
  memcached:
    image:
      repository: registry.xyz.com/public/memcached
      # default tag: 1.6.23-alpine
      tag: 1.6.25
  memcachedExporter:
    image: 
      repository: registry.xyz.com/public/prom/memcached-exporter
      tag: v0.14.2
  minio:
    enabled: true
    image:
      repository: registry.xyz.com/public/minio
    mcImage:
      repository: registry.xyz.com/public/quay.io/minio/mc
    persistence:
      enabled: true
      storageClass: "rook-block"
      size: 20Gi
  backend:
    replicas: 2
    persistence:
      volumeClaimsEnabled: true
      size: 10Gi
      storageClass: "rook-block"
      enableStatefulSetAutoDeletePVC: false
  read:
    replicas: 2
    persistence:
      volumeClaimsEnabled: true
      storageClass: "rook-block"
      enableStatefulSetAutoDeletePVC: false
  write:
    replicas: 2
    persistence:
      volumeClaimsEnabled: true
      storageClass: "rook-block"
      enableStatefulSetAutoDeletePVC: false
  singleBinary:
    replicas: 0
  ingester:
    replicas: 0
  querier:
    replicas: 0
  queryFrontend:
    replicas: 0
  queryScheduler:
    replicas: 0
  distributor:
    replicas: 0
  compactor:
    replicas: 0
  indexGateway:
    replicas: 0
  bloomCompactor:
    replicas: 0
  bloomGateway:
    replicas: 0
  tableManager:
    enabled: false

#    extraArgs:
#      - -config.expand-env=true
#    extraEnv:
#      - name: GRAFANA-LOKI-S3-ENDPOINT
#        valueFrom:
#          secretKeyRef:
#            name: loki-credentials
#            key: s3-endpoint
#      - name: GRAFANA-LOKI-S3-ACCESSKEY
#        valueFrom:
#          secretKeyRef:
#            name: loki-credentials
#            key: s3-access-key
#      - name: GRAFANA-LOKI-S3-SECRETKEY
#        valueFrom:
#          secretKeyRef:
#            name: loki-credentials
#            key: s3-secret-key
#    extraEnvFrom:
#      - secretRef:
#          name: loki-s3-secret
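If the commented-out block above is ever enabled to pass S3 credentials through environment variables, note that Loki only expands `${VAR}` references in its config when started with `-config.expand-env=true`. A hedged sketch of that wiring (the secret name, key names, and exact value paths are assumptions mirroring the commented values; underscore-style variable names are used, as underscores are the conventional form for environment variable names):

```yaml
# Sketch only: value placement follows the commented block above.
loki:
  extraArgs:
    - -config.expand-env=true
  extraEnv:
    - name: GRAFANA_LOKI_S3_ACCESSKEY   # hypothetical; assumes secret "loki-credentials"
      valueFrom:
        secretKeyRef:
          name: loki-credentials
          key: s3-access-key
  loki:
    storage:
      s3:
        accessKeyId: ${GRAFANA_LOKI_S3_ACCESSKEY}
```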

  loki:
    image:
      registry: registry.dac.nokia.com
      repository: public/grafana/loki
    schemaConfig:
      configs:
        - from: 2024-04-01
          store: tsdb
          object_store: s3
          schema: v13
          index:
            prefix: loki_index_
            period: 24h
    ingester:
      chunk_encoding: snappy
    tracing:
      enabled: true
    querier:
      # Default is 4, if you have enough memory and CPU you can increase, reduce if OOMing
      max_concurrent: 4
    minio:
      enabled: true
    auth_enabled: false