grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

failed to flush chunks: store put chunk: RequestTimeTooSkewed: The difference between the request time and the current time is too large #10792

Open david-nano opened 9 months ago

david-nano commented 9 months ago

Describe the bug: The loki-write pod suddenly became unhealthy, and it seems the time on the server is not synced. I hope the information below, together with the relevant log, is enough to help.

To Reproduce: Install Loki using the Helm chart with the following values:

loki:
  global:
    image:
      registry: nexus.domain.local:8084
  loki:
    enabled: true
    auth_enabled: false
    storage:
      bucketNames:
        chunks: loki-company
        ruler: loki-company
        admin: loki-company
      type: s3
      s3:
        region: eu-west-1
    commonConfig:
      replication_factor: 1
    limits_config:
      retention_period: 48h
      retention_stream:
        - selector: '{namespace="monitoring"}'
          priority: 1
          period: 24h
        - selector: '{namespace="loki"}'
          priority: 2
          period: 24h
  ingress:
    enabled: true
    ingressClassName: "nginx"
    paths:
      write:
        - /api/prom/push
        - /loki/api/v1/push
      read:
        - /api/prom/tail
        - /loki/api/v1/tail
        - /loki/api
        - /api/prom/rules
        - /loki/api/v1/rules
        - /prometheus/api/v1/rules
        - /prometheus/api/v1/alerts
      singleBinary:
        - /api/prom/push
        - /loki/api/v1/push
        - /api/prom/tail
        - /loki/api/v1/tail
        - /loki/api
        - /api/prom/rules
        - /loki/api/v1/rules
        - /prometheus/api/v1/rules
        - /prometheus/api/v1/alerts
    hosts:
      - loki.domain.local
  write:
    replicas: 1
    persistence:
      storageClass: "netapp-storage"
    extraArgs:
      - '-config.expand-env=true'
    podAnnotations:
      vault.hashicorp.com/agent-inject: 'true'
      vault.hashicorp.com/role: 'loki'
      vault.hashicorp.com/agent-inject-secret-observability-loki-s3aws: 'internal/data/observability/loki/s3aws'
      vault.hashicorp.com/agent-inject-template-observability-loki-s3aws: |
        {{ with secret "internal/data/observability/loki/s3aws" -}}
        [default]
        aws_access_key_id={{ .Data.data.access_key_id }}
        aws_secret_access_key={{ .Data.data.secret_access_key }}
        {{- end }}
    extraEnv:
      - name: AWS_SHARED_CREDENTIALS_FILE
        value: "/vault/secrets/observability-loki-s3aws"
  read:
    replicas: 1
    persistence:
      storageClass: "netapp-storage"
    extraArgs:
      - '-config.expand-env=true'
    podAnnotations:
      vault.hashicorp.com/agent-inject: 'true'
      vault.hashicorp.com/role: 'loki'
      vault.hashicorp.com/agent-inject-secret-observability-loki-s3aws: 'internal/data/observability/loki/s3aws'
      vault.hashicorp.com/agent-inject-template-observability-loki-s3aws: |
        {{ with secret "internal/data/observability/loki/s3aws" -}}
        [default]
        aws_access_key_id={{ .Data.data.access_key_id }}
        aws_secret_access_key={{ .Data.data.secret_access_key }}
        {{- end }}
    extraEnv:
      - name: AWS_SHARED_CREDENTIALS_FILE
        value: "/vault/secrets/observability-loki-s3aws"
  backend:
    replicas: 1
    persistence:
      storageClass: "netapp-storage"
    extraArgs:
      - '-config.expand-env=true'
    podAnnotations:
      vault.hashicorp.com/agent-inject: 'true'
      vault.hashicorp.com/role: 'loki'
      vault.hashicorp.com/agent-inject-secret-observability-loki-s3aws: 'internal/data/observability/loki/s3aws'
      vault.hashicorp.com/agent-inject-template-observability-loki-s3aws: |
        {{ with secret "internal/data/observability/loki/s3aws" -}}
        [default]
        aws_access_key_id={{ .Data.data.access_key_id }}
        aws_secret_access_key={{ .Data.data.secret_access_key }}
        {{- end }}
    extraEnv:
      - name: AWS_SHARED_CREDENTIALS_FILE
        value: "/vault/secrets/observability-loki-s3aws"

promtail:
  enabled: false
  config:
    logLevel: info
    clients:
      - url: http://loki-gateway.loki.svc.cluster.local/loki/api/v1/push

Expected behavior: Loki keeps working normally.

Environment:

Screenshots, Promtail config, or terminal output: This is the error log I found:

level=error ts=2023-10-05T08:35:50.344368587Z caller=flush.go:143 org_id=fake msg="failed to flush" err="failed to flush chunks: store put chunk: RequestTimeTooSkewed: The difference between the request time and the current time is too large.\n\tstatus code: 403, request id: SEE8RVV110B7WBJT, host id: FS6RXhmp56xngSm0EKV2ZyCvXzu/cp5I8zv/ihg8u68L/2qsy2ddXW/5je768k0K0KX9y5oG27I=, num_chunks: 1, labels: {app=\"calico-node\", container=\"calico-node\", filename=\"/var/log/pods/kube-system_calico-node-nzfvx_7850aea6-509d-4b38-951c-b152979c857d/calico-node/0.log\", job=\"kube-system/calico-node\", namespace=\"kube-system\", node_name=\"eng-k8s-worker2\", pod=\"calico-node-nzfvx\"}"

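
S3 returns RequestTimeTooSkewed when the timestamp on a signed request is too far from its own clock; the AWS documentation gives the tolerance as 15 minutes. One way to confirm the suspicion that the node clock has drifted is to compare the local time with the Date header that S3 sends back. The following is a minimal sketch of that check, not something taken from Loki itself: the function names are hypothetical, the endpoint is assumed from the `eu-west-1` region in the values above, and the 15-minute limit is hard-coded as an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
import urllib.error
import urllib.request

# Assumed tolerance, taken from the AWS documentation for RequestTimeTooSkewed.
MAX_SKEW_SECONDS = 15 * 60


def skew_seconds(date_header: str, local_now: datetime) -> float:
    """Return local time minus server time in seconds (positive: local clock is ahead)."""
    server_time = parsedate_to_datetime(date_header)
    return (local_now - server_time).total_seconds()


def fetch_date_header(url: str) -> str:
    """Return the Date response header of an endpoint, even when it answers with 4xx."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.headers["Date"]
    except urllib.error.HTTPError as err:
        # Unauthenticated requests are rejected, but the response still carries Date.
        return err.headers["Date"]


def report(url: str) -> str:
    """Compare the local UTC clock with the server's Date header and describe the result."""
    skew = skew_seconds(fetch_date_header(url), datetime.now(timezone.utc))
    verdict = "too skewed" if abs(skew) > MAX_SKEW_SECONDS else "within tolerance"
    return f"local clock differs from server by {skew:.0f}s ({verdict})"
```

Running `print(report("https://s3.eu-west-1.amazonaws.com"))` from the affected node or pod would show whether the clock there is outside the tolerance. The Date header only has one-second resolution and includes network latency, so the result is a rough indication rather than a precise measurement. If the skew is large, the fix belongs on the host (NTP or chrony), not in the Loki configuration.
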
SM-Software commented 1 month ago

We are also facing a similar issue. Is there any resolution for this, or at least a root cause?

Amikuto commented 2 weeks ago

Same