grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Loki is running, but connection refused and http 404 page not found. #11893

Open dewstyh opened 7 months ago

dewstyh commented 7 months ago

Describe the bug
The Loki pod is running, but the Loki service cannot be reached at http://loki:3100, or even at http://localhost:3100. Promtail says:

level=warn ts=2024-02-07T20:22:28.048083032Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post \"http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"

Grafana says when testing the data source: "Unable to connect with Loki. Please check the server logs for more details."

To Reproduce Steps to reproduce the behavior:

  1. Installed the loki-stack Helm chart, version 2.10.1, which installs grafana/loki v2.6.1 and promtail v2.9.3.
  2. Installed it through Terraform, with extra config settings to store logs in an S3 bucket, which is successfully happening from time to time.
  3. Linked it to the kube-prometheus-stack Grafana chart (version 56.2.2) as a Loki data source.

Expected behavior Everything is created (including the service accounts for loki and loki-promtail), the loki pod is running and shipping logs to the S3 bucket via the compactor, but the loki service is not reachable and reports "connection refused". If you exec into the grafana pod and run curl http://loki:3100, it returns "404 page not found".
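One way to narrow down where the refusal happens (a diagnostic sketch, not part of the original report, assuming the release is named loki in the monitoring namespace) is to check whether the Service actually has endpoints behind it, since an empty endpoint list produces exactly this kind of "connection refused":

# If ENDPOINTS is empty, the Service selector does not match a Ready pod,
# which would explain the refusals seen by Promtail and Grafana.
kubectl get svc loki -n monitoring
kubectl get endpoints loki -n monitoring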

Environment:

Screenshots, Promtail config, or terminal output:

thagoni@iB1033 MINGW64 ~/Desktop/repositories/BaseInfrastructure (RII/SPIKE/DEVOPS-161/ServiceMeshSetUp)
$ kubectl describe pod loki-0 -n monitoring
Name:             loki-0
Namespace:        monitoring
Priority:         0
Service Account:  loki
Node:             ip-10-60-2-49.eu-west-1.compute.internal/10.60.2.49
Start Time:       Wed, 07 Feb 2024 15:22:16 -0500
Labels:           app=loki
                  apps.kubernetes.io/pod-index=0
                  controller-revision-hash=loki-7586d5599f
                  name=loki
                  release=loki
                  statefulset.kubernetes.io/pod-name=loki-0
Annotations:      checksum/config: 2a120e4d0d88f524b9589fe1d544395d6be51a0a02e568da2a4c6f766cd20173
                  prometheus.io/port: http-metrics
                  prometheus.io/scrape: true
Status:           Running
IP:               10.60.2.67
IPs:
  IP:  10.60.2.67
Controlled By:  StatefulSet/loki
Containers:
  loki:
    Container ID:   containerd://ecca141212172e1d62f477db666cc9abd6229240bfa63416ec602daf69caf450
    Image:          grafana/loki:2.6.1
    Image ID:       docker.io/grafana/loki@sha256:1ee60f980950b00e505bd564b40f720132a0653b110e993043bb5940673d060a
    Ports:          3100/TCP, 9095/TCP, 7946/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    Args:           -config.file=/etc/loki/loki.yaml
    State:          Running
      Started:      Wed, 07 Feb 2024 15:22:17 -0500
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:http-metrics/ready delay=45s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:http-metrics/ready delay=45s timeout=1s period=10s #success=1 #failure=3
    Environment:
      AWS_STS_REGIONAL_ENDPOINTS:   regional
      AWS_DEFAULT_REGION:           eu-west-1
      AWS_REGION:                   eu-west-1
      AWS_ROLE_ARN:                 arn:aws:iam::207997242047:role/loki-eks-role-staginginfra
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /data from storage (rw)
      /etc/loki from config (rw)
      /tmp from tmp (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vd5kx (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  loki
    Optional:    false
  storage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  kube-api-access-vd5kx:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  Normal   Scheduled  17m                default-scheduler  Successfully assigned monitoring/loki-0 to ip-10-60-2-49.eu-west-1.compute.internal
  Normal   Pulled     17m                kubelet            Container image "grafana/loki:2.6.1" already present on machine
  Normal   Created    17m                kubelet            Created container loki
  Normal   Started    17m                kubelet            Started container loki
  Warning  Unhealthy  16m (x2 over 17m)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy  16m (x2 over 17m)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503

kubectl logs output

kubectl logs loki-0 -n monitoring
level=info ts=2024-02-07T20:22:17.108922813Z caller=main.go:103 msg="Starting Loki" version="(version=2.6.1, branch=HEAD, revision=6bd05c9a4)"
level=info ts=2024-02-07T20:22:17.109036425Z caller=modules.go:736 msg="RulerStorage is not configured in single binary mode and will not be started."
level=info ts=2024-02-07T20:22:17.110074064Z caller=server.go:288 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
level=warn ts=2024-02-07T20:22:17.118120348Z caller=experimental.go:20 msg="experimental feature in use" feature="In-memory (FIFO) cache - chunksfifocache"
level=info ts=2024-02-07T20:22:17.1197379Z caller=table_manager.go:252 msg="query readiness setup completed" duration=2.122µs distinct_users_len=0
level=info ts=2024-02-07T20:22:17.119780323Z caller=shipper.go:124 msg="starting index shipper in RW mode"
level=info ts=2024-02-07T20:22:17.120352007Z caller=shipper_index_client.go:79 msg="starting boltdb shipper in RW mode"
level=info ts=2024-02-07T20:22:17.121468091Z caller=table_manager.go:134 msg="uploading tables"
level=info ts=2024-02-07T20:22:17.121690467Z caller=table_manager.go:167 msg="handing over indexes to shipper"
level=info ts=2024-02-07T20:22:17.123019038Z caller=modules.go:761 msg="RulerStorage is nil. Not starting the ruler."
level=info ts=2024-02-07T20:22:17.125720801Z caller=worker.go:112 msg="Starting querier worker using query-scheduler and scheduler ring for addresses"
level=info ts=2024-02-07T20:22:17.128389809Z caller=module_service.go:82 msg=initialising module=server
level=info ts=2024-02-07T20:22:17.128550405Z caller=module_service.go:82 msg=initialising module=query-frontend-tripperware
level=info ts=2024-02-07T20:22:17.128576855Z caller=module_service.go:82 msg=initialising module=memberlist-kv
level=info ts=2024-02-07T20:22:17.128630839Z caller=module_service.go:82 msg=initialising module=store
level=info ts=2024-02-07T20:22:17.128664639Z caller=module_service.go:82 msg=initialising module=ring
level=info ts=2024-02-07T20:22:17.12876455Z caller=ring.go:263 msg="ring doesn't exist in KV store yet"
level=info ts=2024-02-07T20:22:17.128843104Z caller=module_service.go:82 msg=initialising module=usage-report
level=info ts=2024-02-07T20:22:17.129074246Z caller=module_service.go:82 msg=initialising module=distributor
level=info ts=2024-02-07T20:22:17.129146378Z caller=module_service.go:82 msg=initialising module=compactor
level=info ts=2024-02-07T20:22:17.129200132Z caller=ring.go:263 msg="ring doesn't exist in KV store yet"
level=info ts=2024-02-07T20:22:17.129238832Z caller=module_service.go:82 msg=initialising module=ingester-querier
level=info ts=2024-02-07T20:22:17.129259942Z caller=module_service.go:82 msg=initialising module=ingester
level=info ts=2024-02-07T20:22:17.129294458Z caller=ingester.go:401 msg="recovering from checkpoint"
level=info ts=2024-02-07T20:22:17.129397012Z caller=recovery.go:39 msg="no checkpoint found, treating as no-op"
level=info ts=2024-02-07T20:22:17.129446276Z caller=module_service.go:82 msg=initialising module=query-scheduler
level=info ts=2024-02-07T20:22:17.129505207Z caller=ring.go:263 msg="ring doesn't exist in KV store yet"
level=info ts=2024-02-07T20:22:17.129606292Z caller=lifecycler.go:547 msg="not loading tokens from file, tokens file path is empty"
level=info ts=2024-02-07T20:22:17.129636385Z caller=lifecycler.go:576 msg="instance not found in ring, adding with no tokens" ring=distributor
level=info ts=2024-02-07T20:22:17.129795193Z caller=lifecycler.go:416 msg="auto-joining cluster after timeout" ring=distributor
level=info ts=2024-02-07T20:22:17.129910904Z caller=basic_lifecycler.go:261 msg="instance not found in the ring" instance=loki-0 ring=compactor
level=info ts=2024-02-07T20:22:17.129948644Z caller=basic_lifecycler_delegates.go:63 msg="not loading tokens from file, tokens file path is empty"
level=info ts=2024-02-07T20:22:17.130091377Z caller=basic_lifecycler.go:261 msg="instance not found in the ring" instance=loki-0 ring=scheduler
level=info ts=2024-02-07T20:22:17.130112831Z caller=basic_lifecycler_delegates.go:63 msg="not loading tokens from file, tokens file path is empty"
level=info ts=2024-02-07T20:22:17.13022309Z caller=compactor.go:307 msg="waiting until compactor is JOINING in the ring"
level=info ts=2024-02-07T20:22:17.130242532Z caller=compactor.go:311 msg="compactor is JOINING in the ring"
level=info ts=2024-02-07T20:22:17.130278111Z caller=ingester.go:417 msg="recovered WAL checkpoint recovery finished" elapsed=1.004053ms errors=false
level=info ts=2024-02-07T20:22:17.130296029Z caller=ingester.go:423 msg="recovering from WAL"
level=info ts=2024-02-07T20:22:17.131879088Z caller=scheduler.go:617 msg="waiting until scheduler is JOINING in the ring"
level=info ts=2024-02-07T20:22:17.132145104Z caller=scheduler.go:621 msg="scheduler is JOINING in the ring"
level=info ts=2024-02-07T20:22:17.132682363Z caller=ingester.go:439 msg="WAL segment recovery finished" elapsed=3.407953ms errors=false
level=info ts=2024-02-07T20:22:17.132711332Z caller=ingester.go:387 msg="closing recoverer"
level=info ts=2024-02-07T20:22:17.132735725Z caller=ingester.go:395 msg="WAL recovery finished" time=3.450095ms
level=info ts=2024-02-07T20:22:17.133166304Z caller=lifecycler.go:547 msg="not loading tokens from file, tokens file path is empty"
level=info ts=2024-02-07T20:22:17.133207209Z caller=lifecycler.go:576 msg="instance not found in ring, adding with no tokens" ring=ingester
level=info ts=2024-02-07T20:22:17.133301923Z caller=lifecycler.go:416 msg="auto-joining cluster after timeout" ring=ingester
level=info ts=2024-02-07T20:22:17.133420954Z caller=wal.go:156 msg=started component=wal
ts=2024-02-07T20:22:17.13461987Z caller=memberlist_logger.go:74 level=warn msg="Failed to resolve loki-memberlist: lookup loki-memberlist on 172.20.0.10:53: no such host"
level=info ts=2024-02-07T20:22:18.131099981Z caller=compactor.go:321 msg="waiting until compactor is ACTIVE in the ring"
level=info ts=2024-02-07T20:22:18.131146045Z caller=compactor.go:325 msg="compactor is ACTIVE in the ring"
level=info ts=2024-02-07T20:22:18.13349622Z caller=scheduler.go:631 msg="waiting until scheduler is ACTIVE in the ring"
level=info ts=2024-02-07T20:22:18.133560786Z caller=scheduler.go:635 msg="scheduler is ACTIVE in the ring"
level=info ts=2024-02-07T20:22:18.133645279Z caller=module_service.go:82 msg=initialising module=querier
level=info ts=2024-02-07T20:22:18.133820767Z caller=module_service.go:82 msg=initialising module=query-frontend
level=info ts=2024-02-07T20:22:18.134015563Z caller=loki.go:374 msg="Loki started"
level=info ts=2024-02-07T20:22:18.342694497Z caller=memberlist_client.go:563 msg="joined memberlist cluster" reached_nodes=1
level=info ts=2024-02-07T20:22:21.134226057Z caller=scheduler.go:682 msg="this scheduler is in the ReplicationSet, will now accept requests."
level=info ts=2024-02-07T20:22:21.134251409Z caller=worker.go:209 msg="adding connection" addr=10.60.2.67:9095
level=info ts=2024-02-07T20:22:23.13125544Z caller=compactor.go:386 msg="this instance has been chosen to run the compactor, starting compactor"
level=info ts=2024-02-07T20:22:23.131429983Z caller=compactor.go:413 msg="waiting 10m0s for ring to stay stable and previous compactions to finish before starting compactor"
level=info ts=2024-02-07T20:22:28.135239576Z caller=frontend_scheduler_worker.go:101 msg="adding connection to scheduler" addr=10.60.2.67:9095
level=info ts=2024-02-07T20:23:17.121661169Z caller=table_manager.go:134 msg="uploading tables"
level=info ts=2024-02-07T20:23:17.122755904Z caller=table_manager.go:167 msg="handing over indexes to shipper"
level=info ts=2024-02-07T20:24:17.121947262Z caller=table_manager.go:134 msg="uploading tables"
level=info ts=2024-02-07T20:24:17.121939425Z caller=table_manager.go:167 msg="handing over indexes to shipper"

Promtail kubectl logs output:

level=info ts=2024-02-07T20:22:27.84838984Z caller=tailer.go:143 component=tailer msg="tail routine: started" path=/var/log/pods/linkerd-jaeger_jaeger-595975bfcd-qbzkv_162056cc-eb63-4d65-8fb0-064946b63163/jaeger/0.log
ts=2024-02-07T20:22:27.848440664Z caller=log.go:168 level=info msg="Seeked /var/log/pods/linkerd-jaeger_jaeger-595975bfcd-qbzkv_162056cc-eb63-4d65-8fb0-064946b63163/linkerd-network-validator/0.log - &{Offset:1364 Whence:0}"
level=info ts=2024-02-07T20:22:27.848477263Z caller=tailer.go:143 component=tailer msg="tail routine: started" path=/var/log/pods/linkerd-jaeger_jaeger-595975bfcd-qbzkv_162056cc-eb63-4d65-8fb0-064946b63163/linkerd-network-validator/0.log
ts=2024-02-07T20:22:27.84851703Z caller=log.go:168 level=info msg="Seeked /var/log/pods/kube-system_secrets-provider-aws-secrets-store-csi-driver-provider-awslt87x_d83584d3-f50a-4264-999c-026c4ba9ca86/provider-aws-installer/0.log - &{Offset:2337 Whence:0}"
level=info ts=2024-02-07T20:22:27.848541876Z caller=tailer.go:143 component=tailer msg="tail routine: started" path=/var/log/pods/kube-system_secrets-provider-aws-secrets-store-csi-driver-provider-awslt87x_d83584d3-f50a-4264-999c-026c4ba9ca86/provider-aws-installer/0.log
ts=2024-02-07T20:22:27.848662227Z caller=log.go:168 level=info msg="Seeked /var/log/pods/cert-manager_cert-manager-webhook-58fd67545d-nqqrs_6cd8e2d3-8670-4c53-8d24-22d4c03ceaa0/cert-manager-webhook/0.log - &{Offset:5473 Whence:0}"
ts=2024-02-07T20:22:28.033669721Z caller=log.go:168 level=info msg="Re-opening truncated file /var/log/pods/staging-ibwave_assets-service-5d649b65d6-rnbkp_5e44eb18-00f1-4c0d-9eaf-38f4dacbdc89/assets-service/0.log ..."
ts=2024-02-07T20:22:28.033925306Z caller=log.go:168 level=info msg="Successfully reopened truncated /var/log/pods/staging-ibwave_assets-service-5d649b65d6-rnbkp_5e44eb18-00f1-4c0d-9eaf-38f4dacbdc89/assets-service/0.log"
level=warn ts=2024-02-07T20:22:28.048083032Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post \"http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
level=warn ts=2024-02-07T20:22:28.955735573Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post \"http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
level=warn ts=2024-02-07T20:22:30.019724012Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post \"http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
level=warn ts=2024-02-07T20:22:32.780896289Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post \"http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
level=warn ts=2024-02-07T20:22:40.543119547Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post \"http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
level=warn ts=2024-02-07T20:22:53.307863125Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post \"http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"
level=warn ts=2024-02-07T20:23:18.268755422Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=-1 tenant= error="Post \"http://loki:3100/loki/api/v1/push\": dial tcp 172.20.201.59:3100: connect: connection refused"

Grafana error for loki logs:

2024-02-07 15:42:51.216 logger=tsdb.loki endpoint=queryData pluginId=loki dsName=Loki dsUID=P8E80F9AEF21F6940 uname=admin fromAlert=false t=2024-02-07T20:42:51.215976098Z level=error msg="Error querying loki" error="Get \"http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/query_range?direction=backward&end=1707338579033000000&limit=10&query=%7Bnode_name%3D%22ip-10-60-2-49.eu-west-1.compute.internal%22%7D+%7C%3D+%60%60&start=1707316979033000000&step=21600000ms\": context canceled"

Grafana also says: "Failed to load log volume for this query: parse error at line 1, col 101: syntax error: unexpected IDENTIFIER".

Please help; it doesn't make sense why the connection is being refused.

withinboredom commented 7 months ago

There's a typo in the chart, so it installed loki 2.6.1 instead of 2.9.3. Set the following value in your Helm values (or use the correct image):

loki:
  image:
    tag: 2.9.3
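If it helps, the same override can be applied in place with a single command (a sketch assuming the release is named loki and installed in the monitoring namespace):

helm upgrade --install loki grafana/loki-stack \
  --namespace monitoring \
  --reuse-values \
  --set loki.image.tag=2.9.3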
dewstyh commented 7 months ago

I updated the image tag as you mentioned. Still the same problem: the pod is running, but the service says "404 page not found", and the pod also fails its liveness and readiness probes with 503 errors in the loki pod events.
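One way to isolate this (a debugging sketch, assuming the pod is loki-0 in the monitoring namespace): port-forward straight to the pod and hit Loki's readiness and API endpoints, bypassing the Service. Note that Loki serves nothing at the root path, so a 404 from curl http://loki:3100/ is not by itself a failure; /ready and the API paths are the meaningful checks.

# In one terminal: talk to the pod directly, skipping the Service and any mesh/proxy in between.
kubectl port-forward -n monitoring pod/loki-0 3100:3100

# In another terminal:
curl -i http://localhost:3100/ready
curl -i http://localhost:3100/loki/api/v1/labels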

withinboredom commented 7 months ago

It's possible I got the Helm values "address" or "path" wrong, so check that the pod that is running actually uses the correct tag.

FranciscoCross commented 6 months ago

I am facing the same issue; I have already tried various versions of both Loki Chart and Promtail. This problem persists for me as well. In my case, I am using EKS 1.28. I applied this configuration to other clusters with different Kubernetes versions, and it worked. This leads me to believe that it might be an issue with AWS or some add-ons.

hvspa commented 6 months ago

hello,

I am also facing the same issue. As suggested, I changed the image directly in the pod to 2.9.3; the pod restarted, but the readiness and liveness probes are still failing:

kubectl describe pod loki-0 | tail
  Normal   Pulled     2m9s                kubelet            Container image "grafana/loki:2.6.1" already present on machine
  Normal   Killing    63s                 kubelet            Container loki definition changed, will be restarted
  Normal   Pulling    61s                 kubelet            Pulling image "grafana/loki:2.9.3"
  Warning  Unhealthy  59s                 kubelet            Liveness probe failed: Get "http://10.124.1.121:3100/ready": dial tcp 10.124.1.121:3100: connect: connection refused
  Warning  Unhealthy  59s                 kubelet            Readiness probe failed: Get "http://10.124.1.121:3100/ready": dial tcp 10.124.1.121:3100: connect: connection refused
  Normal   Pulled     57s                 kubelet            Successfully pulled image "grafana/loki:2.9.3" in 4.605171397s (4.605252074s including waiting)
  Normal   Created    56s (x2 over 2m9s)  kubelet            Created container loki
  Normal   Started    56s (x2 over 2m8s)  kubelet            Started container loki
  Warning  Unhealthy  9s (x3 over 79s)    kubelet            Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy  9s (x3 over 79s)    kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503

anyone got this sorted out?

tks

zensqlmonitor commented 6 months ago

Same issue for me:

  Warning  Unhealthy  20m (x2 over 20m)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy  20m (x2 over 20m)  kubelet  Liveness probe failed: HTTP probe failed with statuscode: 503

hvspa commented 6 months ago

Same issue for me:

  Warning  Unhealthy  20m (x2 over 20m)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy  20m (x2 over 20m)  kubelet  Liveness probe failed: HTTP probe failed with statuscode: 503

try this

helm upgrade --install loki grafana/loki-stack --set resources.requests.cpu=100m --set resources.requests.memory=128Mi -f gptprom.yaml

cat gptprom.yaml 
promtail:
  enabled: true
  config:
    clients:
    - url: http://{{ .Release.Name }}:3100/loki/api/v1/push
    logLevel: info
    serverPort: 3101
    snippets:
      pipelineStages:
      - cri: {}
      - match:
          selector: '{app="ingress-nginx", job="default/ingress-nginx"}'
          stages:
          - regex:
              expression: '^(?P<remote_addr>[\w\.]+) - (?P<remote_user>[^ ]*) \[(?P<time_local>.*)\] "(?P<method>[^ ]*) (?P<request>[^ ]*) (?P<protocol>[^ ]*)" (?P<status>[\d]+) (?P<body_bytes_sent>[\d]+) "(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" (?P<request_length>[\d]+) (?P<request_time>[^ ]+) \[(?P<proxy_upstream_name>.*)\] \[(?P<proxy_alternative_upstream_name>.*)\] (?P<upstream_addr>[\w\.]+:\d{1,5}) (?P<upstream_response_length>[\d]+) (?P<upstream_response_time>\d+(\.\d+)?) (?P<upstream_status>[\d]+)?'

              #log-format-upstream: '$remote_addr - $remote_user [$time_local] $request $status $body_bytes_sent $http_referer $http_user_agent $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id'
      - labels:
          remote_addr:
          remote_user:
          time_local:
          method:
          request:
          protocol:
          status:
          body_bytes_sent:
          http_referer:
          http_user_agent:
          request_length:
          request_time:
          proxy_upstream_name:
          proxy_alternative_upstream_name:
          upstream_addr:
          upstream_response_length:
          upstream_response_time:
          upstream_status:

I added extra pipelineStages to parse the nginx ingress controller logs; you can skip them or use them. Later on I migrated to a JSON log format in nginx.conf to avoid the extra parsing (a rough sketch of that JSON-based pipeline follows the output below).

kubectl describe pod loki-0 | tail
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
kubectl get pods  | grep loki
loki-0                                                            1/1     Running   0          19h
loki-promtail-4ft2c                                               1/1     Running   0          19h
loki-promtail-pb5lz                                               1/1     Running   0          19h
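For reference, the JSON-based pipeline mentioned above could look roughly like this (a sketch only; it assumes nginx writes a JSON log-format whose field names match the expressions below, which are illustrative and not taken from this thread):

     pipelineStages:
      - cri: {}
      - match:
          selector: '{app="ingress-nginx", job="default/ingress-nginx"}'
          stages:
          - json:
              # Each expression pulls a field out of the JSON log line into extracted data.
              expressions:
                remote_addr: remote_addr
                method: method
                status: status
                request_time: request_time
          - labels:
              # Only promote low-cardinality fields to labels.
              status:
              method: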
zensqlmonitor commented 6 months ago

Not better

hvspa commented 6 months ago

If you describe your loki pod, what version of the image is it running?
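A quick way to check (assuming the pod is still named loki-0 and is in your current namespace):

kubectl get pod loki-0 -o jsonpath='{.spec.containers[*].image}'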

Try extracting the loki values file from the Helm chart with this command:

tar --extract --file=/root/.cache/helm/repository/loki-stack-2.10.2.tgz loki-stack/charts/loki/values.yaml -O > loki_values.yaml

Then, in the loki_values.yaml file, change the image tag to 2.9.3:

cat loki_values.yaml |  grep -i "image:" -A2 -m1
image:
  repository: grafana/loki
  tag: 2.9.3

Then upgrade your Loki install:

helm upgrade --install loki grafana/loki-stack --set resources.requests.cpu=100m --set resources.requests.memory=256Mi -f loki_values.yaml

See if this makes it work.

PS: just to make sure your pods pick up the new config, restart them:

kubectl rollout restart sts loki

gawbul commented 6 months ago

Setting image repository and tag in the values.yaml worked for me.

gawbul commented 5 months ago

I think the correct fix for this, however, is to update the Loki subchart in the Loki Stack chart to the latest version. It still pins Loki 2.6.1 and should be using the latest release.
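For context, the change being suggested would be a dependency bump in the loki-stack chart, along these lines (illustrative only; the actual file layout, version constraint, and numbers in the real chart may differ):

# loki-stack/Chart.yaml (sketch)
dependencies:
  - name: loki
    version: "2.16.0"   # hypothetical old pin that ships Loki 2.6.1; bump to a newer chart release
    repository: https://grafana.github.io/helm-charts
    condition: loki.enabled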

MartinLoeper commented 3 months ago

@gawbul Agreed! I tried to bump the loki subchart to the latest version in a forked repo and unfortunately ran into several more issues :(

e.g. https://github.com/grafana/loki/issues/12773

vasanthreddy12 commented 2 months ago

Hi, we deployed Loki using the Loki Operator provided by Red Hat OpenShift; in that setup the Loki version is 3.0.0. We are using Grafana v11 and are getting the same error, unable to connect to Loki, but it works fine from Grafana v8.5. What should we do to connect Loki with Grafana?

Do we have to downgrade Loki to version 2.9, as suggested in the discussion above?

ipefixledruide commented 2 months ago

There's a typo in the chart, so it installed loki 2.6.1 instead of 2.9.3. Set the following value in your Helm values (or use the correct image):

loki:
  image:
    tag: 2.9.3

Worked well for me, thank you :)

Filipcsupka commented 1 month ago

I updated the image tag as you mentioned. Still the same problem: the pod is running, but the service says "404 page not found", and the pod also fails its liveness and readiness probes with 503 errors in the loki pod events.

I am struggling with the same thing. Even worse, the same setup works on all other clusters... It even worked on the current one, but it stopped one day. Not even updating the specific tag as suggested here solves or changes a thing...

Hamdy commented 1 month ago

This works for me:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add bitnami https://charts.bitnami.com/bitnami

helm repo update
kubectl create namespace monitoring
helm install prometheus prometheus-community/prometheus --namespace monitoring
helm upgrade --install grafana bitnami/grafana --namespace monitoring
helm upgrade --install loki bitnami/grafana-loki --namespace monitoring

# For Grafana, use `admin` as the username; to get the password:

kubectl get secret --namespace monitoring grafana-admin  -o jsonpath="{.data.GF_SECURITY_ADMIN_PASSWORD}" | base64 --decode

# Port forward grafana
kubectl port-forward services/grafana --namespace monitoring 3000:3000

- Add data source prometheus http://prometheus-server
- Add datasource loki http://loki-grafana-loki-gateway
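To sanity-check the Loki endpoint before wiring it into Grafana (a sketch using the gateway service name from the steps above; it assumes everything lives in the monitoring namespace), the API can be queried from a throwaway pod inside the cluster:

kubectl run curl-test --namespace monitoring --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://loki-grafana-loki-gateway/loki/api/v1/labels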
EliaMaggioniSGS commented 2 weeks ago

For whoever is having this issue with grafana:11.1.5 and loki:3.1.1 and sees

msg="Error received from Loki" error="Get \"http://loki.logs.svc.cluster.local:3100/loki/api/v1/query?direction=backward&query=vector%281%29%2Bvector%281%29&time=4000000000\": net/http: invalid header field name \"\""

adding the header Accept: */* in the data source solved my issue.
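For anyone provisioning the data source from YAML rather than through the UI, the equivalent of adding that header would look roughly like this (a sketch; the data source name and URL are taken from the error above, everything else is an assumption):

apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki.logs.svc.cluster.local:3100
    jsonData:
      # Custom header name/value pairs are split across jsonData and secureJsonData.
      httpHeaderName1: Accept
    secureJsonData:
      httpHeaderValue1: '*/*'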