grafana / helm-charts

Apache License 2.0

prometheus Loki read: connection reset by peer #2607

Open githubkannadhasan opened 1 year ago

githubkannadhasan commented 1 year ago

Installed Prometheus with this helm chart:

    helm install prometheus prometheus-community/prometheus

Installed Loki with this helm chart:

    helm install loki -n $monitoring_namespace grafana/loki-stack

Prometheus always shows two Loki "down" alerts, attached as a screenshot:

Down alert 1:

    Get "http://172.18.2.162:7946/metrics": read tcp 172.18.4.12:47132->172.18.2.162:7946: read: connection reset by peer

Down alert 2:

    Get "http://172.18.2.162:9095/metrics": net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x00\x00\f\x04\x00\x00\x00\x00\x00\x00\x05\x00\x00@\x00\x00\x03\x00\x00\x00d"

We can see that all the pods in the monitoring namespace are up:

    NAME                                               READY   STATUS    RESTARTS   AGE
    grafana-9d45455d7-hnn55                            3/3     Running   0          4h23m
    grafana-image-renderer-6b5c5bdfd9-6g5zl            1/1     Running   0          4h23m
    loki-0                                             1/1     Running   0          41m
    loki-promtail-5h4sb                                1/1     Running   0          45m
    loki-promtail-8g95n                                1/1     Running   0          45m
    loki-promtail-jp2zp                                1/1     Running   0          45m
    loki-promtail-t9mpl                                1/1     Running   0          45m
    loki-promtail-z9pzx                                1/1     Running   0          45m
    prometheus-alertmanager-0                          1/1     Running   0          4h23m
    prometheus-kube-state-metrics-5787d595ff-mbgw2     1/1     Running   0          4h23m
    prometheus-prometheus-node-exporter-flcp7          1/1     Running   0          4h23m
    prometheus-prometheus-node-exporter-flvjv          1/1     Running   0          4h23m
    prometheus-prometheus-node-exporter-jtn9x          1/1     Running   0          4h23m
    prometheus-prometheus-node-exporter-pvpjw          1/1     Running   0          4h23m
    prometheus-prometheus-node-exporter-zn6dm          1/1     Running   0          4h23m
    prometheus-prometheus-pushgateway-dfbf8b54-8qljn   1/1     Running   0          4h23m
    prometheus-server-68665c484b-2md94                 2/2     Running   0          4h23m


davordbetter commented 1 year ago

Same issue here.

davordbetter commented 1 year ago

I did a quick check. Prometheus scrapes all pods that carry the `prometheus.io/scrape` annotation. Loki has it set to `true`, but the pod exposes multiple ports:

            - name: http-metrics
              containerPort: {{ .Values.config.server.http_listen_port | default 3100 }}
              protocol: TCP
            - name: grpc
              containerPort: {{ .Values.config.server.grpc_listen_port | default 9095 }}
              protocol: TCP
            {{- if .Values.config.memberlist }}
            - name: memberlist-port
              containerPort: {{ .Values.config.memberlist.bind_port | default 7946 }}
              protocol: TCP
            {{- end }}

So Prometheus tries to scrape every declared container port, even though Loki only serves metrics on 3100.
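For context, this matches how the typical community `kubernetes-pods` scrape job works: with the pod role, each declared container port becomes its own target, and the `prometheus.io/port` annotation only collapses them to a single address via a relabel rule whose regex expects a numeric port. The sketch below is paraphrased from the common example config shipped with the community Prometheus chart; the exact job in your deployed chart version may differ:

```yaml
# Paraphrased relabel rules from a typical kubernetes-pods scrape job.
relabel_configs:
  # Keep only pods annotated with prometheus.io/scrape: "true".
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # Rewrite the target address to use the annotated port. Note the (\d+)
  # capture group: a named port like "http-metrics" does not match, so the
  # rewrite is skipped and every container port stays a separate target.
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
```

This also explains the two alerts above: 9095 is Loki's gRPC port (the `\x00\x00...` bytes are an HTTP/2 frame, hence "malformed HTTP response") and 7946 is the memberlist gossip port, which simply resets plain HTTP connections.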

At the moment, I have no idea how to solve this properly. I have set a silence over `app=loki`, but that is only a temporary workaround: if anything actually goes wrong with Loki, I won't know about it.

yamaoto commented 1 year ago

I encountered a similar problem. It seems that Prometheus cannot resolve the port by its name and attempts to scrape all available ports:

        prometheus.io/port: http-metrics
        prometheus.io/scrape: 'true'

After I changed the annotation to the numeric port, everything started working correctly:

        prometheus.io/port: '3100'
        prometheus.io/scrape: 'true'
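If you installed via `grafana/loki-stack` as above, one way to apply that fix is through a values override rather than editing the pod directly. This is a hedged sketch: the `loki.podAnnotations` key path is an assumption based on common chart conventions and may differ in your chart version, so check `helm show values grafana/loki-stack` first.

```yaml
# loki-values.yaml (hypothetical override file)
loki:
  podAnnotations:
    prometheus.io/scrape: "true"
    # Use the numeric metrics port, not the port name "http-metrics".
    prometheus.io/port: "3100"
```

Then roll it out with:

    helm upgrade loki grafana/loki-stack -n $monitoring_namespace -f loki-values.yaml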