grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

ridiculous memory usage #9847

Open uhthomas opened 1 year ago

uhthomas commented 1 year ago

Describe the bug

Loki's memory usage grows dramatically over time.

[screenshot: Loki memory usage growing steadily over time]

To Reproduce

  1. Run loki normally.

Expected behavior

Loki should not use 86Gi of memory.

Environment:

v2.8.2

https://github.com/uhthomas/automata/tree/05fc4a222452ced063f21f8cc6a8c7b6d55bfd02/k8s/unwind/loki

Screenshots, Promtail config, or terminal output


loki-write-0 loki level=warn ts=2023-07-03T10:35:48.396262717Z caller=logging.go:86 traceID=0b22c163b6825421 orgID=fake msg="POST /loki/api/v1/push (500) 5.011475663s Response: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\\n\" ws: false; Connection: close; Content-Length: 193244; Content-Type: application/x-protobuf; User-Agent: GrafanaAgent/; "
loki-write-0 loki level=warn ts=2023-07-03T10:35:49.380469302Z caller=logging.go:86 traceID=4408baad06f6d7af orgID=fake msg="POST /loki/api/v1/push (500) 5.007299347s Response: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\\n\" ws: false; Connection: close; Content-Length: 179066; Content-Type: application/x-protobuf; User-Agent: GrafanaAgent/; "
loki-write-0 loki level=warn ts=2023-07-03T10:36:07.846722928Z caller=logging.go:86 traceID=7b90e68b9e86f5b9 orgID=fake msg="POST /loki/api/v1/push (500) 5.017661422s Response: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\\n\" ws: false; Connection: close; Content-Length: 198825; Content-Type: application/x-protobuf; User-Agent: GrafanaAgent/; "
loki-write-0 loki level=warn ts=2023-07-03T10:36:25.962289431Z caller=logging.go:86 traceID=5d68c285aaf88bca orgID=fake msg="POST /loki/api/v1/push (500) 5.001675562s Response: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\\n\" ws: false; Connection: close; Content-Length: 1782; Content-Type: application/x-protobuf; User-Agent: GrafanaAgent/; "
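For anyone trying to quantify the growth rather than eyeball the graph: the pattern shows up clearly in the write pods' working-set memory from cAdvisor. A rough Prometheus alerting rule, assuming the write path runs as loki-write-* pods as in the config linked above (metric scraping setup, threshold and window are illustrative, not part of this report):

groups:
  - name: loki-write-memory
    rules:
      - alert: LokiWriteMemoryGrowth
        # cAdvisor working-set memory for the loki container in the write pods;
        # 32GiB / 30m are placeholders, tune to your node size.
        expr: max by (pod) (container_memory_working_set_bytes{pod=~"loki-write-.*", container="loki"}) > 32 * 1024 * 1024 * 1024
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.pod }} has held more than 32GiB of working-set memory for 30 minutes"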
monoxane commented 1 year ago

Also seeing this. When the S3 storage is unavailable for whatever reason, memory usage goes through the roof: I had 3 writers each using 10 CPU cores and over 130GB of RAM.

[screenshot: write pods' CPU and memory usage]

I understand the concern about potentially missed logs, but in this situation it should absolutely cap at something like 10GB (make it a user setting somewhere). A rough workaround sketch follows below.
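Until there is a built-in cap, the only thing that reliably bounds the blast radius seems to be limiting it at the pod level. A minimal sketch of the relevant part of a loki-write StatefulSet spec, assuming Kubernetes and a container named loki (values are illustrative; GOMEMLIMIT is only the Go runtime's soft limit, so it makes the GC more aggressive rather than guaranteeing a hard cap):

# Container spec excerpt (illustrative, not a confirmed fix)
containers:
  - name: loki
    env:
      # Go runtime soft memory limit; keep it comfortably below the
      # container limit so the GC reacts before the kubelet OOM-kills the pod.
      - name: GOMEMLIMIT
        value: "8GiB"
    resources:
      requests:
        memory: 4Gi
      limits:
        memory: 10Gi

Tightening ingester settings such as chunk_idle_period and max_chunk_age may also reduce how much data sits in memory while the object store is slow, though that does not address the unbounded growth itself.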