grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Loki High CPU Usage #3020

Status: Closed (cf-sewe closed this issue 3 years ago)

cf-sewe commented 3 years ago

Describe the bug: After some time, Loki runs at 100% CPU.

To Reproduce Steps to reproduce the behavior:

Expected behavior

Environment:

Screenshots, Promtail config, or terminal output

cyriltovena commented 3 years ago

Nothing shocking in the trace. Someone is running a metric query, so I'm not sure why you would expect Loki not to use CPU. Do you know what the query looks like?

cyriltovena commented 3 years ago

Note also that, within the same second, you are receiving a push and running a query.

LucaDev commented 3 years ago

I'm having the same issue. [image] After a restart the CPU usage goes back down. [image]

cyriltovena commented 3 years ago

Could I get a cpu profile instead of a trace please?


cf-sewe commented 3 years ago

Actually, the initial profile was already a CPU profile; I just forgot to upload it ;) I have now retested with a 20s window. All other pprof options are default.

[ec2-user@ip-192-168-100-136 ~]$ go tool pprof http://localhost:3100/debug/pprof/profile?seconds=20
Fetching profile over HTTP from http://localhost:3100/debug/pprof/profile?seconds=20
Saved profile in /home/ec2-user/pprof/pprof.loki.samples.cpu.002.pb.gz
File: loki
Type: cpu
Time: Dec 16, 2020 at 6:29am (UTC)
Duration: 20.12s, Total samples = 15.51s (77.10%)

pprof.loki.samples.cpu.002.pb.gz
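
For anyone who wants to capture the same kind of profile on a machine without the Go toolchain, here is a minimal sketch in Go. It assumes Loki is reachable at the base URL passed on the command line and exposes the standard `/debug/pprof/profile` endpoint used in the command above. The helper names (`profileURL`, `saveProfile`) and the output file name `loki-cpu.pb.gz` are made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// profileURL builds the CPU profile URL for a service that exposes the
// standard Go pprof HTTP handlers (hypothetical helper, for illustration).
func profileURL(base string, seconds int) string {
	return fmt.Sprintf("%s/debug/pprof/profile?seconds=%d",
		strings.TrimRight(base, "/"), seconds)
}

// saveProfile downloads a CPU profile sampled over the given window and
// writes the raw response body to outPath.
func saveProfile(base, outPath string, seconds int) error {
	// The server only responds once the sampling window is over, so the
	// client timeout has to be longer than the window itself.
	client := &http.Client{Timeout: time.Duration(seconds)*time.Second + 30*time.Second}

	resp, err := client.Get(profileURL(base, seconds))
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}

	out, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer out.Close()

	_, err = io.Copy(out, resp.Body)
	return err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: cpuprofile <base-url>   (e.g. http://localhost:3100)")
		return
	}
	// 20 seconds matches the window used in the command above;
	// the output file name is an arbitrary choice.
	if err := saveProfile(os.Args[1], "loki-cpu.pb.gz", 20); err != nil {
		fmt.Fprintln(os.Stderr, "fetching profile failed:", err)
		os.Exit(1)
	}
	fmt.Println("saved loki-cpu.pb.gz")
}
```

The saved file can then be copied to another machine and opened there with `go tool pprof`.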

LucaDev commented 3 years ago

Thank you @cf-sewe. @cyriltovena, I haven't been able to reproduce the bug again so far. It has happened twice.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

LucaDev commented 3 years ago

@cf-sewe has the error occurred again recently?

cf-sewe commented 3 years ago

Yes, that CPU hog occurs reliably after some time in my environment, but only on the old Loki version 2.0.0. I have not updated Loki to v2.1 yet, and won't soon.

xal3xhx commented 3 years ago

The issue is occurring for me as well, on a fresh install of everything. I will be back tomorrow with more info and details of my current setup.

LucaDev commented 3 years ago

This should be reopened.

xal3xhx commented 3 years ago

I agree this needs to be reopened. I'm not home so I can't test anything, but I'm running the most up-to-date version of TrueNAS and I have Grafana inside a jail.

Loki is running in the same jail and is also the most up-to-date version, with default configs.

Promtail is running inside another jail on the same system. That jail is running nginx, and Promtail is reading its logs with a standard config with 2 replaces.

The dashboard on Grafana is the Loki V2.0 nginx dashboard.

After some time the CPU on the system gets pinned to 100%. I believe it was after around 1-2 hours, but I could be completely wrong there.

Stopping Promtail has no effect, proving it's a Loki issue. Restarting Loki fixes the problem for a little while.

As a final note, both Promtail and Loki are started with the system through an rc.d file using the daemon command.

I will post configs and other reports when I get home later today.

xal3xhx commented 3 years ago

After looking through the other issues posted, it looks like this might be related to issue #3275.

cyriltovena commented 3 years ago

https://github.com/grafana/loki/issues/3275#issuecomment-780373814