So, right around the time loki crashes:
level=warn ts=2020-01-19T10:38:52.826048866Z caller=logging.go:49 traceID=50617ce4c81f6f61 msg="POST /loki/api/v1/push (500) 170.786µs Response: \"at least 1 live ingesters required, could only find 0\\n\" ws: false; Content-Length: 748; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Scope-Orgid: till; "
And I think this ultimately causes promtail to crash:
level=error ts=2020-01-19T10:38:37.4948318Z caller=pool.go:170 msg="error removing stale clients" err="too many failed ingesters"
I see multiple of these. I adjusted my systemd units, so both come back up and continue. Is there anything I should look for? "Show context" is only semi-useful.
Loki seems to not be ready, causing promtail to back off. Not sure why you are having issues with Loki though; you should dig into that.
@cyriltovena this doesn't happen right away, but only after a couple of hours. Do you have any suggestions on how to dig into it?
@till there is a max_age configuration for the journal scraper to limit how far back in time it will look.
But more importantly, you should probably find a directory outside of /tmp to store your files, as on most operating systems this is deleted on a restart. The current position in the journal is saved in the positions file, which should stop it from re-reading the entire contents of the journal on each start, but if this file is getting deleted on reboot you will have to read from the beginning.
As @cyriltovena mentioned though, you should check your kernel logs and your loki logs to see if you can figure out why Loki stops running as well.
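Roughly something like this, if it helps — just a sketch, not a drop-in config; the 12h value and the /var/lib/promtail path are only examples:
# promtail.yml (sketch)
positions:
  # keep the positions file somewhere that survives reboots, i.e. not /tmp
  filename: /var/lib/promtail/positions.yaml
scrape_configs:
  - job_name: journal
    journal:
      # don't go further back than this in the journal on startup
      max_age: 12h
      path: /var/log/journal
      labels:
        job: journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'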
@slim-bean thank you! I'll look into max_age, and I agree: good idea on /tmp — would you mind a PR to your docs to change that? I mostly copy and pasted to get it off the ground.
I wouldn't know why it hits this limit. I feel like loki and promtail may be doing that to themselves. They run on the server where I am collecting journald from.
Also, I guess this is a log stacked in a log, in a log? Judging from the error message and attempts to escape the {}?
Jan 21 19:38:46 foo promtail[31256]: level=error ts=2020-01-21T18:38:46.32022603Z caller=client.go:231 component=client host=127.0.0.1:3100 msg="final error sending batch" status=429 error="server returned HTTP status 429 Too Many Requests (429): ingestion rate limit (4194304) exceeded while adding 4 lines"
Jan 21 19:38:46 foo promtail[31256]: level=debug ts=2020-01-21T18:38:46.328552651Z caller=pipeline.go:97 component=journal_pipeline component=pipeline msg="finished processing log line" labels="{job=\"journal\", unit=\"promtail.service\"}" time=2020-01-21T14:35:24.34171+01:00 entry="\\\\\\\\\\\\\\\\\\\\\\\\\\\" duration_s=9.14e-07" duration_s=1.077e-06
Jan 21 19:38:46 foo promtail[31256]: level=debug ts=2020-01-21T18:38:46.328649481Z caller=pipeline.go:97 component=journal_pipeline component=pipeline msg="finished processing log line" labels="{job=\"journal\", unit=\"promtail.service\"}" time=2020-01-21T14:35:24.341974+01:00 entry="level=debug ts=2020-01-21T13:35:24.333005376Z caller=pipeline.go:97 component=journal_pipeline component=pipeline msg=\"finished processing log line\" labels=\"{job=\\\"journal\\\", unit=\\\"loki.service\\\"}\" time=2020-01-21T14:35:23.841745+01:00 entry=\"level=debug ts=2020-01-21T13:35:23.840608232Z caller=grpc_logging.go:40 method=/logproto.Pusher/Push duration=68.952µs msg=\\\"gRPC (success)\\\"\" duration_s=9.65e-07" duration_s=8.67e-07
Jan 21 19:38:46 foo promtail[31256]: level=debug ts=2020-01-21T18:38:46.328733648Z caller=pipeline.go:97 component=journal_pipeline component=pipeline msg="finished processing log line" labels="{job=\"journal\", unit=\"promtail.service\"}" time=2020-01-21T14:35:24.342199+01:00 entry="level=error ts=2020-01-21T13:35:24.333314332Z caller=client.go:231 component=client host=127.0.0.1:3100 msg=\"final error sending batch\" status=429 error=\"server returned HTTP status 429 Too Many Requests (429): ingestion rate limit (4194304) exceeded while adding 3 lines\"" duration_s=7.46e-07
Jan 21 19:38:46 foo promtail[31256]: level=debug ts=2020-01-21T18:38:46.328820746Z caller=pipeline.go:97 component=journal_pipeline component=pipeline msg="finished processing log line" labels="{job=\"journal\", unit=\"promtail.service\"}" time=2020-01-21T14:35:24.34242+01:00 entry="level=debug ts=2020-01-21T13:35:24.333415616Z caller=pipeline.go:97 component=journal_pipeline component=pipeline msg=\"finished processing log line\" labels=\"{job=\\\"journal\\\", unit=\\\"loki.service\\\"}\" time=2020-01-21T14:35:23.842046+01:00 entry=\"level=debug ts=2020-01-21T13:35:23.840886512Z caller=logging.go:44 traceID=3c1c6234af9a8c4c msg=\\\"POST /loki/api/v1/push (204) 1.681578ms\\\"\" duration_s=6.86e-07" duration_s=7.5e-07
The logs look weird. 👆
There are also a ton of these (I added a few \n here):
Jan 21 19:38:46 foo promtail[31256]: level=debug ts=2020-01-21T18:38:46.329061078Z caller=pipeline.go:97 component=journal_pipeline component=pipeline msg="finished processing log line" labels="{job=\"journal\", unit=\"promtail.service\"}" time=2020-01-21T14:35:24.342693+01:00 entry="level=debug ts=2020-01-21T13:35:24.333598233Z caller=pipeline.go:97 component=journal_pipeline component=pipeline msg=\"finished processing log line\" labels=\"{job=\\\"journal\\\", unit=\\\"promtail.service\\\"}\" time=2020-01-21T14:35:23.842427+01:00 entry=\"\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
...
So far journalctl -k has nothing. Where else would you like me to look?
I would turn off debug logging in promtail to start; it's incredibly verbose, and I would only turn it on if you are actively debugging.
There must be more info in the Loki logs related to that unhealthy ingester. You're getting throttled too, so you can look at increasing limits.
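Something along these lines in the Loki config is where I'd start for the 429s (only a sketch — the exact key names depend on your Loki version, so check the limits_config documentation for your release; the numbers are arbitrary examples):
server:
  # drop back to info once you're done actively debugging
  log_level: info
limits_config:
  # the 4194304 in the 429 message is the ~4 MB/s default rate limit;
  # raise it (and the burst) if promtail legitimately pushes more than that
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32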
Turned off promtail, and now this is what stands out:
Jan 21 21:48:16 foo loki[31392]: level=error ts=2020-01-21T20:48:16.032551518Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 21 21:48:16 foo loki[31392]: level=error ts=2020-01-21T20:48:16.225725369Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 21 21:48:16 foo loki[31392]: level=error ts=2020-01-21T20:48:16.59779426Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 21 21:48:17 foo loki[31392]: level=error ts=2020-01-21T20:48:17.189600192Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 21 21:48:18 foo loki[31392]: level=error ts=2020-01-21T20:48:18.013711342Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 21 21:48:20 foo loki[31392]: level=error ts=2020-01-21T20:48:20.330454261Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 21 21:48:24 foo loki[31392]: level=error ts=2020-01-21T20:48:24.228279067Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 21 21:48:27 foo loki[31392]: level=error ts=2020-01-21T20:48:27.686594299Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 21 21:48:32 foo loki[31392]: level=error ts=2020-01-21T20:48:32.19312131Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 21 21:48:36 foo loki[31392]: level=error ts=2020-01-21T20:48:36.67301656Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 21 21:48:36 foo loki[31392]: level=error ts=2020-01-21T20:48:36.673044604Z caller=lifecycler.go:659 msg="Failed to transfer chunks to another ingester" err="terminated after 10 retries"
Jan 22 01:21:39 foo loki[21569]: level=error ts=2020-01-22T00:21:39.504564159Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 22 01:21:39 foo loki[21569]: level=error ts=2020-01-22T00:21:39.689093456Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 22 01:21:40 foo loki[21569]: level=error ts=2020-01-22T00:21:40.04620463Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 22 01:21:40 foo loki[21569]: level=error ts=2020-01-22T00:21:40.739517917Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 22 01:21:41 foo loki[21569]: level=error ts=2020-01-22T00:21:41.799068605Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 22 01:21:44 foo loki[21569]: level=error ts=2020-01-22T00:21:44.662210981Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 22 01:21:49 foo loki[21569]: level=error ts=2020-01-22T00:21:49.416478304Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 22 01:21:53 foo loki[21569]: level=error ts=2020-01-22T00:21:53.84908402Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 22 01:21:57 foo loki[21569]: level=error ts=2020-01-22T00:21:57.67657362Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 22 01:22:02 foo loki[21569]: level=error ts=2020-01-22T00:22:02.171713361Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 22 01:22:02 foo loki[21569]: level=error ts=2020-01-22T00:22:02.171739512Z caller=lifecycler.go:659 msg="Failed to transfer chunks to another ingester" err="terminated after 10 retries"
It looks like the ingester crashed. I am wondering if I should run individual processes and hope that I can catch it crashing, or is there a better way?
This means Loki is trying to shut down, but I still don't see any reason why. Are you running more than a single replica? You should not.
@cyriltovena I realize that a single process does not provide HA or fault tolerance. My idea was that for lab purposes and before I go full-on and provision a bunch of infrastructure, I run a single instance of it to test and evaluate it. Are you suggesting a single instance is not at all possible?
This is the full cycle — there is nothing in there, though, explaining why the ingester goes away in the first place. Which is worrying.
Error is elevated:
Jan 22 01:22:01 foo loki[21569]: level=warn ts=2020-01-22T00:22:01.911696611Z caller=logging.go:49 traceID=2f533f5aa6b6e9ed msg="POST /loki/api/v1/push (500) 363.278µs Response: \"at least 1 live ingesters required, could only find 0\\n\" ws: false; Content-Length: 8680; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Scope-Orgid: till; "
Apparently something is gone:
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.171545225Z caller=mock.go:113 msg=Get key=collectors/ring wait_index=0
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.171619654Z caller=mock.go:159 msg=Get key=collectors/ring modify_index=1858 value="\"\\xb5*\\xf0F\\n<\\n foo\""
Jan 22 01:22:02 foo loki[21569]: level=error ts=2020-01-22T00:22:02.171713361Z caller=transfer.go:186 msg="transfer failed" err="cannot find ingester to transfer chunks to: no pending ingesters"
Jan 22 01:22:02 foo loki[21569]: level=error ts=2020-01-22T00:22:02.171739512Z caller=lifecycler.go:659 msg="Failed to transfer chunks to another ingester" err="terminated after 10 retries"
Wait, I thought it was gone already — why would it exit now?
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.171843557Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.171888052Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.171887226Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.171905281Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.171902082Z caller=flush.go:174 msg="flushing stream" userid=till fp=e0e548bc2b7dd371 immediate=true
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.171931451Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.17194257Z caller=flush.go:174 msg="flushing stream" userid=till fp=9b161b1caa9875d9 immediate=true
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.171946459Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.171989281Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.172007961Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.172028904Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.172043288Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.172082603Z caller=flush.go:174 msg="flushing stream" userid=till fp=f4e834568b8b9fc7 immediate=true
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.17367119Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.173721679Z caller=flush.go:174 msg="flushing stream" userid=till fp=2fcf8ee929f7d34c immediate=true
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.17461308Z caller=flush.go:174 msg="flushing stream" userid=till fp=16c90c4c2a48fa44 immediate=true
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.171923408Z caller=flush.go:174 msg="flushing stream" userid=till fp=5576da15c4107576 immediate=true
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.178018353Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.178610299Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.179052352Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.180154609Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.180909522Z caller=flush.go:174 msg="flushing stream" userid=till fp=3243a09267d841f1 immediate=true
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.180944815Z caller=flush.go:163 msg="Ingester.flushLoop() exited"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.181008489Z caller=lifecycler.go:678 msg="unregistering member from ring"
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.181049839Z caller=mock.go:113 msg=Get key=collectors/ring wait_index=0
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.181075215Z caller=mock.go:159 msg=Get key=collectors/ring modify_index=1858 value="\"\\xb5*\\xf0F\\n<\\n foo\""
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.18114192Z caller=mock.go:86 msg=CAS key=collectors/ring modify_index=1858 value="\"\\x00\""
Now it's removing it from Consul (which I definitely haven't installed):
Jan 22 01:22:02 foo loki[21569]: level=info ts=2020-01-22T00:22:02.181171345Z caller=lifecycler.go:438 msg="ingester removed from consul"
Jan 22 01:22:02 foo loki[21569]: level=info ts=2020-01-22T00:22:02.181188953Z caller=lifecycler.go:326 msg="member.loop() exited gracefully"
Jan 22 01:22:02 foo loki[21569]: level=info ts=2020-01-22T00:22:02.181205763Z caller=loki.go:157 msg=stopping module=querier
Jan 22 01:22:02 foo loki[21569]: level=info ts=2020-01-22T00:22:02.181223102Z caller=loki.go:157 msg=stopping module=ring
Jan 22 01:22:02 foo loki[21569]: level=info ts=2020-01-22T00:22:02.181234791Z caller=loki.go:157 msg=stopping module=store
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.181243658Z caller=mock.go:159 msg=Get key=collectors/ring modify_index=1859 value="\"\\x00\""
Jan 22 01:22:02 foo loki[21569]: level=info ts=2020-01-22T00:22:02.181279468Z caller=loki.go:157 msg=stopping module=table-manager
Jan 22 01:22:02 foo loki[21569]: level=debug ts=2020-01-22T00:22:02.18128541Z caller=mock.go:113 msg=Get key=collectors/ring wait_index=1859
Jan 22 01:22:02 foo loki[21569]: level=info ts=2020-01-22T00:22:02.181300445Z caller=loki.go:157 msg=stopping module=overrides
Jan 22 01:22:02 foo loki[21569]: level=info ts=2020-01-22T00:22:02.18131291Z caller=loki.go:157 msg=stopping module=server
This is when I get an alert, and it resolves:
Jan 22 01:22:02 foo systemd[1]: Stopped Loki Log Aggregator.
Jan 22 01:22:02 foo systemd[1]: Started Loki Log Aggregator.
All those logs still indicate Loki is shutting down. How are you starting Loki?
I'm just telling you that scaling the single binary does not work. It's like Prometheus. The microservices mode can be scaled.
How is your config? How are you hosting it? Anything on this host that would suggest a request for termination?
@slim-bean I am starting it using systemd:
[Unit]
Description=Loki Log Aggregator
Documentation=https://github.com/grafana/loki
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/loki -config.file /etc/loki/loki.yml
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
@CyrilPeponnet it's running in a "Container" (~similar to OpenVZ) on an HV. The HV is running Virtuozzo 7. The container/VM has no resource issues — according to Prometheus/node-exporter. :-)
My config is at the top of this thread. I've adjusted two things since I started this ticket: log_level (a couple of times), and I moved everything to /var/lib/loki (vs. /tmp).
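Roughly the settings involved, for anyone who finds this later (a sketch with example paths, not my exact config — anything persistent works):
storage_config:
  boltdb:
    # index files
    directory: /var/lib/loki/index
  filesystem:
    # chunk files
    directory: /var/lib/loki/chunks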
The config looks good. I'm pretty sure Loki is getting signaled to shut down by the host. Are you using some sort of healthiness probe?
@cyriltovena no. Any idea where I would look to determine this?
Hmm, I see what you mean. When I search the previous minute I see that systemd is shutting it down, but I am not sure why. (Still looking.)
The log from the previous minute:
Jan 22 01:21:29 foo loki[21569]: level=debug ts=2020-01-22T00:21:29.3700528Z caller=grpc_logging.go:40 method=/logproto.Pusher/Push duration=96.679µs msg="gRPC (success)"
Jan 22 01:21:29 foo loki[21569]: level=debug ts=2020-01-22T00:21:29.370292003Z caller=logging.go:44 traceID=232645115b5ac6a9 msg="POST /loki/api/v1/push (204) 1.286727ms"
Jan 22 01:21:29 foo loki[21569]: level=debug ts=2020-01-22T00:21:29.863100262Z caller=flush.go:174 msg="flushing stream" userid=till fp=3243a09267d841f1 immediate=false
Jan 22 01:21:29 foo loki[21569]: level=debug ts=2020-01-22T00:21:29.863174955Z caller=flush.go:174 msg="flushing stream" userid=till fp=16c90c4c2a48fa44 immediate=false
Jan 22 01:21:29 foo loki[21569]: level=debug ts=2020-01-22T00:21:29.863684708Z caller=mock.go:113 msg=Get key=collectors/ring wait_index=0
Jan 22 01:21:29 foo loki[21569]: level=debug ts=2020-01-22T00:21:29.863719211Z caller=mock.go:159 msg=Get key=collectors/ring modify_index=1850 value="\"\\xb3*\\xf0C\\n:\\n foo\""
Jan 22 01:21:29 foo loki[21569]: level=debug ts=2020-01-22T00:21:29.863792125Z caller=mock.go:86 msg=CAS key=collectors/ring modify_index=1850 value="\"\\xb3*\\xf0C\\n:\\n foo\""
Jan 22 01:21:29 foo loki[21569]: level=debug ts=2020-01-22T00:21:29.863816738Z caller=mock.go:159 msg=Get key=collectors/ring modify_index=1851 value="\"\\xb3*\\xf0C\\n:\\n foo\""
Jan 22 01:21:29 foo loki[21569]: level=debug ts=2020-01-22T00:21:29.863855084Z caller=mock.go:113 msg=Get key=collectors/ring wait_index=1851
Jan 22 01:21:30 foo loki[21569]: level=debug ts=2020-01-22T00:21:30.469404289Z caller=grpc_logging.go:40 method=/logproto.Pusher/Push duration=83.895µs msg="gRPC (success)"
Jan 22 01:21:30 foo loki[21569]: level=debug ts=2020-01-22T00:21:30.469684911Z caller=logging.go:44 traceID=20cfc85696fc782a msg="POST /loki/api/v1/push (204) 1.047569ms"
Jan 22 01:21:31 foo loki[21569]: level=debug ts=2020-01-22T00:21:31.569215127Z caller=grpc_logging.go:40 method=/logproto.Pusher/Push duration=51.141µs msg="gRPC (success)"
Jan 22 01:21:31 foo loki[21569]: level=debug ts=2020-01-22T00:21:31.569476589Z caller=logging.go:44 traceID=2bc48903934767b0 msg="POST /loki/api/v1/push (204) 816.223µs"
Jan 22 01:21:32 foo loki[21569]: level=debug ts=2020-01-22T00:21:32.669291353Z caller=grpc_logging.go:40 method=/logproto.Pusher/Push duration=65.635µs msg="gRPC (success)"
Jan 22 01:21:32 foo loki[21569]: level=debug ts=2020-01-22T00:21:32.669562623Z caller=logging.go:44 traceID=ea9f2fc2e36f737 msg="POST /loki/api/v1/push (204) 938.221µs"
Jan 22 01:21:33 foo loki[21569]: level=debug ts=2020-01-22T00:21:33.760008313Z caller=logging.go:44 traceID=6a855fd81a7ad361 msg="GET /metrics (200) 3.215056ms"
Jan 22 01:21:33 foo loki[21569]: level=debug ts=2020-01-22T00:21:33.76910656Z caller=grpc_logging.go:40 method=/logproto.Pusher/Push duration=54.53µs msg="gRPC (success)"
Jan 22 01:21:33 foo loki[21569]: level=debug ts=2020-01-22T00:21:33.769350935Z caller=logging.go:44 traceID=44458dd1f78308c5 msg="POST /loki/api/v1/push (204) 768.753µs"
Jan 22 01:21:34 foo loki[21569]: level=debug ts=2020-01-22T00:21:34.863502845Z caller=mock.go:113 msg=Get key=collectors/ring wait_index=0
Jan 22 01:21:34 foo loki[21569]: level=debug ts=2020-01-22T00:21:34.863580091Z caller=mock.go:159 msg=Get key=collectors/ring modify_index=1851 value="\"\\xb3*\\xf0C\\n:\\n foo\""
Jan 22 01:21:34 foo loki[21569]: level=debug ts=2020-01-22T00:21:34.863688323Z caller=mock.go:86 msg=CAS key=collectors/ring modify_index=1851 value="\"\\xb3*\\xf0C\\n:\\n foo\""
Jan 22 01:21:34 foo loki[21569]: level=debug ts=2020-01-22T00:21:34.863734565Z caller=mock.go:159 msg=Get key=collectors/ring modify_index=1852 value="\"\\xb3*\\xf0C\\n:\\n foo\""
Jan 22 01:21:34 foo loki[21569]: level=debug ts=2020-01-22T00:21:34.863823658Z caller=mock.go:113 msg=Get key=collectors/ring wait_index=1852
Jan 22 01:21:34 foo loki[21569]: level=debug ts=2020-01-22T00:21:34.869401788Z caller=grpc_logging.go:40 method=/logproto.Pusher/Push duration=70.758µs msg="gRPC (success)"
Jan 22 01:21:34 foo loki[21569]: level=debug ts=2020-01-22T00:21:34.869682946Z caller=logging.go:44 traceID=7ef5e3df22fb15cf msg="POST /loki/api/v1/push (204) 1.087953ms"
Jan 22 01:21:35 foo loki[21569]: level=debug ts=2020-01-22T00:21:35.969284945Z caller=grpc_logging.go:40 duration=79.35µs method=/logproto.Pusher/Push msg="gRPC (success)"
Jan 22 01:21:35 foo loki[21569]: level=debug ts=2020-01-22T00:21:35.969550765Z caller=logging.go:44 traceID=3e65ae166b69d279 msg="POST /loki/api/v1/push (204) 905.004µs"
Jan 22 01:21:37 foo loki[21569]: level=debug ts=2020-01-22T00:21:37.06967791Z caller=grpc_logging.go:40 method=/logproto.Pusher/Push duration=84.702µs msg="gRPC (success)"
Jan 22 01:21:37 foo loki[21569]: level=debug ts=2020-01-22T00:21:37.069981025Z caller=logging.go:44 traceID=201567dde696c236 msg="POST /loki/api/v1/push (204) 1.196082ms"
Jan 22 01:21:38 foo loki[21569]: level=debug ts=2020-01-22T00:21:38.169341267Z caller=grpc_logging.go:40 method=/logproto.Pusher/Push duration=66.647µs msg="gRPC (success)"
Jan 22 01:21:38 foo loki[21569]: level=debug ts=2020-01-22T00:21:38.169623348Z caller=logging.go:44 traceID=ab341dc1311ac9b msg="POST /loki/api/v1/push (204) 921.206µs"
Jan 22 01:21:39 foo loki[21569]: level=debug ts=2020-01-22T00:21:39.269324878Z caller=grpc_logging.go:40 method=/logproto.Pusher/Push duration=78.049µs msg="gRPC (success)"
Jan 22 01:21:39 foo loki[21569]: level=debug ts=2020-01-22T00:21:39.269529875Z caller=logging.go:44 traceID=38aeb93c3e63db2a msg="POST /loki/api/v1/push (204) 861.822µs"
Jan 22 01:21:39 foo systemd[1]: Stopping Loki Log Aggregator...
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.503905379Z caller=signals.go:54 msg="=== received SIGINT/SIGTERM ===\n*** exiting"
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.503997096Z caller=loki.go:175 msg="notifying module about stopping" module=all
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.504037402Z caller=loki.go:175 msg="notifying module about stopping" module=distributor
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.504053725Z caller=loki.go:175 msg="notifying module about stopping" module=ingester
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.504074298Z caller=loki.go:175 msg="notifying module about stopping" module=querier
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.504089073Z caller=loki.go:175 msg="notifying module about stopping" module=ring
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.504101715Z caller=loki.go:175 msg="notifying module about stopping" module=store
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.504111054Z caller=loki.go:175 msg="notifying module about stopping" module=table-manager
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.504125108Z caller=loki.go:175 msg="notifying module about stopping" module=overrides
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.504139301Z caller=loki.go:175 msg="notifying module about stopping" module=server
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.504152862Z caller=loki.go:157 msg=stopping module=all
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.504172896Z caller=loki.go:157 msg=stopping module=distributor
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.504189606Z caller=loki.go:157 msg=stopping module=ingester
Jan 22 01:21:39 foo loki[21569]: level=info ts=2020-01-22T00:21:39.504213366Z caller=lifecycler.go:632 msg="changing ingester state from" old_state=ACTIVE new_state=LEAVING
Nothing. I'll deploy with Environment=SYSTEMD_LOG_LEVEL=debug, hoping to see more.
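In case it's useful to anyone: one way to add that without editing the main unit file is a drop-in (a sketch — the file name is arbitrary; run systemctl daemon-reload and restart the unit afterwards):
# /etc/systemd/system/loki.service.d/debug.conf
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug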
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
I can confirm that this was something related to the HV we are running on and the virtualisation technology we use. The system effectively killed everything doing a lot of i/o. Closing this. Thanks for the help.
Thanks for the follow up! Glad you were able to figure out the cause!
Describe the bug
Loki & promtail daemons crash eventually.
To Reproduce
Steps to reproduce the behavior:
I have a very basic setup — everything is stored on localhost.
When I look at the logs of loki, I see lots of messages about out-of-order events. I believe this happens because I currently have persistence configured for journald (to /var/log/journal), and whenever it starts up, it tries to ingest the entire directory, not just the most recent/current journal.
Another thing I noticed is that eventually the ingester is gone. But I haven't been able to track down why this happened. Maybe the amount of journal is the problem? It's roughly 700 MB, and the VM has plenty of RAM (8 GiB) and 4 vCPUs.
Promtail crashes eventually as well, after — I think — too many "400" errors from loki, which I guess is related, but also undesired behaviour. Should I make another ticket for that?
Expected behavior
I expect them to be running and working.
Environment:
CentOS Linux release 7.7.1908 (Core), kernel 3.10.0
Screenshots, Promtail config, or terminal output
Loki config:
Promtail config: