grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0
23.83k stars 3.44k forks source link

Unexpected too far behind errors #5936

Closed moray95 closed 2 years ago

moray95 commented 2 years ago

Loki rejects logs with "too far behind" errors when reject_old_samples is disabled and logs are no older than reject_old_samples_max_age.

Configuration:

 limits_config:
  reject_old_samples: false
  reject_old_samples_max_age: 168h

Steps to reproduce the behavior: I am not sure how to reproduce the issue but have noticed it mostly appears after a downtime either on Loki itself or the backing object storage.

Expected behavior After setting reject_old_samples to false, the logs should never be rejected with "entry too far behind" errors, and especially not when logs are no older than the configured reject_old_samples_max_age.

Environment:

Screenshots, Promtail config, or terminal output

level=warn ts=2022-04-15T09:25:11.8678788Z caller=grpc_logging.go:38 duration=498.136µs method=/logproto.Pusher/Push err="rpc error: code = Code(400) desc = entry with timestamp 2022-04-15 03:20:00.80086913 +0000 UTC ignored, reason: 'entry too far behind' for stream: {REDACTED},
entry with timestamp 2022-04-15 03:20:00.883413328 +0000 UTC ignored, reason: 'entry too far behind' for stream: {REDACTED},
entry with timestamp 2022-04-15 03:20:04.550525553 +0000 UTC ignored, reason: 'entry too far behind' for stream: {REDACTED},
entry with timestamp 2022-04-15 03:20:04.936669923 +0000 UTC ignored, reason: 'entry too far behind' for stream: {REDACTED},
entry with timestamp 2022-04-15 03:20:05.402723364 +0000 UTC ignored, reason: 'entry too far behind' for stream: {REDACTED},
entry with timestamp 2022-04-15 03:20:05.977017186 +0000 UTC ignored, reason: 'entry too far behind' for stream: {REDACTED},
entry with timestamp 2022-04-15 03:20:07.235240884 +0000 UTC ignored, reason: 'entry too far behind' for stream: {REDACTED},
entry with timestamp 2022-04-15 03:20:07.90163636 +0000 UTC ignored, reason: 'entry too far behind' for stream: {REDACTED},
entry with timestamp 2022-04-15 03:20:09.9954798 +0000 UTC ignored, reason: 'entry too far behind' for stream: {REDACTED},
entry with timestamp 2022-04-15 03:20:10.159147785 +0000 UTC ignored, reason: 'entry too far behind' for stream: {REDACTED},
total ignored: 926 out of 926" msg="gRPC
"
slim-bean commented 2 years ago

ordering of samples is always per stream, and for a given stream Loki will accept out of order samples for at most max_chunk_age/2 behind the most recent entry sent. This is a little confusing so I'll try to post some examples.

For a stream {job="foo",app="bar"} and a default max_chunk_age: 2h if you send a log to that stream with a timestamp of 2022-04-19T12:00:00Z, Loki will accept logs as far back as 2022-04-19T11:00:00Z. Anything older than that will be rejected with entry too far behind

It's important to note though that Loki will always accept much older samples, it's always the relationship to the most recent received timestamp for a given stream.

If you want to load older logs into Loki, you need to do so in a roughly chronological order, you can playback logs from years ago as long as you start with the older logs and work your way forward. Example, today is 2022-04-19, as long as you have reject_old_samples: false you can send a log with timestamp 2018-01-01T13:00:00Z, the same rules apply though, if that's the most recent log for the stream you can only send older logs that have a time newer than 2018-01-01T12:00:00Z (you can send anything newer)

Make sure you have a schema entry that has a from date older than the oldest logs you want to send.

moray95 commented 2 years ago

Thanks a lot for the detailed explanation @slim-bean ! Didn't know max_chunk_age had such a use case. I'm closing this issue for now, will check back after fine-tuning our configuration according to the new info.

slim-bean commented 2 years ago

Welcome! I would comment that increasing max chunk age to work around this will have some limits and perhaps unintended consequences depending on how big you want to go. That is, if you wanted to make it more than 24 hours I don't know if that may cause you query problems? We've never tried it.

Another consideration would be adding another label to the older logs to send them as another stream.

Good luck!

mhoyer commented 2 years ago

Still to me it's not 100% clear how both config parameters are interacting. Is it A or B?

A: Is reject_old_samples=true ignoring reject_old_samples_max_age and rejecting everything? B: Is reject_old_samples=true only rejecting when reject_old_samples_max_age is fulfilled?

Maybe worth a clarification also in the configuration section here? https://grafana.com/docs/loki/latest/configuration/#limits_config

mhoyer commented 2 years ago

I might found the answer (B) already in the sources? https://github.com/grafana/loki/blob/main/pkg/distributor/validator.go#L68

Can you confirm? Thus, reject_old_samples=false is ignoring reject_old_samples_max_age completely, right?

slim-bean commented 2 years ago

Can you confirm? Thus, reject_old_samples=false is ignoring reject_old_samples_max_age completely, right?

This is correct, these two configurations are a little bit legacy, when Loki needed a separate index store like dynamodb it was desirable to limit how far back you could ingest data because there was code that would reduce the provisioned write throughput on old tables to save cost.

dorinand commented 2 years ago

Hello, I have similar issue. This is my limit configuration:

limits_config:
  # 2 weeks
  retention_period: 336h
  reject_old_samples: false
  reject_old_samples_max_age: 43800h

I set reject old samples max age to 5 years by this configuration (max_chunk_age is default 2h) and also ignore rejecting old samples. I still getting entry too far behind error for logs old only few hours. I would expect from the configuration, I will be able to import logs from 5y ago.

So how does this work? Should I also configure max_chunk_age? I sending logs to Loki with FIlebeat (Filebeat -> Logstash -> Loki). Maybe Filebeat and Logstash change the order of the logs?

Loki 2.5.0

chandramohan97911 commented 1 year ago

@dorinand I am also facing same issue with promtail while scraping the instance logs to loki. Please let me know if you fix the entry too far behind issue and could share working config.

thanks, Chandra

charles-horel commented 1 year ago

I am having the same issue with fluentd: 2023-03-24 20:17:12 +0000 [warn]: #0 [clusterflow:logging:loki-flow:clusteroutput:logging:loki-output] failed to write post to http://loki-gateway.loki/loki/api/v1/push (400 Bad Request entry with timestamp 2023-03-24 08:10:52.973597146 +0000 UTC ignored, reason: 'entry too far behind, oldest acceptable timestamp is: 2023-03-24T18:40:47Z' for stream: ....

We have changed the following loki configs and they are not making a difference:

loki:
  limits_config:
    reject_old_samples: false

  ingester:
    max_chunk_age: 24h
waney316 commented 10 months ago

Anyone slove the problem? I am having the same issue.