Describe the bug
I recently upgraded from Loki Helm Chart 5.8.9 to 5.15.0 (this is of course not good practice but the CHANGELOG.md did not indicate any breaking changes for us).
Since then we have seen a large number of logs being "discarded" from one of our tenants. This tenant can reach roughly 3.5 million logs per minute, though it only bursts to that level around the middle of the day. The version we were originally on handled this load without issue, but since the upgrade the tenant's loki_discarded_samples_total metric reliably shows over a third of its logs being discarded.
We have other tenants in this Loki deployment, but together they ingest only about 1 million lines.
To Reproduce
Steps to reproduce the behavior:
Start Loki on a Kubernetes cluster with the Helm chart in simple scalable mode on version 5.15.0 (with 6 readers, 9 writers, 6 backend, 3 gateway)
Ingest 3.5 million logs per minute from a data source using a Vector agent running in stateless aggregator mode
Check the discarded metrics from Prometheus
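For anyone reproducing this: the reason label on the discard metric indicates which limit is rejecting the lines. A sketch of the kind of PromQL query we use to break it down (assuming the metric is scraped into Prometheus; the reason values shown in the comment are examples of ones Loki emits):

```promql
# Per-tenant discard rate, broken down by the reason label
# (e.g. rate_limited, per_stream_rate_limit, line_too_long)
sum by (tenant, reason) (rate(loki_discarded_samples_total[5m]))
```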
Expected behavior
We expect Loki (possibly with some scaling) to handle this volume of logs, since it is described as being able to, and some of the examples listed ingest over 1TB of logs a day. We reach roughly 300-400GB a day uncompressed.
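Since discards at this scale are often caused by per-tenant rate limits, one possibility is that the chart's effective defaults changed between versions. A sketch of the Helm values override we would expect to need (the keys come from Loki's limits_config block; the numbers here are illustrative, not our actual settings):

```yaml
# Illustrative Helm values override; keys are from Loki's
# limits_config, the numbers are examples only.
loki:
  limits_config:
    ingestion_rate_mb: 50            # per-tenant ingest rate (MB/s)
    ingestion_burst_size_mb: 100     # per-tenant burst allowance (MB)
    per_stream_rate_limit: 10MB      # rate limit per individual stream
    per_stream_rate_limit_burst: 20MB
```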
Environment:
Infrastructure: Kubernetes
Deployment tool: Helm
Screenshots, Promtail config, or terminal output
CONFIG