Describe the bug
After upgrading Loki from 2.3.0 to 2.8.1, I'm seeing 1-minute gaps in some of our graphs (including the logs volume graph in Grafana Explore) at regular intervals. I have confirmed that the interval of the gaps matches the value of the split_queries_by_interval setting. Please see the attached screenshots with this setting at 30m (the default) and 15m.
The logs do exist in Loki even for the minutes that show up as gaps in the graph; I can see the actual log entries in Explore.
I suspect this may be related to the fact that the service producing these particular logs only writes a burst of log entries once per minute, at the top of the minute, i.e. 500+ log entries exactly at 2023-05-02 08:00:00, another 500+ entries exactly at 08:01:00, 08:02:00, etc. My guess is that this triggers an edge condition in how aggregations are computed and how the results from the split queries are combined.
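To make the guess above concrete, here is a toy model in Python. It is not Loki's code and does not claim to reflect how Loki actually splits or merges queries; `split_by_interval`, `volume_per_minute`, and the `drop_boundary` flag are hypothetical names invented for this illustration. It only shows that if an entry sitting exactly on a split boundary were excluded by both neighbouring sub-queries, the merged per-minute counts would have a hole every `split_queries_by_interval`.

```python
from collections import Counter

MINUTE = 60  # seconds


def split_by_interval(start, end, interval):
    """Yield (sub_start, sub_end) pairs covering [start, end), cut at multiples of interval."""
    cur = start
    while cur < end:
        nxt = min((cur // interval + 1) * interval, end)
        yield cur, nxt
        cur = nxt


def volume_per_minute(timestamps, start, end, interval, drop_boundary):
    """Count entries per minute by running one 'sub-query' per split and merging the counts.

    drop_boundary=True models the suspected edge condition (hypothetical):
    the lower bound of each sub-range is treated as exclusive, so an entry
    exactly on a split boundary is matched by neither neighbour.
    """
    counts = Counter()
    for sub_start, sub_end in split_by_interval(start, end, interval):
        for ts in timestamps:
            lower_ok = ts > sub_start if drop_boundary else ts >= sub_start
            if lower_ok and ts < sub_end:
                counts[ts // MINUTE * MINUTE] += 1
    return counts


# 500 entries exactly at the top of each minute, for two hours.
timestamps = [m * MINUTE for m in range(120) for _ in range(500)]
interval = 30 * MINUTE  # stands in for split_queries_by_interval: 30m

good = volume_per_minute(timestamps, 0, 120 * MINUTE, interval, drop_boundary=False)
bad = volume_per_minute(timestamps, 0, 120 * MINUTE, interval, drop_boundary=True)

# Minutes (offsets from the query start) with no entries in the merged result.
gaps = [m for m in range(120) if bad[m * MINUTE] == 0]
print(gaps)
```

In this model the empty minutes fall exactly on the split boundaries, which is the same shape as the gaps in the screenshots. Whether Loki 2.8.1 behaves this way internally is an open question; the sketch only shows the symptom is consistent with a boundary-handling problem.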
To Reproduce
Steps to reproduce the behavior:
1. Start Loki (2.8.1, running in single binary mode).
2. Feed in logs according to the pattern mentioned above (a burst of log entries once per minute at the top of the minute).
3. Query for the logs in Grafana Explore, e.g. {type="log type"}
4. Observe a 1-minute gap in the logs volume graph every N minutes, where N equals the value of split_queries_by_interval.
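For anyone trying to reproduce this without the original service, the burst pattern can be approximated with a small Python script that builds a payload for Loki's HTTP push endpoint (`/loki/api/v1/push`). This is a sketch under assumptions, not the setup used in this report: the URL `http://localhost:3100` assumes a local Loki on the default port, the line content is made up, and `top_of_minute`, `build_burst_payload`, and `push` are hypothetical helper names.

```python
import json
import urllib.request


def top_of_minute(epoch_s):
    """Round an epoch timestamp (seconds) down to the start of its minute."""
    return epoch_s - epoch_s % 60


def build_burst_payload(minute_epoch_s, entries=500, labels=None):
    """Build a push payload where every entry has the same top-of-minute timestamp."""
    labels = labels or {"type": "log type"}  # label taken from the example query
    ts_ns = str(minute_epoch_s * 1_000_000_000)  # the push API expects nanoseconds as a string
    values = [[ts_ns, f"burst entry {i}"] for i in range(entries)]  # placeholder log lines
    return {"streams": [{"stream": labels, "values": values}]}


def push(payload, url="http://localhost:3100/loki/api/v1/push"):
    """POST the payload to Loki. The URL is an assumption about the local setup."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `push(build_burst_payload(top_of_minute(int(time.time()))))` once per minute (after `import time`) for a couple of hours should produce data with the same shape as described above. Timestamps that are too old may be rejected depending on the Loki limits configuration, so the script is meant to be run live rather than backfilled.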
Expected behavior
The graph should not have gaps every split_queries_by_interval minutes.
Environment:
Infrastructure: Single binary running in Docker on AWS EC2, BoltDB Shipper on S3
Screenshots, Promtail config, or terminal output
split_queries_by_interval: 30m
split_queries_by_interval: 15m