honganan opened this issue 2 years ago
I guess it's because the stream produces logs slowly and the chunk is not flushed when I query, and `query_ingesters_within` configures the querier to query the ingester only within the recent 15 minutes. So the logs between `now-2h` and `now-15m` cannot be queried. I want to confirm: am I right?

If that's the problem, should we add some checks on the config values and give a warning?

cc @sandeepsukhani
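For illustration, a minimal sketch of the mismatch described above (the values are taken from this report; field placement follows Loki's `querier` and `ingester` config sections):

```yaml
querier:
  # Queries only reach the ingesters for data newer than now-15m,
  # so unflushed chunks older than that are invisible.
  query_ingesters_within: 15m

ingester:
  # Chunks can stay in ingester memory for up to 2h (the default)
  # before being force-flushed to storage.
  max_chunk_age: 2h
```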
Yes, you got it right. The `query_ingesters_within` should be at least `max_chunk_age`, which defaults to `2h`. We should add some validation for this and similar configs where Loki operators can make mistakes that could be avoided by checks.
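As a sketch, a configuration that satisfies this constraint (values are the ones discussed in this thread):

```yaml
querier:
  # Must be >= ingester.max_chunk_age so that data still held
  # in ingester memory remains queryable.
  query_ingesters_within: 3h

ingester:
  max_chunk_age: 2h
```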
Hi @sandeepsukhani,

It seems I've hit a similar issue. I am running Loki on K8s in SSD mode; `max_chunk_age` is set to the default `2h` and `query_ingesters_within` is set to `3h`. I ingested a log snippet that was generated on March 1st:
```
2024-03-01 02:52:27,703+0800 jira-stats-0 INFO [c.a.j.util.stats.JiraStats] [JIRA-STATS] [NODE-START] snapshot stats: duration=PT1H0.007S, statsOverhead=n/a, data={"_statsName":"NODE-START","_statsType":"snapshot","_time":"2024-02-29T18:52:27.702Z","_timestamp":1709232747702,"_duration":"PT1H0.007S","_invocations":0,"_statsOverhead":"n/a","configuration":{},"checkIndexOnStart":{"result":"NOT_RUN","timeInSeconds":0},"getIndexBy":{"rebuildLocalIndex":{"result":"NOT_RUN","timeInSeconds":0},"pickIndexSnapshotFromSharedHome":{"result":"NOT_RUN","timeInSeconds":0},"requestIndexSnapshotFromAnotherNode":{"result":"NOT_RUN","timeInSeconds":0},"performFullForegroundReindex":{"result":"NOT_RUN","timeInSeconds":0}},"ensureFreshIndexSnapshot":{"result":"NOT_RUN","snapshotExisted":false,"snapshotCreated":false,"timeInSeconds":0}}
2024-03-01 02:52:27,704+0800 jira-stats-0 INFO [c.a.j.util.stats.JiraStats] [JIRA-STATS] [NODE-START] total stats: duration=PT205H6M19.18S, statsOverhead=n/a, data={"_statsName":"NODE-START","_statsType":"total","_time":"2024-02-29T18:52:27.702Z","_timestamp":1709232747702,"_duration":"PT205H6M19.18S","_invocations":0,"_statsOverhead":"n/a","configuration":{},"checkIndexOnStart":{"result":"NOT_RUN","timeInSeconds":0},"getIndexBy":{"rebuildLocalIndex":{"result":"NOT_RUN","timeInSeconds":0},"pickIndexSnapshotFromSharedHome":{"result":"NOT_RUN","timeInSeconds":0},"requestIndexSnapshotFromAnotherNode":{"result":"NOT_RUN","timeInSeconds":0},"performFullForegroundReindex":{"result":"NOT_RUN","timeInSeconds":0}},"ensureFreshIndexSnapshot":{"result":"NOT_RUN","snapshotExisted":false,"snapshotCreated":false,"timeInSeconds":0}}
2024-03-01 02:52:29,578+0800 plugin-transaction-0 INFO [c.a.jira.plugin.PluginTransactionListener] [JIRA-STATS] [plugin-transaction] numberStartEvents:627, numberEndEvents:627, numberSendEvents:278, numberEventsInTransactions:12196, numberOfPluginEnableEvents:276
2024-03-01 02:52:43,445+0800 Caesium-1-2 INFO ServiceRunner [c.a.j.issue.index.DefaultIndexManager] Start to check for index consistency.
2024-03-01 02:52:43,446+0800 Caesium-1-2 INFO ServiceRunner [c.a.j.issue.index.DefaultIndexManager] Index consistency check finished. indexConsistent: true; timeToCheckIndexConsistency: 0 ms
2024-03-01 02:52:43,446+0800 Caesium-1-2 INFO ServiceRunner [c.a.jira.servlet.ApplicationStateResolverImpl] Checking index consistency. Time taken: 1.343 ms
```
When I tried to query it from Grafana, it gave me an error like the one below:
But if I ingest a log snippet generated just now, everything looks normal:
```
2024-03-06 16:32:14,374+0800 Caesium-1-1 INFO ServiceRunner [c.a.j.issue.index.DefaultIndexManager] Start to check for index consistency.
2024-03-06 16:32:14,380+0800 Caesium-1-1 INFO ServiceRunner [c.a.j.issue.index.DefaultIndexManager] Index consistency check finished. indexConsistent: true; timeToCheckIndexConsistency: 2 ms
2024-03-06 16:32:14,380+0800 Caesium-1-1 INFO ServiceRunner [c.a.jira.servlet.ApplicationStateResolverImpl] Checking index consistency. Time taken: 5.981 ms
2024-03-06 16:33:14,375+0800 Caesium-1-4 INFO ServiceRunner [c.a.j.issue.index.DefaultIndexManager] Start to check for index consistency.
2024-03-06 16:33:14,379+0800 Caesium-1-4 INFO ServiceRunner [c.a.j.issue.index.DefaultIndexManager] Index consistency check finished. indexConsistent: true; timeToCheckIndexConsistency: 0 ms
2024-03-06 16:33:14,380+0800 Caesium-1-4 INFO ServiceRunner [c.a.jira.servlet.ApplicationStateResolverImpl] Checking index consistency. Time taken: 3.970 ms
```
Per the doc:

> `query_ingesters_within`: Maximum lookback beyond which queries are not sent to ingester
So it means that if the query time range (7 days) is outside the scope of `query_ingesters_within` (3h), Loki will not search the ingesters at all, only the backend storage. However, since the data is still within the window of `max_chunk_age` (2h) or `chunk_idle_period` (30min), the logs in memory have not been flushed to storage yet, so I failed to find the logs in either memory or storage.
Once I set `query_ingesters_within` to 0, the older logs are searchable again.

Please correct me if there is any misunderstanding. Thanks.
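For reference, a sketch of the workaround described above (per the Loki docs, a value of 0 removes the lookback limit, so queriers always consult the ingesters):

```yaml
querier:
  # 0 disables the lookback limit: all queries are also sent to
  # the ingesters, regardless of the query time range.
  query_ingesters_within: 0
```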
The `max_chunk_age` config is like a time window: any logs outside `[now - max_chunk_age/2, now]` will be discarded. This behavior can be observed via the `loki_discarded_samples_total` metric.

So I think your problem is that the logs generated on March 1st fall outside that window and were discarded; it's not the query config.
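To make the window concrete, a sketch with the default value (the halving behavior is as described in this comment, not taken verbatim from the docs):

```yaml
ingester:
  # With the default 2h, entries with timestamps older than
  # now - 1h (max_chunk_age/2, per the comment above) would be
  # rejected and counted in loki_discarded_samples_total.
  max_chunk_age: 2h
```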
Describe the bug

I have a log line with several labels, like `application=myapp`, `source=error.log`, and it contains some keywords. When I select only the labels, without a pipeline, I can query it. But when I add a pipeline with a keyword that is contained in the log, the result is empty.
To Reproduce

Steps to reproduce the behavior:

1. Run Loki with a `write` component, and a component with `querier` and `query-frontend`.
2. Run two queries:
   - 1st: `{application="myapp",source="error.log"}`
   - 2nd: `{application="myapp",source="error.log"} |= "SQLIntegrityConstraintViolationException"`
Expected behavior

Expected both queries to return the log line, but only the 1st did. About 2 hours later, both of them could return the log line.
Environment: