This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Did you ever figure this out? I have the same issue in 1.5
@brandonstark Unfortunately not, we haven't had time to really dig into it. Hopefully we can do something within the next couple of weeks. We are still a few months from production, but if it is not fixed by then, our fallback will be the EFK stack.
Have you tried looking at what the volume (PVC) contains? Is it getting erased somehow?
@cyriltovena, since the other log lines stay and only log lines with parsed fields larger than 3-4k get erased, I do not see how it can be a PVC issue. My guess is that it is a limitation on parsed fields, on the receiving end of the ingester's flush.
Have you tried logcli? It may be related to Grafana not handling very long lines correctly.
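A minimal sketch of such a logcli check (the Loki address, label selector, and time range are placeholders for this setup, and exact flags can vary between logcli versions):

```bash
# Point logcli at the Loki instance (placeholder address).
export LOKI_ADDR=http://localhost:3100

# Query the stream directly, bypassing Grafana; --stats prints the
# Ingester/Store statistics quoted later in this thread.
logcli query '{app="my-app"}' --since=1h --limit=100 --stats
```

If the line shows up here but not in Grafana, the problem is on the display side; if it is missing here too, it is in Loki itself.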
Oh sorry, didn't see the response in the network tab.
@cyriltovena Same result with logcli. The log line is shown for 5 minutes while in the ingester, then disappears. If I change the ingester config to retain chunks for longer, the lines will be searchable in Grafana during the "chunk_idle_period" time:
```yaml
ingester:
  chunk_idle_period: 1h
  max_chunk_age: 1h
  chunk_block_size: 1048576
  chunk_retain_period: 30s
  max_transfer_retries: 0
```
The above makes the lines available for 1 hour. I have also bumped to Loki v1.5.0, with the same result.
Logcli stats:
```
Ingester.TotalReached 1
Ingester.TotalChunksMatched 3
Ingester.TotalBatches 0
Ingester.TotalLinesSent 0
Ingester.HeadChunkBytes 18 kB
Ingester.HeadChunkLines 3
Ingester.DecompressedBytes 0 B
Ingester.DecompressedLines 0
Ingester.CompressedBytes 0 B
Ingester.TotalDuplicates 0
Store.TotalChunksRef 7
Store.TotalChunksDownloaded 7
Store.ChunksDownloadTime 736.857µs
Store.HeadChunkBytes 0 B
Store.HeadChunkLines 0
Store.DecompressedBytes 14 kB
Store.DecompressedLines 6
Store.CompressedBytes 7.4 kB
Store.TotalDuplicates 0
Summary.BytesProcessedPerSecond 13 MB
Summary.LinesProcessedPerSecond 3679
Summary.TotalBytesProcessed 31 kB
Summary.TotalLinesProcessed 9
Summary.ExecTime 2.446084ms
```
After 5 minutes:
```
Ingester.TotalReached 1
Ingester.TotalChunksMatched 3
Ingester.TotalBatches 0
Ingester.TotalLinesSent 0
Ingester.HeadChunkBytes 0 B
Ingester.HeadChunkLines 0
Ingester.DecompressedBytes 1.6 kB
Ingester.DecompressedLines 2
Ingester.CompressedBytes 3.8 kB
Ingester.TotalDuplicates 0
Store.TotalChunksRef 10
Store.TotalChunksDownloaded 10
Store.ChunksDownloadTime 746.251µs
Store.HeadChunkBytes 0 B
Store.HeadChunkLines 0
Store.DecompressedBytes 15 kB
Store.DecompressedLines 8
Store.CompressedBytes 11 kB
Store.TotalDuplicates 2
Summary.BytesProcessedPerSecond 5.7 MB
Summary.LinesProcessedPerSecond 3429
Summary.TotalBytesProcessed 17 kB
Summary.TotalLinesProcessed 10
Summary.ExecTime 2.915532ms
```
OK, that's good. Can you activate debug logging and check if you have any interesting logs when flushing chunks? Btw SIGTERM will force a flush.
Also, can you keep only lvl, app and logger as labels? The rest is easily searchable with LogQL, and those extra labels will force a chunk per request, which we don't want.
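A rough sketch of both suggestions, assuming a single-binary Loki deployed by the loki-stack Helm chart (the pod and process names are placeholders):

```bash
# Debug logging can be enabled with the -log.level flag, or with
# "log_level: debug" under the server: block of the Loki config:
#   loki -config.file=/etc/loki/loki.yaml -log.level=debug

# SIGTERM makes the ingester flush its in-memory chunks on shutdown,
# e.g. by restarting the pod or signalling the process directly:
kubectl delete pod loki-0
# or, outside Kubernetes:
kill -TERM "$(pidof loki)"
```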
@cyriltovena Unfortunately there are no interesting logs when running with debug. loki-debug.log
OK. Could you send me a chunk where a line is supposed to be but you don't see it, and tell me which line it is? I'll check whether it has been flushed, just to narrow down the issue.
We also have a chunk inspection tool in ./cmd that you could use if you want to push this further, see https://github.com/grafana/loki/tree/master/cmd/chunks-inspect
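A sketch of running that tool from a checkout of the repo (the chunk path is a placeholder, and the -l flag for printing the stored log lines is an assumption; check the tool's README for the exact options):

```bash
# Build the inspection tool from the Loki source tree.
git clone https://github.com/grafana/loki.git
cd loki/cmd/chunks-inspect
go build

# Inspect a chunk file copied out of the store; -l is assumed here to
# also print the log lines so the missing line can be searched for.
./chunks-inspect -l /path/to/the-missing-chunk
```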
@cyriltovena Nice tool. I found the chunk, so it is getting stored, but it doesn't show up in search. Chunk attached: the-missing-chunk.zip
For reference, these two chunks were created at the same time (during the same request) by the same service and are searchable. small-1.zip small-2.zip
Thanks for this. Do you know how big the line is? I'll take a look at this chunk soon and try to reproduce, and hopefully get a fix for it.
@cyriltovena FYI, as per your suggestion I have reduced the number of labels, and now the "missing" log line can be found.
Promtail config:
```yaml
pipelineStages:
  - labels:
      app:
      application_name:
      level:
      logger_name:
```
Removed labels:
```yaml
response_code:
request_path:
response_time:
request_method:
application_name:
trace_id:
span_id:
jaeger_link:
```
Possibly you were hitting a limit on the number of files from the filesystem. What's weird is that no logs nor errors were showing up; that's annoying.
Those labels are bad because they cause each request to create a file with a single line inside, so you end up with millions of tiny chunks. We designed Loki to hold at least 10k to 100k lines per chunk.
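One rough way to see that cardinality blow-up is to look at how many values each label carries (the label names here come from the promtail config above; the labels sub-command's exact behaviour may differ between logcli versions):

```bash
# List the label names Loki knows about.
logcli labels

# Count the distinct values behind a per-request label such as trace_id;
# a huge number here means roughly one tiny chunk per request.
logcli labels trace_id | wc -l
```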
Since I can find the line in the chunks, the problem must be in the search part of Loki, not storage. Maybe because we create so many tiny chunks, the line is being skipped somehow (lines in several closely related chunks being seen as duplicates and therefore discarded?). After removing the unique labels, we have not noticed any lines "disappearing". Maybe we can just call this an abnormal use of labels that is discouraged.
@cyriltovena, thank you for the effort. Until I see another "disappearing" line, I will consider it resolved. Hopefully you can find something helpful in the attached chunks.
You mentioned that the "rest is easily searchable with logql". I can only see that it can be done via regex, or is it possible to reference "parsed fields" directly?
Regex and string contains (|=), Loki excels at those. For parsed fields we have plans to add this to the language; we have a GrafanaCon talk giving a glimpse of this, see https://grafana.com/go/grafanaconline/loki-future/ near the end.
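For example, with only the low-cardinality labels kept, the removed fields can still be matched with line filters (a sketch; the label values, field names, and text format are placeholders based on the promtail config above and on how the fields appear in the raw line):

```bash
# Exact-substring match on a field that is no longer a label.
logcli query '{app="my-app", level="error"} |= "response_code=500"'

# Regex match, e.g. any 5xx response or one specific trace id.
logcli query '{app="my-app"} |~ "response_code=5[0-9]{2}"'
logcli query '{app="my-app"} |= "trace_id=abc123"'
```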
Describe the bug
I have this issue with both 1.4.0 and 1.4.1. For me it seems like an issue with either a too-long log line or too-large parsed fields.
I can see the lines for 5 minutes. I guess this is while the data is in the ingester, before the flush to the DB. But after 5 minutes the lines disappear. There are no errors in the Loki pod logs.
Is this a problem because the "large" JSON field gets stored in a parsed field, and if so, can we remedy it somehow?
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Get all log lines; call again after 5 minutes and still get all log lines -- but we get this
Environment:
Screenshots, Promtail config, or terminal output
Loki-stack values
Logs are picked up immediately. The log line has a 12k JSON field (debug output); if I reduce the data in that field, the log line stays after the flush. After 5 minutes, same request: gone, no data. No data :(
loki-log