atombender opened this issue 5 years ago
Thanks @atombender for your report.
The `logcli query` command (without `--tail`) currently loads the entire result set into memory in order to analyze all of the log entries and find the common labels before it starts printing (common labels are filtered out while printing). This means that memory usage won't be lower than the entire set of log entries printed out.

I personally don't see much room for improvement while keeping the common-labels filtering. On the other hand, a more optimized version of the `query` command could be implemented with the common-labels filtering disabled (i.e. allowing it to be enabled/disabled via a CLI flag).

What's your take, @cyriltovena @slim-bean? Is it worth the effort at this stage? Easier / better ideas?
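For readers less familiar with the internals, here is a minimal, simplified sketch (in Go, not the actual logcli code) of why common-labels filtering forces buffering: the intersection of the streams' label sets is only known once every stream in the response has been read. The `Stream` type and field names below are illustrative stand-ins, not the real response types.

```go
package main

import "fmt"

// Stream mirrors the rough shape of a Loki query result stream: a label
// set plus its log lines (names here are illustrative, not the real types).
type Stream struct {
	Labels  map[string]string
	Entries []string
}

// commonLabels returns the label pairs shared by all streams. It can only
// be computed after the whole result set has been collected.
func commonLabels(streams []Stream) map[string]string {
	if len(streams) == 0 {
		return nil
	}
	common := map[string]string{}
	for k, v := range streams[0].Labels {
		common[k] = v
	}
	for _, s := range streams[1:] {
		for k, v := range common {
			if s.Labels[k] != v {
				delete(common, k) // not shared by this stream, drop it
			}
		}
	}
	return common
}

func main() {
	streams := []Stream{
		{Labels: map[string]string{"job": "api", "env": "prod"}, Entries: []string{"line 1"}},
		{Labels: map[string]string{"job": "web", "env": "prod"}, Entries: []string{"line 2"}},
	}
	// Only now, with the full result set in memory, can entries be printed
	// with the common labels ({env="prod"}) stripped from each stream.
	fmt.Println(commonLabels(streams))
}
```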
Thanks. Why is this filtering needed? Personally, I just want to filter the log entries and see them as they were recorded; I don't care whether printing filters out common labels. I need to be able to grab gigabytes of log data, and I need this to take just a few seconds, modulo bandwidth, with minimal memory usage. Is the filtering disabled if I pass `--no-labels`, perhaps?
I think we should support the use case of downloading raw logs, which we could do with `-o raw`; in that case we should not do any processing.
@cyriltovena Do we want to keep timestamp ordering and direction support with `-o raw`? If yes, we need to parse the entire JSON response anyway before printing out the log entries.

An approach I'm thinking about is:

- `--limit` is reduced to a hardcoded per-request max, and logcli iterates over multiple requests, adjusting the from/to timestamps accordingly (see the sketch below).

Makes sense?
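A rough sketch of that batching idea, assuming a hypothetical `queryRange` helper standing in for a single `/query_range` call and BACKWARD (newest-first) ordering; the real client API and flag handling in logcli will differ.

```go
package main

import (
	"fmt"
	"time"
)

type Entry struct {
	Timestamp time.Time
	Line      string
}

// queryRange stands in for one range query against Loki (hypothetical helper).
func queryRange(query string, limit int, start, end time.Time) []Entry {
	// ... issue the HTTP request and decode the response ...
	return nil
}

const maxBatchSize = 1000 // hardcoded per-request cap

func fetchAll(query string, limit int, start, end time.Time, print func(Entry)) {
	remaining := limit
	for remaining > 0 {
		batch := maxBatchSize
		if remaining < batch {
			batch = remaining
		}
		entries := queryRange(query, batch, start, end)
		if len(entries) == 0 {
			return
		}
		for _, e := range entries {
			print(e) // entries can be printed (or written raw) as they arrive
		}
		// Move the end timestamp to just before the oldest entry returned,
		// assuming newest-first ordering, so the next request continues there.
		end = entries[len(entries)-1].Timestamp.Add(-time.Nanosecond)
		remaining -= len(entries)
	}
}

func main() {
	now := time.Now()
	fetchAll(`{job="api"}`, 10_000_000, now.Add(-time.Hour), now, func(e Entry) {
		fmt.Println(e.Timestamp, e.Line)
	})
}
```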
Being able to pipe the raw logs to a file seems very interesting and useful, so I think we should definitely support `-o raw`. Tail should also support it.
Would a websocket be a good solution here, or could the current tail endpoint be improved to support past queries and stream results in batches?
But I think we could start with a client-side solution like you've described, see how it goes, and move it to the backend if required.
Should we also explore a simpler alias for the raw tail query command, e.g. `logcli logs`?
It would be especially useful if we find a way to bash-complete label names and values with fuzzy search.
Finally, I think there is a way to detect whether stdout is a TTY: if so, we should default to printing with labels, and default to no labels when output goes to a file, while still supporting flag overrides.
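A small sketch of that TTY-based default, using `golang.org/x/term` as one possible way to detect an interactive terminal (logcli itself may do this differently):

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/term"
)

func main() {
	// Show labels when stdout is an interactive terminal; drop them when
	// output is piped or redirected to a file.
	showLabels := term.IsTerminal(int(os.Stdout.Fd()))

	// A --labels / --no-labels flag would override the detected default here.
	if showLabels {
		fmt.Println("defaulting to printing labels (interactive terminal)")
	} else {
		fmt.Println("defaulting to raw output without labels (piped/redirected)")
	}
}
```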
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
I think this should stay open; come back later, stalebot.
**Describe the bug**
While the apparent runaway memory use problem in Loki itself seems to have been fixed, `logcli` itself is also using too much memory.

**To Reproduce**
Steps to reproduce the behavior:
1. Run `logcli query --limit=10000000 --since=1h <some query that produces lots of log output>`. (In my test case, this is about 100MB of log data.)

**Expected behavior**
`logcli` should stream its output and never run out of memory, or even use particularly large amounts of memory.