Loki (and logcli) should support streaming logs

grafana / loki

Like Prometheus, but for logs.

https://grafana.com/loki

GNU Affero General Public License v3.0


Loki (and logcli) should support streaming logs #943

Open atombender opened 5 years ago

atombender commented 5 years ago

Describe the bug

While the apparent runaway memory use problem in Loki itself seems to have been fixed, logcli is also using too much memory.

To Reproduce

Steps to reproduce the behavior:

1. Start a logcli container with a memory limit of, say, 100MB.
2. Run logcli query --limit=10000000 --since=1h <some query that produces lots of log output>. (In my test case, this is about 100MB of log data.)
3. Watch logcli be OOMKilled.

Expected behavior

logcli should stream its output and never run out of memory, or even use particularly large amounts of memory.

pracucci commented 5 years ago

Thanks @atombender for your report.

The logcli query command (without --tail) currently loads the entire result set into memory so that it can analyze all of the log entries and find their common labels before it starts printing (common labels are filtered out while printing). This means that memory usage can't be lower than the size of the entire set of log entries printed out.
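To illustrate why the filtering forces buffering, here is a minimal Python sketch of the idea. It is not logcli's actual code; the function names and the sample label sets are made up for illustration. The point is that the common labels are only known once every stream in the result has been seen.

```python
from typing import Dict, List

Labels = Dict[str, str]


def common_labels(streams: List[Labels]) -> Labels:
    """Return the label pairs that appear, with the same value, in every stream."""
    if not streams:
        return {}
    common = dict(streams[0])
    for labels in streams[1:]:
        # Keep only the pairs that this stream shares with all previous ones.
        common = {k: v for k, v in common.items() if labels.get(k) == v}
    return common


def strip_common(labels: Labels, common: Labels) -> Labels:
    """Drop the common labels from one stream's label set before printing."""
    return {k: v for k, v in labels.items() if k not in common}


# Hypothetical label sets for two streams in one query result.
streams = [
    {"app": "api", "env": "prod", "pod": "api-1"},
    {"app": "api", "env": "prod", "pod": "api-2"},
]
common = common_labels(streams)
for labels in streams:
    print(strip_common(labels, common))
```

Because `common_labels` needs the label set of the last stream before the first entry can be printed, nothing can be written out until the whole response has been read.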

I personally don't see much room for improvement while keeping the common labels filtering. On the other hand, a more memory-efficient version of the query command could be implemented with the common labels filtering disabled (i.e. by letting the user enable or disable it via a CLI flag).

What's your take @cyriltovena @slim-bean ? Is it worth the effort, at this stage? Easier / better ideas?

atombender commented 5 years ago

Thanks. Why is this filtering needed? Personally, I just want to filter the log entries and see them as they were recorded, and I don't care about whether printing filters out common labels. I need to be able to grab gigabytes of log data and I need this to take just a few seconds, modulo bandwidth, with minimal memory usage. Is the filtering disabled if I pass --no-labels, perhaps?

cyriltovena commented 5 years ago

I think we should support the use case of downloading raw logs, which we should do using -o raw; in that case we should not do any processing.

pracucci commented 5 years ago

> I think we should support the use case of downloading raw logs, which we should do using -o raw; in that case we should not do any processing.

@cyriltovena Do we wanna keep timestamp ordering and direction support with -o raw? If yes, we need to parse the entire JSON response anyway before printing out the log entries.

An approach I'm thinking about is:

1. Allow disabling the common labels filtering (whatever the output format is).
2. When common labels filtering is disabled, switch the implementation to a query pager which, whatever the --limit is, reduces it to a hardcoded max and iterates over multiple requests, adjusting the from/to timestamps accordingly.
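The pager idea above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: `fetch` is a hypothetical stand-in for one HTTP query against Loki, entries are assumed to come back in ascending timestamp order, and `batch_size` plays the role of the hardcoded max. Only the forward direction is handled.

```python
from typing import Callable, Iterator, List, Tuple

Entry = Tuple[int, str]  # (timestamp in nanoseconds, log line)
Fetch = Callable[[int, int, int], List[Entry]]  # (start, end, limit) -> entries


def paged_query(fetch: Fetch, start: int, end: int, limit: int,
                batch_size: int = 1000) -> Iterator[Entry]:
    """Yield up to `limit` entries in [start, end), one small request at a time."""
    remaining = limit
    cursor = start
    while remaining > 0 and cursor < end:
        batch = fetch(cursor, end, min(batch_size, remaining))
        if not batch:
            return
        for entry in batch:
            yield entry  # entries can be printed immediately; nothing is buffered
        remaining -= len(batch)
        # Move the window past the last timestamp seen. Caveat: entries sharing
        # that exact timestamp but cut off by the batch limit would be skipped.
        cursor = batch[-1][0] + 1


def make_fake_fetch(entries: List[Entry]) -> Fetch:
    """Build an in-memory fetch function, used here in place of a real server."""
    def fetch(start: int, end: int, limit: int) -> List[Entry]:
        matching = [e for e in entries if start <= e[0] < end]
        return matching[:limit]
    return fetch


entries = [(i, f"line {i}") for i in range(10)]
for ts, line in paged_query(make_fake_fetch(entries), 0, 10, limit=7, batch_size=3):
    print(ts, line)
```

With this shape, memory use is bounded by `batch_size` rather than by `--limit`, at the cost of several round trips.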

Makes sense?

cyriltovena commented 5 years ago

Being able to pipe the raw logs to a file seems very interesting and useful, so I think we should definitely support --raw. Tail should also support it.

Would a websocket be a good solution here, or could the current tail endpoint be improved to support past queries and stream results in batches?

But I think we could start with a client-side solution like you've described, see how it goes, and move to the backend if required.

Should we also explore a simpler alias for the raw tail query command? For example:

logcli logs

It would be especially useful if we find a way to bash-complete label names and values with fuzzy search.

Finally, I think there is a way to detect whether stdout is a tty. If it is, we should default to printing labels; if the output goes to a file, we should default to omitting them, while still supporting flag overrides.
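A minimal Python sketch of that default, assuming a hypothetical `override` value that would come from a command-line flag (no such flag is confirmed here):

```python
import io
import sys
from typing import Optional, TextIO


def should_print_labels(stream: TextIO, override: Optional[bool] = None) -> bool:
    """Decide whether to print labels: an explicit flag wins, else check for a tty."""
    if override is not None:
        return override
    # isatty() is True for an interactive terminal, False for files and pipes.
    return stream.isatty()


# Interactive use prints labels; `... > out.log` or a pipe would not.
print(should_print_labels(sys.stdout))
# An in-memory buffer behaves like a file here, so labels are off by default.
print(should_print_labels(io.StringIO()))
```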

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

slim-bean commented 5 years ago

I think this should stay open. Come back later, stalebot.
