MiGrandjean opened this issue 1 year ago
Thanks for your report! Could you share a memory dump? You can grab one as follows:
-server.profiling_enabled
http://localhost:9080/debug/pprof/heap?duration=15s
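For anyone else collecting a profile, a sketch of the workflow: this assumes Promtail's HTTP server listens on the default `localhost:9080` (as in the URL above) and that profiling is enabled with the flag mentioned; the exact query parameters may differ between versions, and the `8081` web-UI port is just an arbitrary local choice.

```shell
# Fetch a heap profile from Promtail's pprof endpoint
# (assumes the server listens on localhost:9080).
curl -sS -o promtail-heap.pprof "http://localhost:9080/debug/pprof/heap"

# Inspect the dump with Go's pprof tool:
# show the top allocators of in-use memory.
go tool pprof -top -inuse_space promtail-heap.pprof

# Or open an interactive web UI (port 8081 is an arbitrary local port).
go tool pprof -http=:8081 promtail-heap.pprof
```

The resulting `.pprof` file is what can be attached to the issue.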
Since that is still using less than 150 MiB, maybe this isn't a real leak, but anyway, it's always good to have a look at things :smile:
edit: ok, looking at my own cluster it looks like there's definitely a leak going on lol. Thanks for the report.
Also seeing a "leaky" pattern and OOM kills across our Promtail DaemonSets (Promtail v2.8.3):
sum by (node) (container_memory_working_set_bytes{job="cadvisor/cadvisor", pod=~"promtail.+",container!=""})
We could correlate the leak with extremely long lines, or no line breaks at all, in certain pods' logs. We experimented with adding line breaks, and the leak disappeared.
For reference, we are using max_line_size = 0 (no limit)
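To spot these "big leakers" you first have to find which apps emit very long lines; a quick hypothetical check on a captured log file could look like the snippet below (`sample.log` is a stand-in here; in practice you would pipe in `kubectl logs <pod>` output instead).

```shell
# Build a hypothetical sample log with one short and one 500-byte line.
printf 'short line\n%s\n' "$(head -c 500 /dev/zero | tr '\0' 'x')" > sample.log

# Report the longest line (in bytes) seen in the log.
awk '{ if (length($0) > max) max = length($0) } END { print "longest line:", max, "bytes" }' sample.log
# → longest line: 500 bytes
```

Running this over the logs of each suspect pod makes it easy to rank them by maximum line length.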
> We could correlate the leak with extremely long lines or absolutely no line breaks in certain pods logs. We experimented adding line breaks and the leak disappeared.
> Using max_line_size = 0
hi @grandich, do you mean adding these lines?
config:
  snippets:
    extraLimitsConfig: |
      max_line_size: 0
      max_line_size_truncate: true
or can you send an example of what you mean? :)
hi @Gakhramanzode
> or can you send example what you mean ? :)
I meant we are using v2.5 defaults (no override/config): https://grafana.com/docs/loki/v2.5.x/configuration/#limits_config
max_line_size: 0
max_line_size_truncate: false
These are our Promtail pods as of today (one week). The "big leakers" are gone (they were apps with gigantic log lines), but the leaky pattern remains; I guess it is related to apps with very long lines:
sum by (node) (container_memory_working_set_bytes{job="cadvisor/cadvisor", pod=~"promtail.+",container!=""})
@grandich thank you! I'll try again tomorrow to fix the memory leak))
@grandich thank you for sharing your insights on the memory consumption issue with Promtail. We have implemented your suggestion, but decided to set max_line_size to 16384 instead of 0, to ensure better control over log line size. All changes have been applied, and we will observe the system over the next few days to monitor memory usage. I appreciate your help!
@grandich do you know why setting max_line_size: 0 helped to remove the leak? According to the doc, by setting this we essentially say there's no limit on the log line length. For me it's counter-intuitive.
@sajithnedunmullage so, I don't understand 😀 I set max_line_size to 16384. Is that wrong?))
these are my changes
@Gakhramanzode I'm also confused. Looking for insights and advice :)
My Promtail pods have been slowly, but steadily increasing their memory consumption for the last 30 days. Trying to find a proper fix for this.
@sajithnedunmullage I understand you bro 😄
> @grandich do you know why setting max_line_size: 0 helped to remove the leak? According to the doc, by setting this we essentially say there's no limit on the log line length. For me it's counter-intuitive.
Hi @sajithnedunmullage, I didn't say that. In https://github.com/grafana/loki/issues/8054#issuecomment-1888178237 I mentioned that the leak is correlated with the length of the lines. We had lines on the order of ~500KB–1MB which produced huge leaks. We introduced line breaks (and in certain cases eliminated such lines entirely), and the leaks improved.
I only mentioned max_line_size as a reference.
In theory, if we set max_line_size to something other than 0, the leak should improve, but we didn't test it.
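For example, a capped-and-truncate variant of the earlier Helm snippet could look like the fragment below. The 16384 value is only an illustration (the number one commenter above chose), not a tested recommendation.

```yaml
config:
  snippets:
    extraLimitsConfig: |
      # Cap line length instead of leaving it unlimited (0 = no limit).
      max_line_size: 16384
      # Truncate oversized lines instead of dropping them.
      max_line_size_truncate: true
```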
I'm seeing the same behavior over a 90 day period:
@maudrid Welcome to the club 🤝
hello everyone guys 👋
@grandich I think your revision helped us
...
# -- Section for crafting Promtail's config file. The only directly relevant value is `config.file`,
# which is a templated string that references the other values and snippets below this key.
# @default -- See `values.yaml`
config:
  ...
  # -- A section of reusable snippets that can be referenced in `config.file`.
  # Custom snippets may be added in order to reduce redundancy.
  # This is especially helpful when multiple `kubernetes_sd_configs` are used, which usually have large parts in common.
  # @default -- See `values.yaml`
  snippets:
    ...
    # -- You can put here any keys that will be directly added to the config file's 'limits_config' block.
    # @default -- empty
    extraLimitsConfig: |
      max_line_size: 0
      max_line_size_truncate: false
...
This issue should be renamed to "Promtail: memory leak" as it was confirmed that this is really a memory leak. Also, a fix would be nice 🙏🏼.
it's working 🤔
Describe the bug
We are seeing a slow but steady increase in memory usage for our Promtail pods. IMHO this looks very typical for a memory leak. Sooner or later we experience OOM kills for Promtail.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I would expect memory consumption to stay roughly constant under regular operation, or, if there are spikes, that the memory is freed after the increased demand passes.
Environment:
Screenshots, Promtail config, or terminal output
We are using the default values of the Helm Chart (with exception of the Loki URL and some podAnnotations and tolerations).
I'm also happy to investigate this further and e.g. track down what is actually driving the memory consumption, if someone can point me in the right direction on how to do that.