Closed ZIRAKrezovic closed 8 months ago
Hi @ZIRAKrezovic, thanks for reporting this! I guess you're right, even in extreme cases like this, the default cache implementation shouldn't lead to OOM. However, there is always a tradeoff as the current implementation is simple, performant, and covers all the sane use cases. I'll see what I can do to improve it.
As a workaround, you can check a setting format.label.streamCache
that allows you to customize a cache implementation used for streams. You can either use advanced libs like Caffeine or omit caching completely.
Probably you know, but still, I have to notice, that your use case with too many unique streams goes strictly against Loki's best practices.
Hi @nehaev, I am aware of my bad practice and have already fixed it. I still chose to report the bug in case somebody else runs into it. A simple(st) fix would be to WARN the user when cache reaches certain threshold (i.e. 100, 1000, ...).
Some relevant experiments from the past: https://github.com/loki4j/loki-logback-appender/compare/main...limited-stream-cache
I have been debugging a memory issue and pinned it finally to loki-logback-encoder. Lets start with my analysis
My memory heap dump (relevant part only)
Loki appender related in: java.util.String
LogStream holds the most entries for HashMap usage, mainly the labels which are different on every send
Final holder is the UnboundAtomicMapCache instantiated in AbstractLoki4jEncoder.
Due to nature of trace id being virtually different in every log message, cache miss is guaranteed and creation of new node on every new log request
This cache will grow indefinitely and never free unused labels, resulting in application eventually dying with out of memory error with any heap limit and a certain amount of logs.
Finally, my logback-spring.xml snippet
Now, I had the issue where loki would reject the logs due to label difference, so I had otel-exporter strip the traceId field and could probably drop it here. But still, the memory issue should be fixed or users should be aware of it, at least. That's why I am creating this (more of an FYI) report.
I am using loki-logback-appender 1.4.2 with spring-boot 3.2.3