Arun-Trichy opened this issue 1 year ago
Please note we have also tried disabling the in-memory FIFO cache
// applyFIFOCacheConfig turns on FIFO cache for the chunk store and for the query range results,
// but only if no other cache storage is configured (redis or memcache).
//
// This behavior is only applied for the chunk store cache and for the query range results cache
// (i.e: not applicable for the index queries cache or for the write dedupe cache).
func applyFIFOCacheConfig(r *ConfigWrapper) {
    chunkCacheConfig := r.ChunkStoreConfig.ChunkCacheConfig
    if !cache.IsCacheConfigured(chunkCacheConfig) {
        r.ChunkStoreConfig.ChunkCacheConfig.EnableFifoCache = true
    }

    resultsCacheConfig := r.QueryRange.ResultsCacheConfig.CacheConfig
    if !cache.IsCacheConfigured(resultsCacheConfig) {
        r.QueryRange.ResultsCacheConfig.CacheConfig.EnableFifoCache = true
        // The query results fifocache is still in Cortex, so we couldn't change the flag defaults;
        // instead we override them here.
        r.QueryRange.ResultsCacheConfig.CacheConfig.Fifocache.MaxSizeBytes = "1GB"
        r.QueryRange.ResultsCacheConfig.CacheConfig.Fifocache.TTL = 1 * time.Hour
    }
}
Thanks for sharing such a detailed configuration. Based on your configuration, the memory increase should be due to unexpected behavior of the cache. You can refer to my configuration below and set the cache sizes you actually expect for these two caches. Because of the applyFIFOCacheConfig function you will never be able to disable these two caches entirely, so setting a small memory limit for them will help you.
chunk_store_config:
  chunk_cache_config:
    async_cache_write_back_buffer_size: 1
    default_validity: 5m
    fifocache:
      ttl: 5m
      size: 0
      max_size_bytes: 1GB
If none of this helps you reduce the memory, the growth is not caused by the cache. In that case you can check the memory distribution with Go pprof; you can refer to the method in one of my memory overflow issues: https://github.com/grafana/loki/issues/8831
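For reference, here is a minimal sketch of capturing a heap profile for pprof, assuming Loki's HTTP server is reachable on its default port 3100 and the standard /debug/pprof endpoints are enabled:

package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
)

// Downloads a heap profile from a running Loki instance so it can be
// inspected offline with `go tool pprof heap.out`.
func main() {
    // Assumption: Loki's HTTP server listens on localhost:3100 and
    // exposes the standard net/http/pprof endpoints.
    resp, err := http.Get("http://localhost:3100/debug/pprof/heap")
    if err != nil {
        fmt.Fprintln(os.Stderr, "fetching heap profile:", err)
        os.Exit(1)
    }
    defer resp.Body.Close()

    out, err := os.Create("heap.out")
    if err != nil {
        fmt.Fprintln(os.Stderr, "creating output file:", err)
        os.Exit(1)
    }
    defer out.Close()

    if _, err := io.Copy(out, resp.Body); err != nil {
        fmt.Fprintln(os.Stderr, "writing profile:", err)
        os.Exit(1)
    }
    fmt.Println("wrote heap.out; inspect with: go tool pprof heap.out")
}

Equivalently, go tool pprof http://localhost:3100/debug/pprof/heap fetches and opens the profile in one step.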
Thanks @liguozhong for the response.
I tried the same config and was able to control the memory shoot-up; it actually started coming down even without the limits set.
Just one question though: what kind of impact / performance drop should we expect if we set the config as below? (We reduced max_size_bytes because our overall pod limit for Loki is 1Gi.)
chunk_store_config:
  chunk_cache_config:
    async_cache_write_back_buffer_size: 1
    default_validity: 5m
    fifocache:
      ttl: 5m
      max_size_items: 0
      max_size_bytes: 200MB
Also, what happens in Loki between 1:30 AM and 2:00 AM? Is there a set of predefined operations that causes this pattern of memory spikes? The pattern is present even with memory limits set.
Describe the bug
While trying to understand the memory usage pattern of Loki in order to reduce its resource footprint, we noticed a strange pattern: memory usage increases by almost 1.5Gi over a 24-hour interval. We want to understand how we can control this, or whether it is expected behavior. Please note we have also tried disabling the in-memory FIFO cache, but memory still keeps growing in the same way on a daily basis. For this experiment we also removed the Kubernetes pod limits set for Loki. And why do we see a huge difference between working set bytes and RSS memory usage?
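As a general diagnostic (an assumption on our side, not something confirmed in this thread), it can help to compare the container metrics with the Go runtime's own view of its memory, which Loki exports as go_memstats_* metrics on its /metrics endpoint; heap that the runtime keeps around for reuse, and kernel page cache counted in the working set but not in RSS, are two common reasons those numbers diverge for Go processes. The meaning of the relevant fields is easiest to see in a minimal standalone sketch:

package main

import (
    "fmt"
    "runtime"
)

// Prints the Go runtime's view of its own memory. For Loki itself the same
// figures are available as go_memstats_* metrics on /metrics.
func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    // HeapInuse: heap memory currently holding live or recently freed objects.
    fmt.Printf("heap in use:   %d MiB\n", m.HeapInuse/1024/1024)
    // HeapIdle - HeapReleased: memory the runtime keeps around for reuse;
    // it still counts towards RSS even though no Go objects live there.
    fmt.Printf("heap idle:     %d MiB (of which released to OS: %d MiB)\n",
        m.HeapIdle/1024/1024, m.HeapReleased/1024/1024)
    // Sys: total memory obtained from the OS by the Go runtime.
    fmt.Printf("total from OS: %d MiB\n", m.Sys/1024/1024)
}

If heap in use stays flat while RSS keeps climbing, the growth is more likely in memory the runtime retains for reuse or in page cache than in live Go objects.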
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Loki's memory usage should stabilize after the initial growth and not keep increasing (this looks like a memory leak).
Environment:
Screenshots, Promtail config, or terminal output
Some additional graphs from Grafana: