grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Error fetching from cache #10735

Open sivesh1989 opened 1 year ago

sivesh1989 commented 1 year ago

Hi,

We are running Loki v2.9.1 with memcached (1.6.21) and extstore enabled. Below is the memcached configuration:

      - name: MEMCACHED_CACHE_SIZE
        value: "4000"
      - name: MEMCACHED_MAX_ITEM_SIZE
        value: 2m
      - name: ENABLE_EXT_STORE
        value: "true"
      - name: EXT_STORE_SIZE
        value: "1500G"

When a query on the read path covers more than 2 hours of data, we observe the error below:

2023-09-25T06:46:45.523875277Z stderr F ts=2023-09-25T06:46:45.442137322Z caller=spanlogger.go:86 user=xxx level=warn msg="error fetching from cache" err="memcache: unexpected line in get response: \"SERVER_ERROR out of memory writing get response\r\n\""

Querier memory utilization is normal, around 10% (limit: 50 GB). Memcached replica count: 40. Querier replica count: 50.

Regards, Sivesh Kumar R

saule1508 commented 11 months ago

We are experiencing the same error. I think it happens because memcached has to assemble the response in memory from the data on disk, and it does not have enough memory to do so. If you run stats against memcached, there is a stat about extstore OOM. We probably need to give memcached more memory. I saw this in a memcached discussion on Discord; I can link it here when I find it again.
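To check the stat mentioned above without a dedicated client, here is a minimal Python sketch that sends the text-protocol `stats` command to one memcached instance and keeps only the counters whose names contain `extstore` or `oom`. The substring filter is an assumption made to avoid guessing exact counter names, which can differ between memcached versions; the function names and the host in the usage note are hypothetical.

```python
import socket


def parse_stats(raw: str) -> dict[str, str]:
    """Parse the text-protocol reply to `stats` into a name -> value dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        # Data lines look like "STAT <name> <value>"; the final "END" is skipped.
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def memory_pressure_stats(stats: dict[str, str]) -> dict[str, str]:
    """Keep only counters whose name mentions extstore or oom (assumed filter)."""
    return {k: v for k, v in stats.items() if "extstore" in k or "oom" in k}


def fetch_stats(host: str, port: int = 11211, timeout: float = 5.0) -> dict[str, str]:
    """Send `stats` to one memcached instance and read until the END line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:  # connection closed before END arrived
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", errors="replace"))
```

Calling `memory_pressure_stats(fetch_stats("memcached-0.example"))` against each replica and comparing the values before and after a long query should show whether any of these counters move while the error is being logged.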

waney316 commented 9 months ago

hi. Have you solved this problem?

saule1508 commented 9 months ago

Yes, it is working fine. Make sure to read the wiki https://github.com/memcached/memcached/wiki/ConfiguringLokiExtstore; the maintainer of memcached did an analysis and came up with recommendations. The settings related to extstore in particular made a big difference. Another takeaway is that memcached with Loki scales better horizontally (many small memcached instances work better than a single big one).

waney316 commented 9 months ago

Nice, thank you for your help. I'm a bit confused: does the memcached extstore apply to memcachedFront or memcachedChunks? My Loki cluster is deployed with the loki-distributed Helm chart, and I set the memcached start options -m 6000 -I 2m -o ext_path=/disk/extstore:500G,ext_wbuf_size=32,ext_threads=10,ext_max_sleep=10000,slab_automove_freeratio=0.10,ext_recache_rate=0 but I get an error: 'Illegal suboption "(null)"'. We generate around 6 TB+ of logs per day, stored in S3, and queries are very slow, which is quite a problem for us.
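The thread never establishes what triggered the 'Illegal suboption' error. One thing worth ruling out (an assumption, not a confirmed cause) is that the `-o` value picks up an empty or whitespace-padded token on its way through the chart values, for example from a trailing comma or a stray space. A minimal Python sketch for inspecting the string follows; both function names are hypothetical, and the check knows nothing about which suboptions memcached actually accepts.

```python
from typing import Optional


def split_suboptions(value: str) -> list[tuple[str, Optional[str]]]:
    """Split a comma-separated `-o` value into (name, value) pairs."""
    pairs = []
    for token in value.split(","):
        name, sep, val = token.partition("=")
        # A token without "=" is a bare flag and has no value.
        pairs.append((name, val if sep else None))
    return pairs


def suspicious_tokens(value: str) -> list[str]:
    """Return the repr of tokens that are empty or contain whitespace."""
    problems = []
    for name, val in split_suboptions(value):
        token = name if val is None else f"{name}={val}"
        if token == "" or any(ch.isspace() for ch in token):
            problems.append(repr(token))
    return problems
```

Running `suspicious_tokens(...)` on the exact string that reaches the container (as rendered by Helm, not as typed in the values file) returns an empty list when every token is well formed.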

saule1508 commented 9 months ago

The extstore is especially useful for the chunk cache, because that cache gets big if you want to keep multiple days of data, so disk can be more cost-effective than RAM. But you need decent disk performance; I think the right ext_threads value depends entirely on disk speed. The query result cache is small and does not need an extstore. You should head over to the grafana/loki Slack channel with your question. Once I am back in the office I can give you the command we use for memcached (I certainly don't get the illegal suboption error!).

waney316 commented 9 months ago

Okay, I have already posted my question on the grafana/loki channel and look forward to your response. Thank you

pingping95 commented 7 months ago

@waney316 Have you solved this problem?

I'm facing the same issue.
