grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Loki 3.0.0 doesn't support external Memcached clusters #12559

Open elburnetto-intapp opened 5 months ago

elburnetto-intapp commented 5 months ago

Describe the bug

When deploying Loki 3.0.0, the Helm chart lets you specify the details of an external Memcached cluster in the values:

  memcached:
    chunk_cache:
      enabled: true
      host: memcached.monitoring.svc.cluster.local
      service: "memcache"
      batch_size: 256
      parallelism: 10
    results_cache:
      enabled: true
      host: memcached.monitoring.svc.cluster.local
      service: "memcache"
      timeout: "500ms"
      default_validity: "12h"

However, in the Loki config template, the only references to Memcached apply when you use the chart's own bundled Memcached instances (the ones used in Distributed mode):

    query_range:
      align_queries_with_step: true
      {{- with .Values.loki.query_range }}
      {{- tpl (. | toYaml) $ | nindent 4 }}
      {{- end }}
      {{- if .Values.resultsCache.enabled }}
      {{- with .Values.resultsCache }}
      cache_results: true
      results_cache:
        cache:
          default_validity: {{ .defaultValidity }}
          background:
            writeback_goroutines: {{ .writebackParallelism }}
            writeback_buffer: {{ .writebackBuffer }}
            writeback_size_limit: {{ .writebackSizeLimit }}
          memcached_client:
            consistent_hash: true
            addresses: dnssrvnoa+_memcached-client._tcp.{{ template "loki.fullname" $ }}-results-cache.{{ $.Release.Namespace }}.svc
            timeout: {{ .timeout }}
            update_interval: 1m
      {{- end }}
      {{- end }}
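
For illustration, with resultsCache.enabled set to true, the template above would render roughly as follows. This is a sketch under assumptions that are not from the chart output itself: a release named "loki" (so that "loki.fullname" resolves to "loki"), the "monitoring" namespace, and placeholder values for the cache settings.

```yaml
# Illustrative render of the results_cache template above.
# Assumed: release name "loki", namespace "monitoring", and
# placeholder values for validity, write-back, and timeout settings.
query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      default_validity: 12h
      background:
        writeback_goroutines: 1
        writeback_buffer: 500000
        writeback_size_limit: 500MB
      memcached_client:
        consistent_hash: true
        # The address is built from the release name and namespace, so it
        # always points at the chart's own results-cache Service.
        addresses: dnssrvnoa+_memcached-client._tcp.loki-results-cache.monitoring.svc
        timeout: 500ms
        update_interval: 1m
```

Nothing in this block reads from the memcached values shown earlier, which is why the external host never reaches the rendered config.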

To Reproduce

Steps to reproduce the behavior:

  1. Configure the Helm chart to use an external Memcached cluster
  2. Review the Loki ConfigMap (no Memcached details appear)
  3. Review the Loki Memcached metrics exported via /metrics (they no longer increase)

Expected behavior

We should be able to use our own Memcached clusters, and the Helm chart should accommodate this.

Environment:

rknightion commented 4 months ago

You can use external memcached with SSD mode on v3.

The key things we set (for elasticache):

loki:
  memcached:
    chunk_cache:
      enabled: false
    results_cache:
      enabled: false

  storage_config:
    index_queries_cache_config:
      memcached:
        batch_size: 1024
        parallelism: 100
      memcached_client:
        addresses: "loki-results.0001.use1.cache.amazonaws.com:11211,loki-results.0002.use1.cache.amazonaws.com:11211"
        timeout: 5000ms
        max_idle_conns: 64
        max_item_size: 0
        consistent_hash: true

  structuredConfig:
    query_range:
      align_queries_with_step: true
      cache_results: true
      max_retries: 30
      cache_index_stats_results: true
      cache_volume_results: true
      cache_instant_metric_results: true
      instant_metric_query_split_align: true
      cache_series_results: true
      cache_label_results: true
      results_cache:
        compression: snappy
        cache:
          background:
            writeback_buffer: 500000
            writeback_goroutines: 1
            writeback_size_limit: 500MB
          default_validity: 12h
          memcached:
            batch_size: 1024
            parallelism: 100
          memcached_client:
            addresses: "loki-results.0001.use1.cache.amazonaws.com:11211,loki-results.0002.use1.cache.amazonaws.com:11211"
            timeout: 5000ms
            max_idle_conns: 64
            max_item_size: 0
            consistent_hash: true

    chunk_store_config:
      chunk_cache_config:
        background:
          writeback_buffer: 500000
          writeback_goroutines: 1
          writeback_size_limit: 500MB
        default_validity: 0s
        memcached:
          batch_size: 1024
          parallelism: 100
        memcached_client:
          addresses: "loki-chunk.0001.use1.cache.amazonaws.com:11211,loki-chunk.use1.cache.amazonaws.com:11211,loki-chunk.0003.use1.cache.amazonaws.com:11211"
          timeout: 5000ms
          max_idle_conns: 64
          max_item_size: 0
          consistent_hash: true
      write_dedupe_cache_config:
        memcached:
          batch_size: 1024
          parallelism: 100
        memcached_client:
          addresses: "loki-results.0002.use1.cache.amazonaws.com:11211"
          timeout: 5000ms
          max_idle_conns: 64
          max_item_size: 0
          consistent_hash: true

So we moved some of the config under structuredConfig to get it to work, as well as
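
A pared-down variant of the same idea for an in-cluster Memcached might look like the following. It is only a sketch: the host and service names are taken from the original report, while the top-level chunksCache and resultsCache keys are assumed to be the chart 6.x switches for the bundled caches, and the timeout and validity values are placeholders.

```yaml
# Sketch only: route Loki's caches to an external Memcached via structuredConfig.
# Host and service come from the original report; other values are assumptions.
chunksCache:
  enabled: false   # assumed chart 6.x key for the bundled chunk cache
resultsCache:
  enabled: false   # stops the template from rendering its own results_cache
loki:
  structuredConfig:
    query_range:
      cache_results: true
      results_cache:
        cache:
          default_validity: 12h
          memcached_client:
            # host + service are resolved through a DNS SRV lookup
            host: memcached.monitoring.svc.cluster.local
            service: memcache
            timeout: 500ms
            consistent_hash: true
    chunk_store_config:
      chunk_cache_config:
        memcached:
          batch_size: 256
          parallelism: 10
        memcached_client:
          host: memcached.monitoring.svc.cluster.local
          service: memcache
          timeout: 500ms
          consistent_hash: true
```

After applying something like this, the rendered ConfigMap should contain the memcached_client blocks, and the Memcached metrics on /metrics should start increasing again.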

elburnetto-intapp commented 4 months ago

@rknightion The issue is more that the Helm chart doesn't support external Memcached clusters. After amending our values file, we got it working with our external Memcached, so all is well. I'm just flagging it because it was a breaking change when we moved to v3: Loki silently stopped using Memcached, and we only noticed from our metrics.

tkcontiant commented 2 months ago

I came to the same conclusion.

There are also significant differences between the official Loki caching guide https://grafana.com/docs/loki/latest/operations/caching/ and the default chart settings.

gybanez commented 1 month ago

We're seeing the same problem while trying to upgrade from 2.8.x (chart 5.7.4 to 6.7.1).

KA-ROM commented 2 weeks ago

Yeah, also having this issue. Makes a safe upgrade tricky.

@elburnetto-intapp can you explain the changes you made when you say "amending of the values file"? How did you get around this?