grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

"Error loading cache generation numbers" came out on Loki-read #13756

Open duj4 opened 3 months ago

duj4 commented 3 months ago

Describe the bug
I am running Loki 3.1.0 in SSD mode with retention_enabled set to true. After the stack has been up and running for a while, the loki-read pod starts reporting the error below: [image]

Per https://grafana.com/docs/loki/latest/operations/troubleshooting/#cache-generation-errors, I found that the metric loki_delete_cache_gen_load_failures_total is greater than 1, and the page says allow_deletes must be set to true. However, this flag is marked as deprecated in the current version, and its replacement, deletion_mode, is already set to filter-and-delete.
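The metric check described above can be scripted. The following is a minimal sketch, assuming the metrics text has already been fetched from a Loki pod's /metrics endpoint. The sample text, its `source` label, and the helper name `counter_total` are made up for illustration and are not real Loki output.

```python
import re

# Made-up sample in Prometheus text exposition format. The "source" label is
# hypothetical; real Loki output may carry different labels or none at all.
SAMPLE_METRICS = """\
# TYPE loki_delete_cache_gen_load_failures_total counter
loki_delete_cache_gen_load_failures_total{source="a"} 3
loki_delete_cache_gen_load_failures_total{source="b"} 1
loki_delete_cache_gen_load_failures_total_other 5
"""


def counter_total(metrics_text: str, name: str) -> float:
    """Sum every sample of the metric `name` across all label sets."""
    # Match the exact metric name, an optional {label} block, then the value.
    pattern = re.compile(r"^" + re.escape(name) + r"(?:\{.*\})?\s+(\S+)")
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = pattern.match(line)
        if match:
            total += float(match.group(1))
    return total


failures = counter_total(SAMPLE_METRICS, "loki_delete_cache_gen_load_failures_total")
if failures > 0:
    print(f"cache generation load failures: {failures}")
```

A non-zero result only confirms that the read path failed to load cache generation numbers; it does not say why.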

compactor: [image]

limits: [image]

If deletion_mode is already set in limits_config, do I have to set it again in runtime_config for each tenant? And since allow_deletes is marked as deprecated, do I still need to set it to true?

To Reproduce
Steps to reproduce the behavior:

  1. Start Loki in SSD mode with retention enabled
  2. Wait for a while and check the logs of the loki-read pod

Expected behavior
No error should be logged if filter-and-delete is set correctly.

Environment:

Hitesh-Agrawal commented 3 months ago

I am also facing this error, using the grafana/loki Helm chart 6.8.0 with app version 3.1.0.

level=error ts=2024-08-08T05:23:38.230753658Z caller=http.go:107 msg="error getting delete requests from the store" err="unexpected status code: 404"
ts=2024-08-08T05:23:38.230776322Z caller=spanlogger.go:109 user=fake level=error msg="failed loading deletes for user" err="unexpected status code: 404"

The Loki config is below:

auth_enabled: false
chunk_store_config:
  chunk_cache_config:
    embedded_cache:
      enabled: false
    memcached:
      batch_size: 100
      expiration: 30m
      parallelism: 100
    memcached_client:
      consistent_hash: true
      host: memcached-chunk.loki.svc.cluster.local
      service: memcached-chunk
  write_dedupe_cache_config:
    memcached:
      batch_size: 100
      expiration: 30m
      parallelism: 100
    memcached_client:
      consistent_hash: true
      host: memcached-write.loki.svc.cluster.local
      service: memcached-write
common:
  compactor_address: http://loki-read:3100
  path_prefix: /var/loki
  replication_factor: 1
  ring:
    kvstore:
      store: memberlist
  storage:
    s3:
      bucketnames: loki-data
      insecure: false
      region: eu-central-1
      s3forcepathstyle: false
compactor:
  delete_request_store: s3
  retention_enabled: true
frontend:
  compress_responses: true
  log_queries_longer_than: 20s
  max_outstanding_per_tenant: 4096
frontend_worker:
  grpc_client_config:
    max_recv_msg_size: 50331648
    max_send_msg_size: 50331648
ingester:
  chunk_encoding: snappy
  chunk_idle_period: 15m
  chunk_retain_period: 30s
  chunk_target_size: 1572864
  max_chunk_age: 1h
ingester_client:
  grpc_client_config:
    grpc_compression: snappy
    max_recv_msg_size: 50331648
    max_send_msg_size: 50331648
limits_config:
  ingestion_burst_size_mb: 1000
  ingestion_rate_mb: 1000
  max_cache_freshness_per_query: 10m
  max_query_parallelism: 2
  max_query_series: 2000
  per_stream_rate_limit: 20MB
  per_stream_rate_limit_burst: 20MB
  query_timeout: 2m
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 8760h
  split_queries_by_interval: 15m
  deletion_mode: filter-and-delete
memberlist:
  join_members:

duj4 commented 3 months ago

@Hitesh-Agrawal It seems that your error is different from mine. Which mode are you using to deploy your Loki stack?

Hitesh-Agrawal commented 2 months ago

@duj4 I am running it with deploymentMode: SimpleScalable. The storage is in AWS S3.

duj4 commented 2 months ago

OK, if that is the case, I think you may need to change your compactor address as per https://github.com/grafana/loki/blob/9315b3d03d790506cf8e69fb7407b476de9d0ed6/production/helm/loki/templates/_helpers.tpl#L1000

Hitesh-Agrawal commented 2 months ago

@duj4 The compactor address is already set as needed. I am not using any backend targets and only have loki-read and loki-write pods with loki-gateway:

common:
  compactor_address: http://loki-read:3100/
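One way to narrow down the 404 reported earlier in this thread is to request the delete API directly from whatever answers at compactor_address. The following is a minimal sketch, not Loki's own client. The two paths and the X-Scope-OrgID header are assumptions recalled from the Loki source rather than anything stated in this thread, and the helper names `probe_urls` and `probe` are hypothetical.

```python
import urllib.error
import urllib.request

# Assumed paths that the read path requests from the compactor (unverified here).
DELETE_PATH = "/loki/api/v1/delete"
CACHE_GEN_PATH = "/loki/api/v1/cache/generation_numbers"


def probe_urls(compactor_address: str) -> list[str]:
    """Build the URLs to probe, tolerating a trailing slash on the address."""
    base = compactor_address.rstrip("/")
    return [base + DELETE_PATH, base + CACHE_GEN_PATH]


def probe(compactor_address: str, tenant: str = "fake", timeout: float = 5.0) -> dict[str, int]:
    """Return the HTTP status for each probed URL.

    "fake" is the tenant that appears in the log above (user=fake).
    Connection-level failures (urllib.error.URLError) are not caught.
    """
    results: dict[str, int] = {}
    for url in probe_urls(compactor_address):
        request = urllib.request.Request(url, headers={"X-Scope-OrgID": tenant})
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results[url] = response.status
        except urllib.error.HTTPError as exc:
            results[url] = exc.code  # e.g. 404 if the target does not serve this path
    return results


# Example, to be run from inside the cluster:
# print(probe("http://loki-read:3100/"))
```

If both URLs return 404, the address probably points at a component that is not running the compactor, which would match the helper template linked above.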

duj4 commented 2 months ago

@Hitesh-Agrawal OK, so it is a mixed config of SSD and distributed, LOL, which is beyond my knowledge. Sorry, man.

JStickler commented 1 week ago

Configuration questions have a better chance of being answered if you ask them on the community forums.