grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Loki compactor doesn't work at all #9990

Open vpetrushin opened 1 year ago

vpetrushin commented 1 year ago

I've tried to enable the compactor in a loki-distributed rollout. Every 10 minutes the compactor crashes with a panic.

The compactor crashes with both retention_enabled: true and retention_enabled: false. Below is the config example with retention_enabled: true.

    level=info ts=2023-07-20T06:38:28.157608192Z caller=main.go:108 msg="Starting Loki" version="(version=2.8.2, branch=HEAD, revision=9f809eda7)"

    panic: runtime error: slice bounds out of range [-1:]

    goroutine 9248 [running]:
    github.com/grafana/loki/pkg/storage/stores/indexshipper/compactor/retention.ExtractIntervalFromTableName({0xc001c581c6, 0x4})
            /src/loki/pkg/storage/stores/indexshipper/compactor/retention/util.go:50 +0x9d
    github.com/grafana/loki/pkg/storage/stores/indexshipper/compactor.sortTablesByRange({0xc00067d000, 0x172, 0x172})
            /src/loki/pkg/storage/stores/indexshipper/compactor/compactor.go:727 +0xc6
    github.com/grafana/loki/pkg/storage/stores/indexshipper/compactor.(*Compactor).RunCompaction(0xc000600400, {0x29f2fc8?, 0xc0000aed70}, 0x0)
            /src/loki/pkg/storage/stores/indexshipper/compactor/compactor.go:578 +0x1f3
    github.com/grafana/loki/pkg/storage/stores/indexshipper/compactor.(*Compactor).runCompactions.func2()
            /src/loki/pkg/storage/stores/indexshipper/compactor/compactor.go:458 +0x176
    github.com/grafana/loki/pkg/storage/stores/indexshipper/compactor.(*Compactor).runCompactions.func3()
            /src/loki/pkg/storage/stores/indexshipper/compactor/compactor.go:471 +0x9e
    created by github.com/grafana/loki/pkg/storage/stores/indexshipper/compactor.(*Compactor).runCompactions
            /src/loki/pkg/storage/stores/indexshipper/compactor/compactor.go:469 +0x1b3
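The panic originates in ExtractIntervalFromTableName, which derives a time interval from the numeric suffix of a table name; a `[-1:]` bound suggests it hit an object whose name does not match the expected `<prefix>_<number>` shape (note the 4-byte argument in the trace). A minimal Go sketch of that failure mode, with hypothetical helper names, not Loki's actual source:

```go
package main

import (
	"fmt"
	"strings"
)

// tableSuffix is a simplified, hypothetical illustration of the slicing
// pattern behind the panic above: indexing a table name with the result of a
// failed substring search. When the name contains no "_", strings.LastIndex
// returns -1 and the slice expression panics with
// "slice bounds out of range [-1:]".
func tableSuffix(name string) string {
	return name[strings.LastIndex(name, "_"):]
}

// safeTableSuffix is the defensive variant: validate the index before slicing.
func safeTableSuffix(name string) (string, bool) {
	i := strings.LastIndex(name, "_")
	if i < 0 {
		return "", false // not shaped like "<prefix>_<number>"
	}
	return name[i+1:], true
}

func main() {
	// A well-formed periodic table name parses cleanly.
	fmt.Println(safeTableSuffix("loki_index_19558")) // 19558 true

	// A stray 4-character object name has no "_" separator and makes the
	// naive version panic, mirroring the trace above.
	defer func() {
		fmt.Println("recovered:", recover())
	}()
	tableSuffix("data")
}
```

If that reading is right, a stray file or directory under shared_store_key_prefix: index/ (or an index prefix that also contains non-index objects) could be enough to trip the compactor on every compaction cycle.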

Config

  config.yaml: |
    auth_enabled: false
    chunk_store_config:
      chunk_cache_config:
        background:
          writeback_goroutines: 100
        embedded_cache:
          enabled: true
          ttl: 24h
        redis:
          endpoint: redis-chunks-master:6379
          idle_timeout: 2h
          password: XXX
          timeout: 2000ms
      write_dedupe_cache_config:
        background:
          writeback_goroutines: 100
        embedded_cache:
          enabled: true
          ttl: 24h
        redis:
          endpoint: redis-chunks-master:6379
          idle_timeout: 2h
          password: XXXX
          timeout: 2000ms
    common:
      compactor_address: http://loki2-loki-distributed-compactor:3100
      storage:
        s3:
          access_key_id: XXXX
          endpoint: https://s3.region.amazonaws.com
          insecure: false
          region: XXX
          s3: s3://XXX
          s3forcepathstyle: true
          secret_access_key: XXXX
    compactor:
      compaction_interval: 10m
      retention_delete_delay: 1h
      retention_delete_worker_count: 150
      retention_enabled: true
      shared_store: aws
      shared_store_key_prefix: index/
    distributor:
      ring:
        kvstore:
          store: memberlist
    frontend:
      compress_responses: true
      log_queries_longer_than: 10s
      tail_proxy_url: http://loki2-loki-distributed-querier:3100
    frontend_worker:
      frontend_address: loki2-loki-distributed-query-frontend-headless:9095
    ingester:
      chunk_block_size: 262144
      chunk_encoding: snappy
      chunk_idle_period: 30m
      chunk_retain_period: 1m
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal
    limits_config:
      enforce_metric_name: false
      max_cache_freshness_per_query: 30m
      max_entries_limit_per_query: 10000
      max_global_streams_per_user: 100000
      max_query_parallelism: 512
      max_query_series: 10000
      query_timeout: 3m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      retention_period: 1000h
      split_queries_by_interval: 2h
    memberlist:
      join_members:
      - loki2-loki-distributed-memberlist
    query_range:
      align_queries_with_step: true
      cache_results: true
      max_retries: 5
      results_cache:
        cache:
          background:
            writeback_goroutines: 100
          embedded_cache:
            enabled: true
            ttl: 24h
          redis:
            endpoint: redis-queries-master:6379
            idle_timeout: 2h
            password: XXXX
            timeout: 2000ms
    ruler:
      alertmanager_url: http://vmalertmanager.monitoring-system.svc.cluster.local:9093
      external_url: http://vmalertmanager.monitoring-system.svc.cluster.local:9093
      ring:
        kvstore:
          store: memberlist
      rule_path: /tmp/loki/scratch
      storage:
        s3:
          access_key_id: XXXX
          endpoint: https://s3.region.amazonaws.com
          insecure: false
          region: myregion
          s3: s3://URL
          s3forcepathstyle: true
          secret_access_key: XXX
        type: s3
    runtime_config:
      file: /var/loki-distributed-runtime/runtime.yaml
    schema_config:
      configs:
      - from: "2022-01-11"
        index:
          period: 24h
          prefix: loki_index_
        object_store: aws
        schema: v12
        store: boltdb-shipper
    server:
      grpc_listen_port: 9095
      grpc_server_max_concurrent_streams: 1000
      grpc_server_max_recv_msg_size: 16777216
      grpc_server_max_send_msg_size: 16777216
      http_listen_port: 3100
    storage_config:
      aws:
        access_key_id: XXX
        endpoint: https://s3.region.amazonaws.com
        insecure: false
        region: region
        s3: s3://URL
        s3forcepathstyle: true
        secret_access_key: XXXX
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/cache
        cache_ttl: 168h
        shared_store: aws
      filesystem:
        directory: /var/loki/chunks
      index_queries_cache_config:
        background:
          writeback_goroutines: 100
        embedded_cache:
          enabled: true
          ttl: 24h
        redis:
          endpoint: redis-queries-master:6379
          idle_timeout: 2h
          password: XXXXX
          timeout: 2000ms
    table_manager:
      retention_deletes_enabled: true

Chart version: 0.69.16
Loki version: 2.8.2

To Reproduce
Steps to reproduce the behavior:

  1. Start Loki 2.8.2
  2. Start Promtail 2.8.2
  3. Wait 10m for the compactor to start doing its job.

Expected behavior
The compactor does its job calmly, without panicking :)

Environment:

mac133k commented 1 year ago

I think the problem may be this line in your compactor config: shared_store: aws. The documentation does not list aws as a valid value for this parameter:

# The shared store used for storing boltdb files. Supported types: gcs, s3,
# azure, swift, filesystem, bos.

Either change it to s3 or remove it, so that it defaults to what you have in common.storage.
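Applied to the config posted above, the suggested fix would look like this (a sketch based on the reporter's settings, everything else unchanged):

```yaml
compactor:
  compaction_interval: 10m
  retention_delete_delay: 1h
  retention_delete_worker_count: 150
  retention_enabled: true
  shared_store: s3          # was "aws", which is not a documented value
  shared_store_key_prefix: index/
```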

lianghuiyuan commented 5 months ago

Same here. The compactor doesn't work with the following config:

compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

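One thing worth checking here: on Loki 3.x the compactor no longer accepts shared_store at all, and retention requires delete_request_store to be set (as the next comment also notes). If you are on a 3.x release, a config along these lines is closer to what the current docs describe (a sketch; the filesystem store is assumed from the config above):

```yaml
compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  # Loki 3.x: shared_store was removed; retention needs an explicit
  # store for delete requests instead.
  delete_request_store: filesystem
```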
sunidhi271 commented 3 months ago

I am facing the same issue. I am using MinIO as storage and I am passing s3 in the delete_request_store parameter, which should configure the store for delete requests.

I have configured everything as per this document: https://grafana.com/docs/loki/latest/operations/storage/retention/

Below is my configuration:

config.yaml: |

    auth_enabled: false
    common:
      compactor_address: 'http://loki-backend:3100'
      path_prefix: /var/loki
      replication_factor: 3
      storage:
        s3:
          access_key_id: enterprise-logs
          bucketnames: chunks
          endpoint: loki-minio.monitoring.svc:9000
          insecure: true
          s3forcepathstyle: true
          secret_access_key: supersecret
    compactor:
      compaction_interval: 10m
      delete_request_store: s3
      retention_delete_delay: 5m
      retention_delete_worker_count: 10
      retention_enabled: true
      working_directory: /data/retention
    frontend:
      scheduler_address: ""
      tail_proxy_url: http://loki-querier.monitoring.svc.cluster.local:3100
    frontend_worker:
      scheduler_address: ""
    index_gateway:
      mode: simple
    ingester:
      chunk_encoding: snappy
    limits_config:
      max_cache_freshness_per_query: 10m
      query_timeout: 300s
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      split_queries_by_interval: 15m
      volume_enabled: true
    memberlist:
      join_members:
      - loki-memberlist
    pattern_ingester:
      enabled: false
    querier:
      max_concurrent: 4
    query_range:
      align_queries_with_step: true
    ruler:
      storage:
        s3:
          bucketnames: ruler
        type: s3
    runtime_config:
      file: /etc/loki/runtime-config/runtime-config.yaml
    schema_config:
      configs:
      - from: "2024-04-01"
        index:
          period: 24h
          prefix: loki_index_
        object_store: s3
        schema: v13
        store: tsdb
    server:
      grpc_listen_port: 9095
      http_listen_port: 3100
      http_server_read_timeout: 600s
      http_server_write_timeout: 600s
    storage_config:
      boltdb_shipper:
        index_gateway_client:
          server_address: dns+loki-backend-headless.monitoring.svc.cluster.local:9095
      hedging:
        at: 250ms
        max_per_second: 20
        up_to: 3
      tsdb_shipper:
        index_gateway_client:
          server_address: dns+loki-backend-headless.monitoring.svc.cluster.local:9095
    tracing:
      enabled: true

But I don't see any hint of the compaction process in Grafana.

aniketwdubey commented 4 days ago

@sunidhi271 try setting the working directory to something like /var/loki/compactor
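Combining that suggestion with the config above, the compactor block would look something like this (a sketch; working_directory is assumed to live under the common path_prefix /var/loki so that the backend pods can write to it):

```yaml
compactor:
  working_directory: /var/loki/compactor   # writable path, per the suggestion above
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 5m
  retention_delete_worker_count: 10
  delete_request_store: s3
```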