grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

SSD Helm install: bloom compactor error #12710

Closed by mzupan 6 months ago

mzupan commented 6 months ago

Describe the bug

I tried to enable bloom filters on the SSD Helm install. On one test cluster without much traffic it seems to be working. On a second cluster I made basically the same config changes, and the backend panics with the following error:

loki-backend-0 loki level=info ts=2024-04-20T22:16:49.870006554Z caller=ringmanager.go:203 msg="scheduler is ACTIVE in the ring"
loki-backend-0 loki level=info ts=2024-04-20T22:16:49.870185026Z caller=module_service.go:82 msg=starting module=query-scheduler
loki-backend-0 loki level=info ts=2024-04-20T22:16:49.958667369Z caller=compactor.go:428 msg="compactor is ACTIVE in the ring"
loki-backend-0 loki level=info ts=2024-04-20T22:16:49.958792611Z caller=loki.go:503 msg="Loki started" startup_time=1.70261613s
loki-backend-0 loki level=info ts=2024-04-20T22:16:50.024969026Z caller=bloomcompactor.go:458 component=bloom-compactor msg=compacting org_id=fake table=loki_index_19831 ownership=0000000000000000-07baa9c9ffffffff
loki-backend-0 loki panic: runtime error: invalid memory address or nil pointer dereference [recovered]
loki-backend-0 loki     panic: runtime error: invalid memory address or nil pointer dereference
loki-backend-0 loki [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x20a1b9a]
loki-backend-0 loki
loki-backend-0 loki goroutine 1161 [running]:
loki-backend-0 loki github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps.OnceFunc.func4.1()
loki-backend-0 loki     /usr/local/go/src/sync/oncefunc.go:24 +0x6c
loki-backend-0 loki panic({0x25e7020?, 0x48d3ee0?})
loki-backend-0 loki     /usr/local/go/src/runtime/panic.go:914 +0x21f
loki-backend-0 loki github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps.func2()
loki-backend-0 loki     /src/loki/pkg/bloomcompactor/controller.go:388 +0x1a
loki-backend-0 loki github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps.OnceFunc.func4()
loki-backend-0 loki     /usr/local/go/src/sync/oncefunc.go:27 +0x6b
loki-backend-0 loki sync.(*Once).doSlow(0x18?, 0xc000e00000?)
loki-backend-0 loki     /usr/local/go/src/sync/once.go:74 +0xbf
loki-backend-0 loki sync.(*Once).Do(0xc000e00000?, 0x410965?)
loki-backend-0 loki     /usr/local/go/src/sync/once.go:65 +0x19
loki-backend-0 loki github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps.OnceFunc.func5()
loki-backend-0 loki     /usr/local/go/src/sync/oncefunc.go:31 +0x2d
loki-backend-0 loki github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps(0xc000cf3bc0, {0x3254dc0, 0xc000df7770}, {0xc0044231f7, 0x4}, {{0x2?}, {0xc000b53050?, 0x4d77?}}, {0x0, 0x7baa9c9ffffffff}, ...)
loki-backend-0 loki     /src/loki/pkg/bloomcompactor/controller.go:396 +0x19b4
loki-backend-0 loki github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).compactTenant(0xc000cf3bc0, {0x3254dc0, 0xc000df7770}, {{0x0?}, {0xc000b53050?, 0xffffffffffffffff?}}, {0xc0044231f7, 0x4}, {0x0, 0x7baa9c9ffffffff}, ...)
loki-backend-0 loki     /src/loki/pkg/bloomcompactor/controller.go:115 +0x99a
loki-backend-0 loki github.com/grafana/loki/v3/pkg/bloomcompactor.(*Compactor).compactTenantTable(0xc001de3180, {0x3254dc0, 0xc000df7770}, 0xc003d818c0, 0xc003db76c0?)
loki-backend-0 loki     /src/loki/pkg/bloomcompactor/bloomcompactor.go:460 +0x36b
loki-backend-0 loki github.com/grafana/loki/v3/pkg/bloomcompactor.(*Compactor).runWorkers.func2({0x3254dc0, 0xc000df7770}, 0xc0002ee360?)
loki-backend-0 loki     /src/loki/pkg/bloomcompactor/bloomcompactor.go:422 +0x148
loki-backend-0 loki github.com/grafana/dskit/concurrency.ForEachJob.func1()
loki-backend-0 loki     /src/loki/vendor/github.com/grafana/dskit/concurrency/runner.go:105 +0x83
loki-backend-0 loki golang.org/x/sync/errgroup.(*Group).Go.func1()
loki-backend-0 loki     /src/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:78 +0x56
loki-backend-0 loki created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1112
loki-backend-0 loki     /src/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x96
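Reading the trace: the frames at oncefunc.go:24, oncefunc.go:27 and oncefunc.go:31 belong to Go's `sync.OnceFunc`, so the nil dereference at controller.go:388 appears to happen inside a closure that `buildGaps` wrapped with `sync.OnceFunc`. `OnceFunc` recovers the panic and re-raises it, which is why it is reported as `[recovered]` and then printed a second time. The actual code at controller.go:388 is not shown in this issue, so the following is only a minimal sketch of that failure class. `resource`, `Close` and `runCleanup` are hypothetical names and not Loki's API.

```go
package main

import (
	"fmt"
	"sync"
)

// resource is a hypothetical stand-in for whatever value the real closure
// touched; it is NOT a Loki type.
type resource struct {
	closed bool
}

// Close writes through the receiver, so calling it on a nil *resource
// triggers a nil pointer dereference.
func (r *resource) Close() {
	r.closed = true
}

// runCleanup wraps a cleanup closure in sync.OnceFunc (Go 1.21+), invokes it,
// and returns the panic value that escaped, or nil if there was none.
// With guard set, the closure checks for nil before dereferencing.
func runCleanup(r *resource, guard bool) (recovered any) {
	defer func() { recovered = recover() }()

	cleanup := sync.OnceFunc(func() {
		if guard && r == nil {
			return // nothing to release
		}
		r.Close()
	})
	cleanup() // OnceFunc re-panics with the closure's panic value
	return nil
}

func main() {
	fmt.Println("unguarded:", runCleanup(nil, false))
	fmt.Println("guarded:  ", runCleanup(nil, true))
}
```

In this sketch the unguarded call surfaces a runtime nil-pointer error through `OnceFunc`, while the guarded call returns cleanly. Whether a nil check of this kind is the right fix for Loki depends on the code at controller.go:388, which this issue does not include.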

To Reproduce

Parts of my config


  loki:
    auth_enabled: false

    commonConfig:
      replication_factor: 2

    server:
      http_listen_port: 3100
      grpc_listen_port: 9095
      grpc_server_max_recv_msg_size: 256000000
      grpc_server_max_send_msg_size: 256000000
      http_server_read_timeout: 15m
      http_server_write_timeout: 15m

    limits_config:
      cardinality_limit: 1200000
      max_entries_limit_per_query: 1000000
      ingestion_burst_size_mb: 2000
      ingestion_rate_mb: 1000
      ingestion_rate_strategy: local
      max_cache_freshness_per_query: 10m
      per_stream_rate_limit: 512MB
      per_stream_rate_limit_burst: 1024MB 
      max_label_names_per_series: 600
      max_global_streams_per_user: 100000
      reject_old_samples: false
      reject_old_samples_max_age: 268h
      split_queries_by_interval: 15m
      unordered_writes: true
      increment_duplicate_timestamp: true
      max_query_parallelism: 256
      query_timeout: 15m
      max_query_lookback: 90d
      max_label_value_length: 20480
      max_label_name_length: 10240
      shard_streams:
        enabled: true
      bloom_gateway_enable_filtering: true
      bloom_compactor_enable_compaction: true

    storage:
      bucketNames:
        chunks: loki-storage-mon
        ruler: loki-storage-mon
        admin: loki-storage-mon
      type: s3
      s3:
        region: us-east-1

    # -- Check https://grafana.com/docs/loki/latest/configuration/#schema_config for more info on how to configure schemas
    schemaConfig:
      configs:
        - from: 2022-01-11
          store: boltdb-shipper
          object_store: s3
          schema: v12
          index:
            prefix: loki_index_
            period: 24h
        - from: 2023-11-13
          store: tsdb
          object_store: s3
          schema: v12
          index:
            prefix: loki_index_
            period: 24h
        - from: 2024-04-19
          store: tsdb
          object_store: s3
          schema: v13
          index:
            prefix: loki_index_
            period: 24h
    # -- Check https://grafana.com/docs/loki/latest/configuration/#ruler for more info on configuring ruler
    rulerConfig: {}
    # -- Structured loki configuration, takes precedence over `loki.config`, `loki.schemaConfig`, `loki.storageConfig`
    structuredConfig:
      bloom_compactor:
        enabled: true

      bloom_gateway:
        enabled: true
        client:
          addresses: dns+loki-backend-headless.loki.svc.cluster.local:9095
    # -- Additional query scheduler config
    query_scheduler:
      max_outstanding_requests_per_tenant: 4096
    # -- Additional storage config
    storage_config:
      hedging:
        at: "250ms"
        max_per_second: 20
        up_to: 3
    # --  Optional compactor configuration
    compactor:
      compaction_interval: 5m
    # --  Optional analytics configuration
    analytics: {}
    # --  Optional querier configuration
    querier:
      max_concurrent: 16
    # --  Optional ingester configuration
    ingester:
      chunk_encoding: snappy
      chunk_idle_period: 10m
      chunk_target_size: 1500000
      concurrent_flushes: 96
      flush_check_period: 15s
      flush_op_timeout: 5m
      lifecycler:
        final_sleep: 0s
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
      max_chunk_age: 4h
      wal:
        enabled: false
    frontend:
      scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
      log_queries_longer_than: 5s
      compress_responses: true
    frontend_worker:
      scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
      grpc_client_config:
        max_send_msg_size: 256000000

  rbac:
    namespaced: true

  write:
    replicas: 3

  read:
    replicas: 3

  backend:
    replicas: 3

Expected behavior

The backend pods should start and keep running without panicking.

Environment:


mzupan commented 6 months ago

Looks like this is a duplicate of https://github.com/grafana/loki/issues/12540, so closing.