grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

"counter cannot decrease in value" panic when bloom filtering is applied #13300

Open tredman opened 3 months ago

tredman commented 3 months ago

Describe the bug

In the index-gateway, bloomquerier.FilterChunkRefs appears to panic because more "postFilter" chunks are returned than "preFilter" chunks. The actual panic comes from the Prometheus counter.Add call, which panics if the value passed to it is less than 0.

With debug logging enabled, I can see that preFilterChunks is sometimes smaller than postFilterChunks. Glancing at the code, the panic occurs when filteredChunks is computed as a negative value and added to the Prometheus counter. Here are some examples of FilterChunkRefs calls that appear to return negative filteredChunks values:

ts=2024-06-24T21:26:03.76527129Z caller=spanlogger.go:109 component=index-gateway method=bloomquerier.FilterChunkRefs user=fake level=debug tenant=fake from=2024-06-22T21:45:00Z through=2024-06-22T22:00:00Z responses=2 preFilterChunks=72 postFilterChunks=78 skippedChunks=0 filteredChunks=-6 preFilterSeries=40 postFilterSeries=40 skippedSeries=0 filteredSeries=0
ts=2024-06-24T21:26:03.795554211Z caller=spanlogger.go:109 component=index-gateway method=bloomquerier.FilterChunkRefs user=fake level=debug tenant=fake from=2024-06-22T21:30:00Z through=2024-06-22T21:45:00Z responses=2 preFilterChunks=89 postFilterChunks=90 skippedChunks=64 filteredChunks=-1 preFilterSeries=32 postFilterSeries=32 skippedSeries=10 filteredSeries=0
ts=2024-06-24T21:26:04.300281074Z caller=spanlogger.go:109 component=index-gateway method=bloomquerier.FilterChunkRefs user=fake level=debug tenant=fake from=2024-06-22T21:29:00Z through=2024-06-22T21:44:00Z responses=2 preFilterChunks=80 postFilterChunks=81 skippedChunks=2 filteredChunks=-1 preFilterSeries=49 postFilterSeries=49 skippedSeries=2 filteredSeries=0

This causes the query to fail but doesn't occur consistently.
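
For reference, here is a minimal sketch of why a negative delta produces this exact panic, assuming nothing beyond the standard prometheus/client_golang counter semantics (the metric name below is made up, not the counter Loki actually uses):

    package main

    import "github.com/prometheus/client_golang/prometheus"

    func main() {
        // Illustrative counter only; this is not the metric registered by
        // bloomquerier.FilterChunkRefs.
        filteredChunks := prometheus.NewCounter(prometheus.CounterOpts{
            Name: "bloom_filtered_chunks_total",
        })

        // Values taken from the first debug log line above.
        preFilterChunks, postFilterChunks := 72, 78

        // Counters may only go up: Add panics with "counter cannot decrease in
        // value" as soon as the delta is negative, i.e. whenever
        // postFilterChunks exceeds preFilterChunks.
        filteredChunks.Add(float64(preFilterChunks - postFilterChunks)) // Add(-6) -> panic
    }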

To Reproduce

We're running the latest pre-release build for 3.1.0 (k208-ede6941); I was also able to reproduce this issue in the previous release, k207.

Here's a query we're running that triggers this. It only occurs when we're searching time periods that are covered by bloom filters - so most recent data doesn't seem to trigger the issue, but if I run a query from now-48h to now-47h I can repro this.

{cluster=~".+dev.+", cluster!="omitted"} |= "7n5GcjXJg2iMof2Xrw"

Expected behavior

I would expect this query to run reliably, leveraging the bloom filters to filter chunks that aren't needed in the search.

Environment:

Screenshots, Promtail config, or terminal output


panic: counter cannot decrease in value
goroutine 26869 [running]:
github.com/grafana/loki/v3/pkg/util/server.onPanic({0x20887c0, 0x40138800a0})
    /src/loki/pkg/util/server/recovery.go:57 +0x48
github.com/grafana/loki/v3/pkg/util/server.init.WithRecoveryHandler.func4.1({0x40108d88a8?, 0x13b77f0?}, {0x20887c0?, 0x40138800a0?})
    /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/options.go:33 +0x34
github.com/grpc-ecosystem/go-grpc-middleware/recovery.recoverFrom({0x2f40c98?, 0x400722c510?}, {0x20887c0?, 0x40138800a0?}, 0x40108d8918?)
    /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:61 +0x38
github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1.1()
    /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:29 +0x78
panic({0x20887c0?, 0x40138800a0?})
    /usr/local/go/src/runtime/panic.go:770 +0x124
github.com/prometheus/client_golang/prometheus.(*counter).Add(0x4011f5f630?, 0x4005272f00?)
    /src/loki/vendor/github.com/prometheus/client_golang/prometheus/counter.go:128 +0xfc
github.com/grafana/loki/v3/pkg/bloomgateway.(*BloomQuerier).FilterChunkRefs(0x4000c1d4a0, {0x2f40c98, 0x400722c510}, {0x2578f2b, 0x4}, 0x19041da3d60, 0x19041e7f900, {0x4007326008, 0x33b, 0x3ff}, ...)
    /src/loki/pkg/bloomgateway/querier.go:216 +0x14ac
github.com/grafana/loki/v3/pkg/indexgateway.(*Gateway).GetChunkRef(0x4000b6c848, {0x2f40c98, 0x400722c510}, 0x400b284fa0)
    /src/loki/pkg/indexgateway/gateway.go:254 +0x270
github.com/grafana/loki/v3/pkg/logproto._IndexGateway_GetChunkRef_Handler.func1({0x2f40c98?, 0x400722c510?}, {0x247a9c0?, 0x400b284fa0?})
    /src/loki/pkg/logproto/indexgateway.pb.go:754 +0xd0
github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1({0x2f40c98?, 0x400722c510?}, {0x247a9c0?, 0x400b284fa0?}, 0x40108d9048?, 0x1da78c0?)
    /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:33 +0x98
google.golang.org/grpc.getChainUnaryHandler.func1({0x2f40c98, 0x400722c510}, {0x247a9c0, 0x400b284fa0})
    /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xa0
github.com/grafana/loki/v3/pkg/util/fakeauth.init.func2({0x2f40c98, 0x400722c4e0}, {0x247a9c0, 0x400b284fa0}, 0x400a302340?, 0x4007220e00)
    /src/loki/pkg/util/fakeauth/fake_auth.go:60 +0x80
google.golang.org/grpc.getChainUnaryHandler.func1({0x2f40c98, 0x400722c4e0}, {0x247a9c0, 0x400b284fa0})
    /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xa0
github.com/grafana/dskit/middleware.UnaryServerInstrumentInterceptor.func1({0x2f40c98, 0x400722c4e0}, {0x247a9c0, 0x400b284fa0}, 0x400a302340, 0x4007220dc0)
    /src/loki/vendor/github.com/grafana/dskit/middleware/grpc_instrumentation.go:46 +0x8c
google.golang.org/grpc.getChainUnaryHandler.func1({0x2f40c98, 0x400722c4e0}, {0x247a9c0, 0x400b284fa0})
    /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xa0
github.com/grafana/dskit/server.newServer.HTTPGRPCTracingInterceptor.func3({0x2f40c98?, 0x400722c4e0?}, {0x247a9c0?, 0x400b284fa0?}, 0x400a302340?, 0x400f42bba8?)
    /src/loki/vendor/github.com/grafana/dskit/middleware/http_tracing.go:74 +0x724
google.golang.org/grpc.getChainUnaryHandler.func1({0x2f40c98, 0x400722c4e0}, {0x247a9c0, 0x400b284fa0})
    /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xa0
github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1({0x2f40c98, 0x400722c330}, {0x247a9c0, 0x400b284fa0}, 0x400a302340, 0x4007220d40)
    /src/loki/vendor/github.com/opentracing-contrib/go-grpc/server.go:57 +0x2e8
google.golang.org/grpc.getChainUnaryHandler.func1({0x2f40c98, 0x400722c330}, {0x247a9c0, 0x400b284fa0})
    /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xa0
github.com/grafana/dskit/middleware.GRPCServerLog.UnaryServerInterceptor({{0x2f1ce00?, 0x4000dfbea0?}, 0x1?, 0x8f?}, {0x2f40c98, 0x400722c330}, {0x247a9c0, 0x400b284fa0}, 0x400a302340, 0x4007220d00)
    /src/loki/vendor/github.com/grafana/dskit/middleware/grpc_logging.go:54 +0x80
google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1({0x2f40c98, 0x400722c330}, {0x247a9c0, 0x400b284fa0}, 0x400a302340, 0x245c100?)
    /src/loki/vendor/google.golang.org/grpc/server.go:1194 +0x88
github.com/grafana/loki/v3/pkg/logproto._IndexGateway_GetChunkRef_Handler({0x245dec0, 0x4000b6c848}, {0x2f40c98, 0x400722c330}, 0x4012abed80, 0x40009351c0)
    /src/loki/pkg/logproto/indexgateway.pb.go:756 +0x148
google.golang.org/grpc.(*Server).processUnaryRPC(0x4000ab4e00, {0x2f40c98, 0x400722c240}, {0x2f5cb00, 0x400019e000}, 0x4007b5be60, 0x4003807620, 0x4663bc0, 0x0)
    /src/loki/vendor/google.golang.org/grpc/server.go:1386 +0xb58
google.golang.org/grpc.(*Server).handleStream(0x4000ab4e00, {0x2f5cb00, 0x400019e000}, 0x4007b5be60)
    /src/loki/vendor/google.golang.org/grpc/server.go:1797 +0xb10
google.golang.org/grpc.(*Server).serveStreams.func2.1()
    /src/loki/vendor/google.golang.org/grpc/server.go:1027 +0x8c
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 1039
    /src/loki/vendor/google.golang.org/grpc/server.go:1038 +0x13c

Here is our Loki config for reference:

    auth_enabled: false    
    bloom_compactor:
      ring:
        num_tokens: 50
      enabled: true
      worker_parallelism: 20
      max_compaction_parallelism: 2
      min_table_offset: 1
      max_table_offset: 7
      retention:
        enabled: true
        max_lookback_days: 7
    bloom_gateway:
      enabled: true
      worker_concurrency: 16
      block_query_concurrency: 32
      client:
        results_cache:
          compression: snappy
          cache:
            default_validity: 1h
            memcached:
              expiration: 1h
            memcached_client:
              addresses: dnssrvnoa+_memcached-client._tcp.loki-prototype-chunks-cache.observability.svc
        cache_results: true
        addresses: dnssrvnoa+_grpc._tcp.loki-prototype-bloom-gateway-headless.observability.svc.cluster.local
    chunk_store_config:
      chunk_cache_config:
        background:
          writeback_buffer: 500000
          writeback_goroutines: 1
          writeback_size_limit: 500MB
        default_validity: 1h
        memcached:
          batch_size: 4
          parallelism: 5
        memcached_client:
          addresses: dnssrvnoa+_memcached-client._tcp.loki-prototype-chunks-cache.observability.svc
          consistent_hash: true
          max_idle_conns: 72
          timeout: 2000ms
    internal_server:
      http_server_read_timeout: 10m0s
      http_server_write_timeout: 10m0s
      http_server_idle_timeout: 10m0s
    common:
      compactor_address: 'http://loki-prototype-compactor:3100'
      path_prefix: /var/loki
      replication_factor: 3
      storage:
        s3:
          bucketnames: omitted
          insecure: false
          region: us-east-1
          s3forcepathstyle: false
    compactor:
      delete_request_cancel_period: 1h
      delete_request_store: s3
      retention_delete_delay: 2h
      retention_delete_worker_count: 150
      retention_enabled: true
    distributor:
      otlp_config:
        default_resource_attributes_as_index_labels:
        - service
        - namespace
        - cluster
        - deployment
        - container
    frontend:
      scheduler_address: loki-prototype-query-scheduler.observability.svc.cluster.local:9095
      tail_proxy_url: http://loki-prototype-querier.observability.svc.cluster.local:3100
    frontend_worker:
      scheduler_address: loki-prototype-query-scheduler.observability.svc.cluster.local:9095
    index_gateway:
      mode: simple
    ingester:
      chunk_encoding: snappy
      # chunk_target_size: 3145728    # Default is 1572864
      # max_chunk_age: 30m
      wal:
        enabled: false
    limits_config:
      allow_structured_metadata: true
      bloom_compactor_enable_compaction: true
      bloom_gateway_enable_filtering: true
      bloom_gateway_shard_size: 3
      bloom_ngram_length: 4
      ingestion_rate_mb: 2500
      max_cache_freshness_per_query: 10m
      max_concurrent_tail_requests: 200
      max_global_streams_per_user: 100000
      query_timeout: 300s
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      retention_period: 7d
      split_queries_by_interval: 15m
      volume_enabled: true
      shard_streams:
        enabled: true
        logging_enabled: true
        desired_rate: 3072KB
    memberlist:
      join_members:
      - loki-memberlist
    pattern_ingester:
      enabled: false
    querier:
      max_concurrent: 32
    query_range:
      align_queries_with_step: true
      cache_results: true
      results_cache:
        cache:
          background:
            writeback_buffer: 500000
            writeback_goroutines: 1
            writeback_size_limit: 500MB
          default_validity: 12h
          memcached_client:
            addresses: dnssrvnoa+_memcached-client._tcp.loki-prototype-results-cache.observability.svc
            consistent_hash: true
            timeout: 500ms
            update_interval: 1m
    ruler:
      storage:
        s3:
          bucketnames: omitted
          insecure: false
          region: us-east-1
          s3forcepathstyle: false
        type: s3
    runtime_config:
      file: /etc/loki/runtime-config/runtime-config.yaml
    schema_config:
      configs:
      - from: "2024-04-01"
        index:
          period: 24h
          prefix: loki_index_
        object_store: s3
        schema: v13
        store: tsdb
    server:
      grpc_listen_port: 9095
      http_listen_port: 3100
      http_server_idle_timeout: 10m0s
      http_server_read_timeout: 600s
      http_server_write_timeout: 600s
    storage_config:
      max_parallel_get_chunk: 300
      boltdb_shipper:
        index_gateway_client:
          server_address: dns+loki-prototype-index-gateway-headless.observability.svc.cluster.local:9095
      hedging:
        at: 250ms
        max_per_second: 20
        up_to: 3
      tsdb_shipper:
        index_gateway_client:
          server_address: dns+loki-prototype-index-gateway-headless.observability.svc.cluster.local:9095
      bloom_shipper:
        blocks_cache:
          soft_limit: 768GiB
          hard_limit: 896GiB
    tracing:
      enabled: false
      profiling_enabled: false
tredman commented 3 months ago

I fired up tracing and looked into a few queries with this issue. The traces are pretty huge but eyeballing it, this error consistently occurs in FilterChunkRefs calls that have at least one resultsCache hit.


This led me to run an experiment of setting:

bloom_gateway:
  client:
    cache_results: false

With that change, I am no longer able to reproduce this error. So now my next question is: is this a bug, or do I simply have the bloom gateway results cache misconfigured? The Loki chart does not (and did not) have an obvious way to configure the bloom_gateway or bloom_compactor sections of the Loki config, so we had to set this up by hand and reused the results cache that the queriers use. Is it possible these need to be separate caches? The chart does not spin up a separate bloom gateway results cache (that I can see), so I'll try to hack that together and report back.

tredman commented 3 months ago

I noticed we had configured the bloom gateway's results cache to point at the chunks cache, so I updated that.

    bloom_gateway:
      client:
        cache_results: true
        memcached_client:
          addresses: dnssrvnoa+_memcached-client._tcp.loki-prototype-chunks-cache.observability.svc

to

    bloom_gateway:
      client:
        cache_results: true
        memcached_client:
          addresses: dnssrvnoa+_memcached-client._tcp.loki-prototype-results-cache.observability.svc

The error came back, so I started up an entirely different results cache (basically just cloned the results-cache statefulset with a new name and set of matching labels):

    bloom_gateway:
      client:
        cache_results: true
        memcached_client:
          addresses: dnssrvnoa+_memcached-client._tcp.loki-prototype-bloom-results-cache.observability.svc

and I am still getting the above error. So it looks like this is somehow related to having the results cache enabled, but it's not clear whether it's a config problem or a bug. I've turned off bloom gateway results caching for now.

hamishforbes commented 2 months ago

I'm getting the same issue on 3.1.0, but I'm also seeing a panic in the bloom-gateway at the same time as the index-gateway, with a slightly different error:

Full trace ``` ts=2024-07-08T02:40:26.842305071Z caller=spanlogger.go:109 component=bloom-gateway org_id=core traceID=1914b170f6adab10 user=core level=info msg=stats-report status=success tasks=1 filters=1 blocks_processed=1 series_requested=2 series_filtered=1 chunks_requested=4 chunks_filtered=2 chunks_remaining=2 filter_ratio=0.5 queue_time=63.665µs metas_fetch_time=0s blocks_fetch_time=49.302µs processing_time=6.55337ms post_processing_time=3.997µs duration=6.670334ms panic: runtime error: index out of range [0] with length 0 goroutine 5442 [running]: github.com/grafana/loki/v3/pkg/util/server.onPanic({0x2957a60, 0xc0028dbcc8}) /src/loki/pkg/util/server/recovery.go:57 +0x4d github.com/grafana/loki/v3/pkg/util/server.init.WithRecoveryHandler.func4.1({0x0?, 0x0?}, {0x2957a60?, 0xc0028dbcc8?}) /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/options.go:33 +0x27 github.com/grpc-ecosystem/go-grpc-middleware/recovery.recoverFrom({0x34f5570?, 0xc001f7c090?}, {0x2957a60?, 0xc0028dbcc8?}, 0xc003c0c600?) /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:61 +0x30 github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1.1() /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:29 +0x75 panic({0x2957a60?, 0xc0028dbcc8?}) /usr/local/go/src/runtime/panic.go:770 +0x132 github.com/grafana/loki/v3/pkg/bloomgateway.(*Gateway).FilterChunkRefs(0xc0005f0408, {0x34f5570, 0xc001f7c090}, 0xc00217a4b0) /src/loki/pkg/bloomgateway/bloomgateway.go:240 +0x16e7 github.com/grafana/loki/v3/pkg/logproto._BloomGateway_FilterChunkRefs_Handler.func1({0x34f5570?, 0xc001f7c090?}, {0x2a8eb20?, 0xc00217a4b0?}) /src/loki/pkg/logproto/bloomgateway.pb.go:546 +0xcb github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1({0x34f5570?, 0xc001f7c090?}, {0x2a8eb20?, 0xc00217a4b0?}, 0x34f5570?, 0xc001f7c060?) /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:33 +0xb0 google.golang.org/grpc.getChainUnaryHandler.func1({0x34f5570, 0xc001f7c090}, {0x2a8eb20, 0xc00217a4b0}) /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xb2 github.com/grafana/dskit/middleware.ServerUserHeaderInterceptor({0x34f5570?, 0xc001f7c060?}, {0x2a8eb20, 0xc00217a4b0}, 0x40?, 0xc002178440) /src/loki/vendor/github.com/grafana/dskit/middleware/grpc_auth.go:43 +0x5b github.com/grafana/loki/v3/pkg/util/fakeauth.SetupAuthMiddleware.func1({0x34f5570, 0xc001f7c060}, {0x2a8eb20, 0xc00217a4b0}, 0xc002905da0, 0xc002178440) /src/loki/pkg/util/fakeauth/fake_auth.go:27 +0x9d google.golang.org/grpc.getChainUnaryHandler.func1({0x34f5570, 0xc001f7c060}, {0x2a8eb20, 0xc00217a4b0}) /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xb2 github.com/grafana/dskit/middleware.UnaryServerInstrumentInterceptor.func1({0x34f5570, 0xc001f7c060}, {0x2a8eb20, 0xc00217a4b0}, 0xc002905da0, 0xc002178400) /src/loki/vendor/github.com/grafana/dskit/middleware/grpc_instrumentation.go:46 +0xbd google.golang.org/grpc.getChainUnaryHandler.func1({0x34f5570, 0xc001f7c060}, {0x2a8eb20, 0xc00217a4b0}) /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xb2 github.com/grafana/dskit/server.newServer.HTTPGRPCTracingInterceptor.func3({0x34f5570?, 0xc001f7c060?}, {0x2a8eb20?, 0xc00217a4b0?}, 0xc002905da0?, 0xc002027d70?) 
/src/loki/vendor/github.com/grafana/dskit/middleware/http_tracing.go:74 +0x97c google.golang.org/grpc.getChainUnaryHandler.func1({0x34f5570, 0xc001f7c060}, {0x2a8eb20, 0xc00217a4b0}) /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xb2 github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1({0x34f5570, 0xc001ffdb90}, {0x2a8eb20, 0xc00217a4b0}, 0xc002905da0, 0xc002178240) /src/loki/vendor/github.com/opentracing-contrib/go-grpc/server.go:57 +0x3e7 google.golang.org/grpc.getChainUnaryHandler.func1({0x34f5570, 0xc001ffdb90}, {0x2a8eb20, 0xc00217a4b0}) /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xb2 github.com/grafana/dskit/middleware.GRPCServerLog.UnaryServerInterceptor({{0x34d1ae0?, 0xc000b7b7c0?}, 0xc0?, 0x52?}, {0x34f5570, 0xc001ffdb90}, {0x2a8eb20, 0xc00217a4b0}, 0xc002905da0, 0xc002178200) /src/loki/vendor/github.com/grafana/dskit/middleware/grpc_logging.go:54 +0xaf google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1({0x34f5570, 0xc001ffdb90}, {0x2a8eb20, 0xc00217a4b0}, 0xc002905da0, 0x80?) /src/loki/vendor/google.golang.org/grpc/server.go:1194 +0x85 github.com/grafana/loki/v3/pkg/logproto._BloomGateway_FilterChunkRefs_Handler({0x29cec40, 0xc0005f0408}, {0x34f5570, 0xc001ffdb90}, 0xc002938f00, 0xc0008e8c60) /src/loki/pkg/logproto/bloomgateway.pb.go:548 +0x143 google.golang.org/grpc.(*Server).processUnaryRPC(0xc000a79000, {0x34f5570, 0xc001ffdaa0}, {0x3510ee0, 0xc0006ec680}, 0xc001a94b40, 0xc000157590, 0x4c04d20, 0x0) /src/loki/vendor/google.golang.org/grpc/server.go:1386 +0xdf8 google.golang.org/grpc.(*Server).handleStream(0xc000a79000, {0x3510ee0, 0xc0006ec680}, 0xc001a94b40) /src/loki/vendor/google.golang.org/grpc/server.go:1797 +0xe87 google.golang.org/grpc.(*Server).serveStreams.func2.1() /src/loki/vendor/google.golang.org/grpc/server.go:1027 +0x8b created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 2849 /src/loki/vendor/google.golang.org/grpc/server.go:1038 +0x125 goroutine 1 [select, 19 minutes]: github.com/grafana/dskit/services.(*Manager).AwaitStopped(0xc000aab1a0, {0x34f53b0, 0x4cb79a0}) /src/loki/vendor/github.com/grafana/dskit/services/manager.go:145 +0x67 github.com/grafana/loki/v3/pkg/loki.(*Loki).Run(0xc000ac5008, {0x0?, {0x4?, 0x2?, 0x4c53b60?}}) /src/loki/pkg/loki/loki.go:556 +0xdf0 main.main() /src/loki/cmd/loki/main.go:129 +0x1333 goroutine 6 [select, 19 minutes]: github.com/baidubce/bce-sdk-go/util/log.NewLogger.func1() /src/loki/vendor/github.com/baidubce/bce-sdk-go/util/log/logger.go:375 +0xa5 created by github.com/baidubce/bce-sdk-go/util/log.NewLogger in goroutine 1 /src/loki/vendor/github.com/baidubce/bce-sdk-go/util/log/logger.go:368 +0x116 goroutine 9 [select]: go.opencensus.io/stats/view.(*worker).start(0xc0001c9700) /src/loki/vendor/go.opencensus.io/stats/view/worker.go:292 +0x9f created by go.opencensus.io/stats/view.init.0 in goroutine 1 /src/loki/vendor/go.opencensus.io/stats/view/worker.go:34 +0x8d goroutine 79 [chan receive]: github.com/grafana/loki/v3/pkg/util/log.newPrometheusLogger.WithFlushPeriod.func2.1() /src/loki/vendor/github.com/grafana/dskit/log/buffered.go:76 +0x97 created by github.com/grafana/loki/v3/pkg/util/log.newPrometheusLogger.WithFlushPeriod.func2 in goroutine 1 /src/loki/vendor/github.com/grafana/dskit/log/buffered.go:72 +0x65 goroutine 125 [select, 19 minutes]: github.com/grafana/loki/v3/pkg/loki.(*Loki).initServer.NewServerService.func4({0x34f55a8, 0xc000b7a190}) /src/loki/pkg/loki/modules.go:1810 +0xe6 
github.com/grafana/dskit/services.(*BasicService).main(0xc0007199a0) /src/loki/vendor/github.com/grafana/dskit/services/basic_service.go:190 +0x1cf created by github.com/grafana/dskit/services.(*BasicService).StartAsync.func1 in goroutine 265 /src/loki/vendor/github.com/grafana/dskit/services/basic_service.go:119 +0x105 goroutine 129 [select]: github.com/uber/jaeger-client-go.(*RemotelyControlledSampler).pollControllerWithTicker(0xc000a1e000, 0xc000b7a000) /src/loki/vendor/github.com/uber/jaeger-client-go/sampler_remote.go:153 +0x85 github.com/uber/jaeger-client-go.(*RemotelyControlledSampler).pollController(0xc000a1e000) /src/loki/vendor/github.com/uber/jaeger-client-go/sampler_remote.go:148 +0x5e created by github.com/uber/jaeger-client-go.NewRemotelyControlledSampler in goroutine 1 /src/loki/vendor/github.com/uber/jaeger-client-go/sampler_remote.go:87 +0x156 goroutine 131 [select]: github.com/uber/jaeger-client-go/utils.(*reconnectingUDPConn).reconnectLoop(0xc0005f6460, 0x0?) /src/loki/vendor/github.com/uber/jaeger-client-go/utils/reconnecting_udp_conn.go:70 +0xaa created by github.com/uber/jaeger-client-go/utils.newReconnectingUDPConn in goroutine 1 /src/loki/vendor/github.com/uber/jaeger-client-go/utils/reconnecting_udp_conn.go:60 +0x1df goroutine 132 [select]: github.com/uber/jaeger-client-go.(*remoteReporter).processQueue(0xc000862600) /src/loki/vendor/github.com/uber/jaeger-client-go/reporter.go:296 +0xd1 created by github.com/uber/jaeger-client-go.NewRemoteReporter in goroutine 1 /src/loki/vendor/github.com/uber/jaeger-client-go/reporter.go:237 +0x23f goroutine 152 [select]: github.com/grafana/loki/v3/pkg/storage/stores/shipper/bloomshipper.(*BlocksCache).runTTLEvictJob(0xc000974e80, 0xdf8475800, 0x9d29229e0000) /src/loki/pkg/storage/stores/shipper/bloomshpanic: runtime error: index out of range [0] with length 0 goroutine 5443 [running]: github.com/grafana/loki/v3/pkg/util/server.onPanic({0x2957a60, 0xc0028dbd28}) /src/loki/pkg/util/server/recovery.go:57 +0x4d github.com/grafana/loki/v3/pkg/util/server.init.WithRecoveryHandler.func4.1({0x0?, 0x0?}, {0x2957a60?, 0xc0028dbd28?}) /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/options.go:33 +0x27 github.com/grpc-ecosystem/go-grpc-middleware/recovery.recoverFrom({0x34f5570?, 0xc001f7c6c0?}, {0x2957a60?, 0xc0028dbd28?}, 0xc003c0c600?) /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:61 +0x30 github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1.1() /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:29 +0x75 panic({0x2957a60?, 0xc0028dbd28?}) /usr/local/go/src/runtime/panic.go:770 +0x132 github.com/grafana/loki/v3/pkg/bloomgateway.(*Gateway).FilterChunkRefs(0xc0005f0408, {0x34f5570, 0xc001f7c6c0}, 0xc00217b310) /src/loki/pkg/bloomgateway/bloomgateway.go:240 +0x16e7 github.com/grafana/loki/v3/pkg/logproto._BloomGateway_FilterChunkRefs_Handler.func1({0x34f5570?, 0xc001f7c6c0?}, {0x2a8eb20?, 0xc00217b310?}) /src/loki/pkg/logproto/bloomgateway.pb.go:546 +0xcb github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1({0x34f5570?, 0xc001f7c6c0?}, {0x2a8eb20?, 0xc00217b310?}, 0x34f5570?, 0xc001f7c690?) 
/src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:33 +0xb0 google.golang.org/grpc.getChainUnaryHandler.func1({0x34f5570, 0xc001f7c6c0}, {0x2a8eb20, 0xc00217b310}) /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xb2 github.com/grafana/dskit/middleware.ServerUserHeaderInterceptor({0x34f5570?, 0xc001f7c690?}, {0x2a8eb20, 0xc00217b310}, 0x40?, 0xc002178900) /src/loki/vendor/github.com/grafana/dskit/middleware/grpc_auth.go:43 +0x5b github.com/grafana/loki/v3/pkg/util/fakeauth.SetupAuthMiddleware.func1({0x34f5570, 0xc001f7c690}, {0x2a8eb20, 0xc00217b310}, 0xc001cca000, 0xc002178900) /src/loki/pkg/util/fakeauth/fake_auth.go:27 +0x9d google.golang.org/grpc.getChainUnaryHandler.func1({0x34f5570, 0xc001f7c690}, {0x2a8eb20, 0xc00217b310}) /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xb2 github.com/grafana/dskit/middleware.UnaryServerInstrumentInterceptor.func1({0x34f5570, 0xc001f7c690}, {0x2a8eb20, 0xc00217b310}, 0xc001cca000, 0xc0021788c0) /src/loki/vendor/github.com/grafana/dskit/middleware/grpc_instrumentation.go:46 +0xbd google.golang.org/grpc.getChainUnaryHandler.func1({0x34f5570, 0xc001f7c690}, {0x2a8eb20, 0xc00217b310}) /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xb2 github.com/grafana/dskit/server.newServer.HTTPGRPCTracingInterceptor.func3({0x34f5570?, 0xc001f7c690?}, {0x2a8eb20?, 0xc00217b310?}, 0xc001cca000?, 0xc002027f68?) /src/loki/vendor/github.com/grafana/dskit/middleware/http_tracing.go:74 +0x97c google.golang.org/grpc.getChainUnaryHandler.func1({0x34f5570, 0xc001f7c690}, {0x2a8eb20, 0xc00217b310}) /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xb2 github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1({0x34f5570, 0xc001f7c4e0}, {0x2a8eb20, 0xc00217b310}, 0xc001cca000, 0xc002178700) /src/loki/vendor/github.com/opentracing-contrib/go-grpc/server.go:57 +0x3e7 google.golang.org/grpc.getChainUnaryHandler.func1({0x34f5570, 0xc001f7c4e0}, {0x2a8eb20, 0xc00217b310}) /src/loki/vendor/google.golang.org/grpc/server.go:1203 +0xb2 github.com/grafana/dskit/middleware.GRPCServerLog.UnaryServerInterceptor({{0x34d1ae0?, 0xc000b7b7c0?}, 0xc0?, 0x52?}, {0x34f5570, 0xc001f7c4e0}, {0x2a8eb20, 0xc00217b310}, 0xc001cca000, 0xc0021786c0) /src/loki/vendor/github.com/grafana/dskit/middleware/grpc_logging.go:54 +0xaf google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1({0x34f5570, 0xc001f7c4e0}, {0x2a8eb20, 0xc00217b310}, 0xc001cca000, 0x80?) 
/src/loki/vendor/google.golang.org/grpc/server.go:1194 +0x85 github.com/grafana/loki/v3/pkg/logproto._BloomGateway_FilterChunkRefs_Handler({0x29cec40, 0xc0005f0408}, {0x34f5570, 0xc001f7c4e0}, 0xc002939180, 0xc0008e8c60) /src/loki/pkg/logproto/bloomgateway.pb.go:548 +0x143 google.golang.org/grpc.(*Server).processUnaryRPC(0xc000a79000, {0x34f5570, 0xc001f7c3f0}, {0x3510ee0, 0xc0006ec680}, 0xc001a95200, 0xc000157590, 0x4c04d20, 0x0) /src/loki/vendor/google.golang.org/grpc/server.go:1386 +0xdf8 google.golang.org/grpc.(*Server).handleStream(0xc000a79000, {0x3510ee0, 0xc0006ec680}, 0xc001a95200) /src/loki/vendor/google.golang.org/grpc/server.go:1797 +0xe87 google.golang.org/grpc.(*Server).serveStreams.func2.1() /src/loki/vendor/google.golang.org/grpc/server.go:1027 +0x8b created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 2849 /src/loki/vendor/google.golang.org/grpc/server.go:1038 +0x125 goroutine 1 [select, 19 minutes]: github.com/grafana/dskit/services.(*Manager).AwaitStopped(0xc000aab1a0, {0x34f53b0, 0x4cb79a0}) /src/loki/vendor/github.com/grafana/dskit/services/manager.go:145 +0x67 github.com/grafana/loki/v3/pkg/loki.(*Loki).Run(0xc000ac5008, {0x0?, {0x4?, 0x2?, 0x4c53b60?}}) /src/loki/pkg/loki/loki.go:556 +0xdf0 main.main() /src/loki/cmd/loki/main.go:129 +0x1333 goroutine 6 [select, 19 minutes]: github.com/baidubce/bce-sdk-go/util/log.NewLogger.func1() /src/loki/vendor/github.com/baidubce/bce-sdk-go/util/log/logger.go:375 +0xa5 created by github.com/baidubce/bce-sdk-go/util/log.NewLogger in goroutine 1 /src/loki/vendor/github.com/baidubce/bce-sdk-go/util/log/logger.go:368 +0x116 goroutine 9 [select]: go.opencensus.io/stats/view.(*worker).start(0xc0001c9700) /src/loki/vendor/go.opencensus.io/stats/view/worker.go:292 +0x9f created by go.opencensus.io/stats/view.init.0 in goroutine 1 /src/loki/vendor/go.opencensus.io/stats/view/worker.go:34 +0x8d goroutine 79 [chan receive]: github.com/grafana/loki/v3/pkg/util/log.newPrometheusLogger.WithFlushPeriod.func2.1() /src/loki/vendor/github.com/grafana/dskit/log/buffered.go:76 +0x97 created by github.com/grafana/loki/v3/pkg/util/log.newPrometheusLogger.WithFlushPeriod.func2 in goroutine 1 /src/loki/vendor/github.com/grafana/dskit/log/buffered.go:72 +0x65 goroutine 125 [select, 19 minutes]: github.com/grafana/loki/v3/pkg/loki.(*Loki).initServer.NewServerService.func4({0x34f55a8, 0xc000b7a190}) /src/loki/pkg/loki/modules.go:1810 +0xe6 github.com/grafana/dskit/services.(*BasicService).main(0xc0007199a0) /src/loki/vendor/github.com/grafana/dskit/services/basic_service.go:190 +0x1cf created by github.com/grafana/dskit/services.(*BasicService).StartAsync.func1 in goroutine 265 /src/loki/vendor/github.com/grafana/dskit/services/basic_service.go:119 +0x105 goroutine 129 [select]: github.com/uber/jaeger-client-go.(*RemotelyControlledSampler).pollControllerWithTicker(0xc000a1e000, 0xc000b7a000) /src/loki/vendor/github.com/uber/jaeger-client-go/sampler_remote.go:153 +0x85 github.com/uber/jaeger-client-go.(*RemotelyControlledSampler).pollController(0xc000a1e000) /src/loki/vendor/github.com/uber/jaeger-client-go/sampler_remote.go:148 +0x5e created by github.com/uber/jaeger-client-go.NewRemotelyControlledSampler in goroutine 1 /src/loki/vendor/github.com/uber/jaeger-client-go/sampler_remote.go:87 +0x156 goroutine 131 [select]: github.com/uber/jaeger-client-go/utils.(*reconnectingUDPConn).reconnectLoop(0xc0005f6460, 0x0?) 
/src/loki/vendor/github.com/uber/jaeger-client-go/utils/reconnecting_udp_conn.go:70 +0xaa created by github.com/uber/jaeger-client-go/utils.newReconnectingUDPConn in goroutine 1 /src/loki/vendor/github.com/uber/jaeger-client-go/utils/reconnecting_udp_conn.go:60 +0x1df goroutine 132 [select]: github.com/uber/jaeger-client-go.(*remoteReporter).processQueue(0xc000862600) /src/loki/vendor/github.com/uber/jaeger-client-go/reporter.go:296 +0xd1 created by github.com/uber/jaeger-client-go.NewRemoteReporter in goroutine 1 /src/loki/vendor/github.com/uber/jaeger-client-go/reporter.go:237 +0x23f goroutine 152 [select]: github.com/grafana/loki/v3/pkg/storage/stores/shipper/bloomshipper.(*BlocksCache).runTTLEvictJob(0xc000974e80, 0xdf8475800, 0x9d29229e0000) /src/loki/pkg/storage/stores/shipper/bloomshlevel=info ts=2024-07-08T02:40:26.847681553Z caller=processor.go:43 component=bloom-gateway worker=bloom-query-worker-1 msg="process tasks with bounds" tenant=core tasks=1 bounds=1 ```

Disabling results_cache also fixes the issue for me.
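
For what it's worth, the bloom-gateway panic above is the ordinary Go runtime error for indexing the first element of an empty slice; the snippet below is only a generic illustration of that failure mode (the variable name is hypothetical), not the actual code at bloomgateway.go:240:

    package main

    import "fmt"

    func main() {
        // Hypothetical stand-in for a per-request slice that can arrive empty,
        // e.g. after every chunk ref has already been filtered or served from cache.
        var chunkRefs []string

        // Indexing the first element without a length check reproduces the exact
        // message from the trace: "index out of range [0] with length 0".
        fmt.Println(chunkRefs[0])
    }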