grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Loki 3.0.0 bug in bloom compactor #12540

Closed: slim-bean closed this issue 3 months ago

slim-bean commented 7 months ago

From #12506:

Since the upgrade, everything looks good in our environments, although the backend pods seem to be outputting a lot of:

level=info ts=2024-04-09T08:01:08.971329289Z caller=gateway.go:241 component=index-gateway msg="chunk filtering is not enabled"

with every Loki search. This wasn't happening before 3.0, from what we can tell.

I suspect that's because blooms aren't enabled, although when I do enable blooms we get a nil pointer panic:

level=info ts=2024-04-09T08:17:29.692174397Z caller=bloomcompactor.go:458 component=bloom-compactor msg=compacting org_id=plprod table=index_19820 ownership=1f6c0f8500000000-1fa8b221ffffffff
ts=2024-04-09T08:17:31.535678052Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node 'loki-backend-3-2e51d875' from=10.30.80.69:7946"
level=info ts=2024-04-09T08:17:31.610784021Z caller=scheduler.go:653 msg="this scheduler is in the ReplicationSet, will now accept requests."
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1aec384]

goroutine 1430 [running]:
github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps.OnceFunc.func4.1()
    /usr/local/go/src/sync/oncefunc.go:24 +0x7c
panic({0x2002700?, 0x42aae10?})
    /usr/local/go/src/runtime/panic.go:914 +0x218
github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps.func2()
    /src/loki/pkg/bloomcompactor/controller.go:388 +0x24
github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps.OnceFunc.func4()
    /usr/local/go/src/sync/oncefunc.go:27 +0x64
sync.(*Once).doSlow(0x4006e9f128?, 0x0?)
    /usr/local/go/src/sync/once.go:74 +0x100
sync.(*Once).Do(0x400004e800?, 0x21cc060?)
    /usr/local/go/src/sync/once.go:65 +0x24
github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps.OnceFunc.func5()
    /usr/local/go/src/sync/oncefunc.go:31 +0x34
github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps(0x4006e7e720, {0x2c70e48, 0x4006e6e7d0}, {0x4006867892, 0x6}, {{0x1f6c0f8500000000?}, {0x40005a0578?, 0x4d6c?}}, {0x4321220?, 0x0?}, ...)
    /src/loki/pkg/bloomcompactor/controller.go:396 +0x133c
github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).compactTenant(0x4006e7e720, {0x2c70e48, 0x4006e6e7d0}, {{0x2?}, {0x40005a0578?, 0x101000000226f98?}}, {0x4006867892, 0x6}, {0x2?, 0x0?}, ...)
    /src/loki/pkg/bloomcompactor/controller.go:115 +0x6a0
github.com/grafana/loki/v3/pkg/bloomcompactor.(*Compactor).compactTenantTable(0x40007eee00, {0x2c70e48, 0x4006e6e7d0}, 0x4001a7eab0, 0x0?)
    /src/loki/pkg/bloomcompactor/bloomcompactor.go:460 +0x2e8
github.com/grafana/loki/v3/pkg/bloomcompactor.(*Compactor).runWorkers.func2({0x2c70e48, 0x4006e6e7d0}, 0x0?)
    /src/loki/pkg/bloomcompactor/bloomcompactor.go:422 +0xe0
github.com/grafana/dskit/concurrency.ForEachJob.func1()
    /src/loki/vendor/github.com/grafana/dskit/concurrency/runner.go:105 +0xbc
golang.org/x/sync/errgroup.(*Group).Go.func1()
    /src/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:78 +0x58
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1428
    /src/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x98

Originally posted by @rknightion in https://github.com/grafana/loki/issues/12506#issuecomment-2044387600
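
The trace above shows the panic originating in a closure (buildGaps.func2, controller.go:388) that buildGaps invokes through a sync.OnceFunc wrapper. The wrapper recovers the panic and re-raises it, which is why the first line is marked "[recovered]". The following is a minimal sketch of that general pattern, not the Loki code: blockBuilder, unsafeCleanup, safeCleanup and panics are hypothetical names made up for illustration, and the nil guard shown is only one possible way to avoid such a panic. The actual fix may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// blockBuilder is a hypothetical stand-in for whatever value the real
// closure dereferences; it is not a Loki type.
type blockBuilder struct {
	name string
}

// Close reads a field, so calling it on a nil *blockBuilder panics
// with a nil pointer dereference.
func (b *blockBuilder) Close() string { return "closed " + b.name }

// unsafeCleanup wraps a cleanup closure in sync.OnceFunc without checking
// whether b was ever assigned. If b is nil, the closure panics and
// OnceFunc re-raises that panic to the caller.
func unsafeCleanup(b *blockBuilder) func() {
	return sync.OnceFunc(func() {
		_ = b.Close()
	})
}

// safeCleanup is the same wrapper with a nil guard (an assumed remedy,
// not necessarily what the upstream fix does).
func safeCleanup(b *blockBuilder) func() {
	return sync.OnceFunc(func() {
		if b == nil {
			return // nothing was built, so nothing to close
		}
		_ = b.Close()
	})
}

// panics reports whether calling f panics.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println("unguarded cleanup panics:", panics(unsafeCleanup(nil)))
	fmt.Println("guarded cleanup panics:", panics(safeCleanup(nil)))
}
```

sync.OnceFunc requires Go 1.21 or later. In a long-running service the unguarded variant takes down the whole process unless something further up the call stack recovers, which matches the crash reported here.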

slim-bean commented 7 months ago

https://github.com/grafana/loki/pull/12536 is a fix for the panic. However, we should also fix the info log message.

slim-bean commented 7 months ago

grafana/loki:main-ec888ec includes this fix if folks don't want to wait for a 3.0.1

slim-bean commented 7 months ago

While not part of this issue, the panic was discovered while attempting to get rid of a noisy log line, which has now been removed: https://github.com/grafana/loki/pull/12555

chaudum commented 3 months ago

The fix has already been released in 3.0.1 and 3.1.0.