grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Loki Query Frontend fails with SIGSEGV in 3.0.0 #13307

Open chewrocca opened 6 days ago

chewrocca commented 6 days ago

Describe the bug

When starting Loki 3.0.0, the Query Frontend crashes with a runtime panic caused by an invalid memory address or nil pointer dereference. The panic does not occur when Loki is pinned to version 2.9.8 while the other components are upgraded.

level=info ts=2024-06-24T13:38:35.20013383Z caller=loki.go:503 msg="Loki started" startup_time=50.090374ms
level=info ts=2024-06-24T13:38:35.206338161Z caller=memberlist_client.go:580 phase=startup msg="joining memberlist cluster succeeded" reached_nodes=1 elapsed_time=7.68984ms
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x22c8d5f]

goroutine 2039 [running]:
github.com/grafana/loki/v3/pkg/lokifrontend/frontend.downstreamRoundTripper.Do({0xc000a242d0, {0x32314e0, 0x48f6180}, {0x0, 0x0}}, {0x3254d88, 0xc00cb95410}, {0x3270350, 0xc00bfad960})
    /src/loki/pkg/lokifrontend/frontend/downstream_roundtripper.go:37 +0x9f
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.retry.Do({{0x3231a60, 0xc0006f2280}, {0x3233d20, 0xc0021cae40}, 0x5, 0xc002152800}, {0x3254d88?, 0xc00cb95410}, {0x3270350, 0xc00bfad960})
    /src/loki/pkg/querier/queryrange/queryrangebase/retry.go:86 +0x2c3
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1.1({0x3254d88?, 0xc00cb95410?})
    /src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:28 +0x42
github.com/grafana/dskit/instrument.CollectedRequest({0x3254d88, 0xc00cb953e0}, {0x2a6b8f9, 0x5}, {0x3249af0, 0xc003c0f018}, 0xede0b6e12?, 0xc00cbaa328)
    /src/loki/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x262
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1({0x3254d88?, 0xc00cb953e0?}, {0x3270350?, 0xc00bfad960?})
    /src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:26 +0xa8
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0x0?, {0x3254d88?, 0xc00cb953e0?}, {0x3270350?, 0xc00bfad960?})
    /src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.NewIndexStatsCacheMiddleware.NewResultsCacheMiddleware.func2.1({0x3254d88, 0xc00cb953e0}, {0x7f4768becaa0?, 0xc00bfad960})
    /src/loki/pkg/querier/queryrange/queryrangebase/results_cache.go:147 +0x6c
github.com/grafana/loki/v3/pkg/storage/chunk/cache/resultscache.HandlerFunc.Do(0x3254d88?, {0x3254d88?, 0xc00cb953e0?}, {0x7f4768becaa0?, 0xc00bfad960?})
    /src/loki/pkg/storage/chunk/cache/resultscache/util.go:11 +0x37
github.com/grafana/loki/v3/pkg/storage/chunk/cache/resultscache.ResultsCache.Do({{0x3231a60, 0xc0006f2280}, {0x3233020, 0xc00cbb6000}, {0x3255178, 0xc000798000}, {0x7f4768bec950, 0xc003ec2600}, {0x32336e0, 0xc003ec2630}, ...}, ...)
    /src/loki/pkg/storage/chunk/cache/resultscache/cache.go:112 +0xb45
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.resultsCache.Do({0xc00be6f130, {0x3231a60, 0xc0006f2280}, {0x323fd60, 0xc002143240}, 0xc002151c50}, {0x3254d88, 0xc00cb953b0}, {0x3270350, 0xc00bfad960})
    /src/loki/pkg/querier/queryrange/queryrangebase/results_cache.go:186 +0xf3
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1.1({0x3254d88?, 0xc00cb953b0?})
    /src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:28 +0x42
github.com/grafana/dskit/instrument.CollectedRequest({0x3254d88, 0xc00cb95350}, {0x2a8cea5, 0x11}, {0x3249af0, 0xc003c0f010}, 0x1?, 0xc00cbaaa48)
    /src/loki/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x262
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1({0x3254d88?, 0xc00cb95350?}, {0x3270350?, 0xc00bfad960?})
    /src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:26 +0xa8
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0xc00cb9eb40?, {0x3254d88?, 0xc00cb95350?}, {0x3270350?, 0xc00bfad960?})
    /src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.(*splitByInterval).Do(0xc00c483740, {0x3254d88?, 0xc00cb95350}, {0x3270350, 0xc00bfad8a0})
    /src/loki/pkg/querier/queryrange/split_by_interval.go:214 +0x476
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1.1({0x3254d88?, 0xc00cb95350?})
    /src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:28 +0x42
github.com/grafana/dskit/instrument.CollectedRequest({0x3254d88, 0xc00cb95320}, {0x2a8ce94, 0x11}, {0x3249af0, 0xc003c0f008}, 0x21a0055?, 0xc004104e60)
    /src/loki/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x262
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1({0x3254d88?, 0xc00cb95320?}, {0x3270350?, 0xc00bfad8a0?})
    /src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:26 +0xa8
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0x0?, {0x3254d88?, 0xc00cb95320?}, {0x3270350?, 0xc00bfad8a0?})
    /src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.limitsMiddleware.Do({{0x3281480?, 0xc003ec2600?}, {0x3231760?, 0xc00cbb4140?}}, {0x3254d88?, 0xc00cb952f0?}, {0x3270350, 0xc00bfad8a0})
    /src/loki/pkg/querier/queryrange/limits.go:199 +0xaf5
github.com/grafana/loki/v3/pkg/querier/queryrange.StatsCollectorMiddleware.func1.1({0x3254dc0, 0xc00cbb2cd0}, {0x3270350?, 0xc00bfad8a0?})
    /src/loki/pkg/querier/queryrange/stats.go:132 +0x122
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0xc00bfad8c0?, {0x3254dc0?, 0xc00cbb2cd0?}, {0x3270350?, 0xc00bfad8a0?})
    /src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.NewIndexStatsTripperware.statsTripperware.func4.1({0x3254dc0, 0xc00cbb2cd0}, {0x3270350, 0xc00bfad8a0})
    /src/loki/pkg/querier/queryrange/roundtrip.go:970 +0xfd
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0xc00b2c4ee0?, {0x3254dc0?, 0xc00cbb2cd0?}, {0x3270350?, 0xc00bfad8a0?})
    /src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.getStatsForMatchers.func1({0x3254dc0, 0xc00cbb2cd0}, 0x0)
    /src/loki/pkg/querier/queryrange/shard_resolver.go:106 +0x282
github.com/grafana/dskit/concurrency.ForEachJob.func1()
    /src/loki/vendor/github.com/grafana/dskit/concurrency/runner.go:105 +0x83
golang.org/x/sync/errgroup.(*Group).Go.func1()
    /src/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 2037
    /src/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x96

To Reproduce

Steps to reproduce the behavior:

  1. Made minor configuration changes according to the upgrade notes.
  2. Started Promtail and Loki 3.0.0.
  3. Only the Loki Query Frontend component failed with this SIGSEGV, shortly after starting.

Expected behavior

The Query Frontend should not panic. This does not fail in 2.9.8, and nothing fails when all components except the Loki Query Frontend are running 3.0.0.

chewrocca commented 6 days ago

This seems related to https://github.com/grafana/loki/issues/13208. However, we're using a query scheduler.

chewrocca commented 4 days ago

It seems like this is working on the "main" branch.