grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Nil pointer in query frontend on upgrade to v3 #12937

Open chriskuchin opened 6 months ago

chriskuchin commented 6 months ago

Describe the bug: I recently upgraded to v3 on my self-hosted cluster. When I roll out the query frontend, it crashes with the following error.

I haven't been able to identify what is causing the nil pointer.

2024-05-10 11:34:06.043 panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1ce24f4]

goroutine 345 [running]:
github.com/grafana/loki/v3/pkg/lokifrontend/frontend.downstreamRoundTripper.Do({0x400072a3f0, {0x2c4d340, 0x42cd0e0}, {0x0, 0x0}}, {0x2c70e10, 0x4000c58810}, {0x2c8c510, 0x400036bb80})
    /src/loki/pkg/lokifrontend/frontend/downstream_roundtripper.go:37 +0x74
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.retry.Do({{0x2c4d8c0, 0x40004ce0f0}, {0x2c4fb80, 0x40007ec7e0}, 0x5, 0x400064e7d0}, {0x2c70e10?, 0x4000c58810}, {0x2c8c510, 0x400036bb80})
    /src/loki/pkg/querier/queryrange/queryrangebase/retry.go:86 +0x1d4
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1.1({0x2c70e10?, 0x4000c58810?})
    /src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:28 +0x4c
github.com/grafana/dskit/instrument.CollectedRequest({0x2c70e10, 0x4000c587e0}, {0x2486af7, 0x5}, {0x2c65ab0, 0x4000568a68}, 0x4000b2c308?, 0x4000b2c2d8)
    /src/loki/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x244
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1({0x2c70e10?, 0x4000c587e0?}, {0x2c8c510?, 0x400036bb80?})
    /src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:26 +0x90
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0x2c70e10?, {0x2c70e10?, 0x4000c587e0?}, {0x2c8c510?, 0x400036bb80?})
    /src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x44
github.com/grafana/loki/v3/pkg/querier/queryrange.NewIndexStatsCacheMiddleware.NewResultsCacheMiddleware.func2.1({0x2c70e10, 0x4000c587e0}, {0xffff5ce19dd8?, 0x400036bb80})
    /src/loki/pkg/querier/queryrange/queryrangebase/results_cache.go:147 +0x6c
github.com/grafana/loki/v3/pkg/storage/chunk/cache/resultscache.HandlerFunc.Do(0x2c70e10?, {0x2c70e10?, 0x4000c587e0?}, {0xffff5ce19dd8?, 0x400036bb80?})
    /src/loki/pkg/storage/chunk/cache/resultscache/util.go:11 +0x44
github.com/grafana/loki/v3/pkg/storage/chunk/cache/resultscache.ResultsCache.Do({{0x2c4d8c0, 0x40004ce0f0}, {0x2c4ee80, 0x40017c1c08}, {0x2c71200, 0x4000b98750}, {0xffff5ce19d98, 0x40006df9b0}, {0x2c4f540, 0x40006dfa10}, ...}, ...)
    /src/loki/pkg/storage/chunk/cache/resultscache/cache.go:112 +0x86c
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.resultsCache.Do({0x4000b9cfd0, {0x2c4d8c0, 0x40004ce0f0}, {0x2c5bd00, 0x400174c1c0}, 0x4001781bc0}, {0x2c70e10, 0x4000c587b0}, {0x2c8c510?, 0x400036bb80})
    /src/loki/pkg/querier/queryrange/queryrangebase/results_cache.go:186 +0xa8
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1.1({0x2c70e10?, 0x4000c587b0?})
    /src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:28 +0x4c
github.com/grafana/dskit/instrument.CollectedRequest({0x2c70e10, 0x4000c58750}, {0x24a812c, 0x11}, {0x2c65ab0, 0x4000568a60}, 0x1?, 0x4000b2ca08)
    /src/loki/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x244
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1({0x2c70e10?, 0x4000c58750?}, {0x2c8c510?, 0x400036bb80?})
    /src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:26 +0x90
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0x40017b7680?, {0x2c70e10?, 0x4000c58750?}, {0x2c8c510?, 0x400036bb80?})
    /src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x44
github.com/grafana/loki/v3/pkg/querier/queryrange.(*splitByInterval).Do(0x40017beb40, {0x2c70e10?, 0x4000c58750}, {0x2c8c510, 0x400036b980})
    /src/loki/pkg/querier/queryrange/split_by_interval.go:214 +0x360
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1.1({0x2c70e10?, 0x4000c58750?})
    /src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:28 +0x4c
github.com/grafana/dskit/instrument.CollectedRequest({0x2c70e10, 0x4000c58720}, {0x24a811b, 0x11}, {0x2c65ab0, 0x4000568a58}, 0x4000c58720?, 0x4000b2ce28)
    /src/loki/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x244
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1({0x2c70e10?, 0x4000c58720?}, {0x2c8c510?, 0x400036b980?})
    /src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:26 +0x90
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0x0?, {0x2c70e10?, 0x4000c58720?}, {0x2c8c510?, 0x400036b980?})
    /src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x44
github.com/grafana/loki/v3/pkg/querier/queryrange.limitsMiddleware.Do({{0x2c9d5c0?, 0x40006df9b0?}, {0x2c4d5c0?, 0x4000788200?}}, {0x2c70e10?, 0x4000c586f0?}, {0x2c8c510, 0x400036b980})
    /src/loki/pkg/querier/queryrange/limits.go:199 +0x83c
github.com/grafana/loki/v3/pkg/querier/queryrange.StatsCollectorMiddleware.func1.1({0x2c70e48, 0x400092b180}, {0x2c8c510?, 0x400036b980?})
    /src/loki/pkg/querier/queryrange/stats.go:132 +0xe8
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0x400036b9a0?, {0x2c70e48?, 0x400092b180?}, {0x2c8c510?, 0x400036b980?})
    /src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x44
github.com/grafana/loki/v3/pkg/querier/queryrange.NewIndexStatsTripperware.statsTripperware.func4.1({0x2c70e48, 0x400092b180}, {0x2c8c510, 0x400036b980})
    /src/loki/pkg/querier/queryrange/roundtrip.go:970 +0xf8
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0x40009a2ec8?, {0x2c70e48?, 0x400092b180?}, {0x2c8c510?, 0x400036b980?})
    /src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x44
github.com/grafana/loki/v3/pkg/querier/queryrange.getStatsForMatchers.func1({0x2c70e48, 0x400092b180}, 0x0)
    /src/loki/pkg/querier/queryrange/shard_resolver.go:106 +0x1c4
github.com/grafana/dskit/concurrency.ForEachJob.func1()
    /src/loki/vendor/github.com/grafana/dskit/concurrency/runner.go:105 +0xbc
golang.org/x/sync/errgroup.(*Group).Go.func1()
    /src/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:78 +0x58
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 343
    /src/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x98

Attached config: [loki_config.yaml.txt](https://github.com/grafana/loki/files/15278212/loki_config.yaml.txt)
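
For readers unfamiliar with the `addr=0x28` field in the SIGSEGV line of the trace above: in Go it is the faulting memory address, and when a field is read through a nil struct pointer that address is usually the field's byte offset within the struct. The sketch below illustrates this general mechanism only. The `handler` type and its fields are hypothetical and are not taken from Loki, and the 0x28 offset assumes a 64-bit platform. It does not identify which value was nil in this crash.

```go
package main

import (
	"fmt"
	"unsafe"
)

// handler is a hypothetical type for illustration only; it is not a Loki type.
// The layout comments assume a 64-bit platform.
type handler struct {
	name   string   // offset 0x00 (16 bytes)
	labels []string // offset 0x10 (24 bytes)
	next   *handler // offset 0x28
}

// nextOffset returns the byte offset of the next field within handler.
func nextOffset() uintptr {
	return unsafe.Offsetof(handler{}.next)
}

// readNext loads a field through h; it faults when h is nil.
func readNext(h *handler) *handler {
	return h.next
}

// derefNil calls readNext with a nil pointer and returns the recovered
// panic message, or a "no panic" string if the read unexpectedly succeeds.
func derefNil() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	return fmt.Sprintf("no panic: %v", readNext(nil))
}

func main() {
	fmt.Printf("offset of next: %#x\n", nextOffset())
	fmt.Println("recovered:", derefNil())
}
```

A small non-zero `addr` like this one is therefore consistent with a field or method access on a nil value, rather than a read from a wild pointer.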

To Reproduce: I have rolled back the query frontend to the previous version, but everything else is running v3 and seems to be working.

I have reviewed everything I can find and have no idea what is causing this.

Expected behavior: No nil pointer panic.

Environment: ECS with Cloud Map

Screenshots, Promtail config, or terminal output: loki_config.yaml.txt

chaudum commented 6 months ago

Hi @chriskuchin, thanks for reporting the issue. Which version did you use?

Could this be the same bug as described in https://github.com/grafana/loki/issues/12842? If so, there is already a fix in https://github.com/grafana/loki/pull/12873, which will be available in the next patch release.

chriskuchin commented 6 months ago

That looks like it could be it. I wasn't able to figure out which endpoint was triggering it. I was running 3.0.0 when it was panicking. I have downgraded to 2.9.6.

Any idea when the next patch will be released?

chriskuchin commented 6 months ago

Just following up here. My query frontends are stuck on an old version, and I would love it if Loki released a patch version with this fix so I can verify that it resolves the problem.