k8sgateway / k8sgateway

The Cloud-Native API Gateway and AI Gateway
https://k8sgateway.io/
Apache License 2.0

Gloo panics and does not recover from Settings changes #8627

Open TomerJLevy opened 1 year ago

TomerJLevy commented 1 year ago

Gloo Edge Product

Enterprise

Gloo Edge Version

v.1.15.0

Kubernetes Version

v.1.25.11

Describe the bug

On any change to the Settings resource, the Gloo container panics and does not recover. It prints this stack trace:

[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:
goroutine 33 [running]:
runtime/debug.Stack()
    /usr/local/go/src/runtime/debug/stack.go:24 +0x7a
sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2-0.20230808150016-0269522a418c/pkg/log/log.go:59 +0xae
sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0xc000426ac0, {0x6bc2cfc, 0x14})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2-0.20230808150016-0269522a418c/pkg/log/deleg.go:147 +0x4f
github.com/go-logr/logr.Logger.WithName({{0x75e6798, 0xc000426ac0}, 0x0}, {0x6bc2cfc, 0x14})
    /go/pkg/mod/github.com/go-logr/logr@v1.2.4/logr.go:336 +0x66
sigs.k8s.io/controller-runtime/pkg/client.newClient(0xc0030c06c0, {0x0, 0xc0000dcbd0, {0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2-0.20230808150016-0269522a418c/pkg/client/client.go:120 +0x14b
sigs.k8s.io/controller-runtime/pkg/client.New(0xc00231b680, {0x0, 0xc0000dcbd0, {0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2-0.20230808150016-0269522a418c/pkg/client/client.go:101 +0xd8
github.com/solo-io/gloo/projects/gloo/pkg/api/external/solo/ratelimit.NewRateLimitClients({0x75e0010, 0xc001ce2c30}, {0x7597fe0, 0xc003bc2160})
    /go/pkg/mod/github.com/solo-io/gloo@v1.16.0-beta2.0.20230818182323-52683a563b41/projects/gloo/pkg/api/external/solo/ratelimit/extensions.go:74 +0x2cb
github.com/solo-io/gloo/projects/gloo/pkg/syncer/setup.RunGlooWithExtensions({{0xc0011f9260, 0xd}, {0xc00007c02e, 0xd}, {0x0, 0x0, 0x0}, {0x7597fe0, 0xc0036d78c0}, {0x75e8c98, ...}, ...}, ...)
    /go/pkg/mod/github.com/solo-io/gloo@v1.16.0-beta2.0.20230818182323-52683a563b41/projects/gloo/pkg/syncer/setup/setup_syncer.go:537 +0xe65
github.com/solo-io/solo-projects/projects/gloo/pkg/setup.NewSetupFuncWithRestControlPlaneAndExtensions.func1({{0xc0011f9260, 0xd}, {0xc00007c02e, 0xd}, {0x0, 0x0, 0x0}, {0x7597fe0, 0xc0036d78c0}, {0x75e8c98, ...}, ...})
    /go/src/github.com/solo-io/solo-projects/projects/gloo/pkg/setup/setup.go:68 +0x113
github.com/solo-io/gloo/projects/gloo/pkg/syncer/setup.(*setupSyncer).Setup(0xc000d96460, {0x75e0010, 0xc003266270}, {0x75e3018, 0xc000b45570}, {0x75e59d0, 0xc000d7ff80}, 0xc000c39900, {0x75b1530, 0xc0003a8e20})
    /go/pkg/mod/github.com/solo-io/gloo@v1.16.0-beta2.0.20230818182323-52683a563b41/projects/gloo/pkg/syncer/setup/setup_syncer.go:424 +0x1da2
github.com/solo-io/gloo/pkg/utils/setuputils.(*SetupSyncer).Sync(0xc000e783f0, {0x75e0010, 0xc003266270}, 0xc001eb2dc8)
    /go/pkg/mod/github.com/solo-io/gloo@v1.16.0-beta2.0.20230818182323-52683a563b41/pkg/utils/setuputils/setup_syncer.go:60 +0x4c9
github.com/solo-io/gloo/projects/gloo/pkg/api/v1.(*setupEventLoop).Run.func1()
    /go/pkg/mod/github.com/solo-io/gloo@v1.16.0-beta2.0.20230818182323-52683a563b41/projects/gloo/pkg/api/v1/setup_event_loop.sk.go:107 +0x3c9
created by github.com/solo-io/gloo/projects/gloo/pkg/api/v1.(*setupEventLoop).Run
    /go/pkg/mod/github.com/solo-io/gloo@v1.16.0-beta2.0.20230818182323-52683a563b41/projects/gloo/pkg/api/v1/setup_event_loop.sk.go:88 +0x587

Expected Behavior

The Gloo container should be resilient to such events.

Steps to reproduce the bug

  1. Edit the Settings resource
  2. See if the Gloo container panicked

Additional Environment Detail

No response

Additional Context

This has been fixed in OSS, but is still present in EE. To reproduce:

The solution is to call log.SetLogger(...) in every main.go file.

TomerJLevy commented 1 year ago

I couldn't reproduce it with Gloo-ee v1.14.4

sam-heilbron commented 1 year ago

Some previous GitHub context: https://github.com/solo-io/gloo/pull/8549#pullrequestreview-1567352028
Slack context: https://solo-io-corp.slack.com/archives/C03MFATU265/p1691501216502009

nfuden commented 1 year ago

Related to https://github.com/solo-io/gloo/pull/8549. It seems that controller-runtime now requires the logger to be set earlier.

jenshu commented 1 year ago

A couple notes:

Reproducible on k8s 1.25 and 1.27, with both Gloo OSS and EE.

jenshu commented 1 year ago

The fix has been merged and will be available in Gloo EE v1.15.1 once it's released.

sheidkamp commented 5 months ago

Reopened because this is still an issue in EE. See "additional context".

Slack Conversation