percona / percona-postgresql-operator

Percona Operator for PostgreSQL
https://www.percona.com/doc/kubernetes-operator-for-postgresql/index.html
Apache License 2.0

operator crash loop due to nil pointer #699

Open Lobo75 opened 3 months ago

Lobo75 commented 3 months ago

Report

A user error (applying a cr.yaml that was missing the proxy section) caused the stack trace below. It appears there is no check for whether the proxy section is nil.
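The trace below points at `Default()` in perconapgcluster_types.go:179. A minimal sketch of the kind of guard the report asks for, assuming the proxy section is modelled as a pointer field, could look like this. The types `ClusterSpec`, `ProxySpec` and `PGBouncerSpec`, the `Image` field and the default image value are hypothetical stand-ins, not the operator's actual API:

```go
package main

import "fmt"

// Hypothetical stand-ins for the operator's CR types; the real
// PerconaPGCluster spec has more fields and may be shaped differently.
type PGBouncerSpec struct {
	Image string `json:"image,omitempty"`
}

type ProxySpec struct {
	PGBouncer *PGBouncerSpec `json:"pgBouncer,omitempty"`
}

type ClusterSpec struct {
	Proxy *ProxySpec `json:"proxy,omitempty"`
}

// defaultPGBouncerImage is a hypothetical placeholder value.
const defaultPGBouncerImage = "example.invalid/pgbouncer:latest"

// Default fills in missing values. Each pointer level is checked (and
// allocated if nil) before it is dereferenced, so a CR without a proxy
// section gets defaults instead of triggering a nil pointer dereference.
func (s *ClusterSpec) Default() {
	if s.Proxy == nil {
		s.Proxy = &ProxySpec{}
	}
	if s.Proxy.PGBouncer == nil {
		s.Proxy.PGBouncer = &PGBouncerSpec{}
	}
	if s.Proxy.PGBouncer.Image == "" {
		s.Proxy.PGBouncer.Image = defaultPGBouncerImage
	}
}

func main() {
	spec := &ClusterSpec{} // proxy section omitted, as in the faulty cr.yaml
	spec.Default()
	fmt.Println(spec.Proxy.PGBouncer.Image)
}
```

Allocating defaults is only one option; rejecting the CR with a validation error is the other obvious choice if the proxy section is meant to be mandatory.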

More about the problem

2024-03-21T19:32:01.194Z INFO Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference {"controller": "perconapgcluster", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGCluster", "PerconaPGCluster": {"name":"rxtest","namespace":"postgres-operator"}, "namespace": "postgres-operator", "name": "rxtest", "reconcileID": "0ecffd68-d97a-4d13-af64-9eafd015dd10"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1678ace]

goroutine 459 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1a233e0?, 0x2ddbe70?})
	/usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/percona/percona-postgresql-operator/pkg/apis/pgv2.percona.com/v2.(*PerconaPGCluster).Default(0xc000cdc380)
	/go/src/github.com/percona/percona-postgresql-operator/pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go:179 +0x22e
github.com/percona/percona-postgresql-operator/percona/controller/pgcluster.(*PGClusterReconciler).Reconcile(0xc00045ef30, {0x1fcc410?, 0xc000d2b530}, {{{0xc00005ddb8?, 0x5?}, {0xc00083f6f6?, 0xc00044cd48?}}})
	/go/src/github.com/percona/percona-postgresql-operator/percona/controller/pgcluster/controller.go:170 +0x1c5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1fcf718?, {0x1fcc410?, 0xc000d2b530?}, {{{0xc00005ddb8?, 0xb?}, {0xc00083f6f6?, 0x0?}}})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004e4aa0, {0x1fcc448, 0xc0003a99a0}, {0x1abf5c0?, 0xc000971140?})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004e4aa0, {0x1fcc448, 0xc0003a99a0})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 89
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:223 +0x565

Steps to reproduce

Apply a cr.yaml that is missing the proxy section. Here is a simple test case to verify that the problem was caused by incorrect YAML:

package v2_test

import (
	"testing"

	"github.com/stretchr/testify/assert"
	"gopkg.in/yaml.v2"

	v2 "github.com/percona/percona-postgresql-operator/pkg/apis/pgv2.percona.com/v2"
)

func TestPerconaPGCluster_Default(t *testing.T) {
	a := assert.New(t)

	cluster := new(v2.PerconaPGCluster)

	err := yaml.Unmarshal(postgrescluster_empty_proxy, cluster)
	a.NoError(err)

	// With no proxy section in the YAML, this call hits the same nil
	// pointer dereference as in the stack trace above.
	cluster.Default()
}

var postgrescluster_empty_proxy []byte = []byte(`
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-15.3-2
  postgresVersion: 15
  instances:

Versions

  1. Kubernetes 1.2.7
  2. Operator 2.3.1 (I suspect 2.3.0 has the same issue)

Anything else?

Even though this was purely a user error, it caused a serious situation: the operator went into a hard crash loop, and I could find no way to break it out. The operator would not stay up long enough for me to even try reapplying the corrected YAML. Deleting and restarting it, and even uninstalling the operator (everything except the CRD), did not help.

Thank you.

spron-in commented 2 months ago

Thanks for sharing, @Lobo75! This is a follow-up to https://forums.percona.com/t/crash-loop-in-percona-operator-observed-a-panic-in-reconciler/29289 . We will fix it in https://perconadev.atlassian.net/browse/K8SPG-543