percona / percona-xtradb-cluster-operator

Percona Operator for MySQL based on Percona XtraDB Cluster
https://www.percona.com/doc/kubernetes-operator-for-pxc/index.html
Apache License 2.0

Operator crashes when restoring cluster #1749

Open sagargulabani opened 3 weeks ago

sagargulabani commented 3 weeks ago

Report

I am trying to restore a backup to a new Percona cluster without specifying the `backupName`. Since this is a new Kubernetes cluster, I don't have the backup name.

More about the problem

2024-06-29T16:24:18.336Z        INFO    backup restore request  {"controller": "pxcrestore-controller", "namespace": "dev", "name": "restore1", "reconcileID": "63b467f2-c684-4227-ae52-8c93d4a005f1"}
2024-06-29T16:24:18.351Z        INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "pxcrestore-controller", "namespace": "dev", "name": "restore1", "reconcileID": "63b467f2-c684-4227-ae52-8c93d4a005f1"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x112d1b0]

goroutine 135 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:116 +0x1a4
panic({0x154a780?, 0x28ca670?})
        /usr/local/go/src/runtime/panic.go:914 +0x218
github.com/percona/percona-xtradb-cluster-operator/pkg/apis/pxc/v1.(*PXCBackupStatus).GetStorageType(...)
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/apis/pxc/v1/pxc_backup_types.go:140
github.com/percona/percona-xtradb-cluster-operator/pkg/pxc/backup.RestoreJob(0x40017511e0, 0x40017ffd40, 0x4001ff9900, {0x40020c0000, 0x5a}, 0x0)
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/pxc/backup/restore.go:140 +0x50
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*s3).Job(0x400039e0c0?)
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/restorer.go:38 +0x38
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*ReconcilePerconaXtraDBClusterRestore).validate(0x152f160?, {0x1af1e68, 0x4001df3aa0}, 0x40017511e0, 0x3?, 0x4001ff9900?)
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/restore.go:80 +0x4c
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*ReconcilePerconaXtraDBClusterRestore).Reconcile(0x400039e0c0, {0x1af1e68, 0x4001df3aa0}, {{{0x4001da56a0?, 0x5?}, {0x4001da5698?, 0x4002dd9cf8?}}})
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/controller.go:190 +0xa90
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1af58a8?, {0x1af1e68?, 0x4001df3aa0?}, {{{0x4001da56a0?, 0xb?}, {0x4001da5698?, 0x0?}}})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:119 +0x8c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x40004377c0, {0x1af1ea0, 0x4000408eb0}, {0x15f31c0?, 0x40021829c0?})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:316 +0x2e8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0x40004377c0, {0x1af1ea0, 0x4000408eb0})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:266 +0x16c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:227 +0x74
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 40
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:223 +0x43c

My configuration

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore1
  namespace: dev
spec:
  pxcCluster: pxc-db-two
  # backupName: backup1
  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "1Gi"
      cpu: "1.5"
  backupSource:
    destination: s3://test-backup-bucket/percona-dev-backup/pxc-db-2024-06-29-15:45:32-full/
    s3:
      bucket: s3://test-backup-bucket/percona-dev-backup/
      credentialsSecret: aws-secret
      region: eu-west-1

Steps to reproduce

1. Create a PXC cluster.
2. Try to restore the cluster from S3 using the path, not the backup name.
3. Watch it crash.

Versions

Kubernetes - 1.30
Operator - 1.14

2024-06-29T16:21:47.452Z INFO setup Runs on {"platform": "kubernetes", "version": "v1.30.0-eks-036c24b"}
2024-06-29T16:21:47.452Z INFO setup Manager starting up {"gitCommit": "c85a021f2a21441500b02a2c0b3d17e8a8b25996", "gitBranch": "release-1-14-0", "buildTime": "2024-03-01T09:01:29Z", "goVersion": "go1.21.7", "os": "linux", "arch": "arm64"}

Anything else?

No response

sagargulabani commented 3 weeks ago

@hors @cap1984 @tplavcic @nonemax Could you please check on this one? Thanks. This is a hard blocker for us.

inelpandzic commented 2 weeks ago

Hey @sagargulabani, thanks for reporting. We'll check it.

sagargulabani commented 2 weeks ago

Hi @inelpandzic, any update?

ydixken commented 2 weeks ago

Hi @inelpandzic, we can also confirm this bug. This is the resource that was used; note that it works on some clusters, but not consistently.

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: bootstrap
spec:
  pxcCluster: percona-cluster
  backupSource:
    destination: s3://percona-xtrabackup-bootstrap/common/bootstrap
    s3:
      credentialsSecret: percona
      region: ""
      endpointUrl: https://minio.redacted.tld/

Logs:

2024-07-08T12:11:21.373Z    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference    {"controller": "pxcrestore-controller",
"namespace": "default", "name": "bootstrap", "reconcileID": "4ff5700d-ec23-4aeb-a212-fb33ccf5e9c7"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x1634635]

goroutine 77 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1aa5fe0?, 0x2e707b0?})
    /usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/percona/percona-xtradb-cluster-operator/pkg/apis/pxc/v1.(*PXCBackupStatus).GetStorageType(...)
    /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/apis/pxc/v1/pxc_backup_types.go:140
github.com/percona/percona-xtradb-cluster-operator/pkg/pxc/backup.RestoreJob(0xc000ee89c0, 0xc000e73b00, 0xc000bcd400, {0xc001441b00, 0x32}, 0x0)
    /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/pxc/backup/restore.go:140 +0x75
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*s3).Job(0xc0006ffda0?)
    /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/restorer.go:38 +0x32
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*ReconcilePerconaXtraDBClusterRestore).validate(0x1a8a960?, {0x204fa08, 0xc000d22750}, 0xc000ee89c
0, 0x204f998?, 0xc000bcd400?)
    /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/restore.go:80 +0x4b
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*ReconcilePerconaXtraDBClusterRestore).Reconcile(0xc0006ffda0, {0x204fa08, 0xc000d22750}, {{{0xc00
119fde0?, 0x5?}, {0xc00119fdd6?, 0xc000923d08?}}})
    /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/controller.go:190 +0xf14
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x2053448?, {0x204fa08?, 0xc000d22750?}, {{{0xc00119fde0?, 0xb?}, {0xc00119fdd6?, 0x0?}}})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0002890e0, {0x204fa40, 0xc00030cc80}, {0x1b4eb40?, 0xc000491f80?})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0002890e0, {0x204fa40, 0xc00030cc80})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:266 +0x1af
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
ydixken commented 2 weeks ago

FYI: as we're under a Percona support contract, we've also raised this issue with the Percona support team (ticket ID: CS0048052).

ydixken commented 2 weeks ago

I've found the issue. You need to have:

    xtradb:
      backup:
        enabled: true
        storages:
          minio:
            type: $your_storage

The important part is that `backup.storages` is set. In any case, this should not segfault; it should produce a log message instead.

Edit: Updated resolution advice with the right key.

cc @inelpandzic @sagargulabani

sagargulabani commented 2 weeks ago

@ydixken Thank you for the update. Just to clarify, the above config belongs to the actual PXC database resource:

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster

which looks like this:

...
spec:
  backup:
    image: percona/percona-xtradb-cluster-operator:1.14.0-pxc8.0-backup-pxb8.0.35
    pitr:
      enabled: false
    schedule:
    - keep: 5
      name: hourly-backup
      schedule: 45 * * * *
      storageName: s3-subdir-eu-west-2
    storages:
      s3-subdir-eu-west-2:
        s3:
          bucket: test-bucket/test
          credentialsSecret: s3-backup-aws-creds
          region: eu-west-2
        schedulerName: default-scheduler
        type: s3
ydixken commented 1 week ago

Thanks for the heads-up!

Just to clarify, we've got the following configured; before, the storage was missing and I encountered the behavior you described:

      backup:
        storages:
          minio-bootstrap:
            type: s3
            s3:
              bucket: "percona-xtrabackup-bootstrap"
              endpointUrl: "https://minio.redacted.tld"
              credentialsSecret: percona

To trigger a restore, I'm using:

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: bootstrap
spec:
  pxcCluster: percona-cluster
  backupSource:
    destination: s3://percona-xtrabackup-bootstrap/prod
    s3:
      credentialsSecret: percona
      region: ""
      endpointUrl: https://minio.redacted.tld/

Maybe this helps?

sagargulabani commented 1 week ago

Yes, after I added the storages section, it worked for me.

ydixken commented 1 week ago

glad to hear :-)