k8up-io / k8up

Kubernetes and OpenShift Backup Operator
https://k8up.io/
Apache License 2.0

Reconciler errors for all backups #859

Closed · SchoolGuy closed this issue 1 year ago

SchoolGuy commented 1 year ago

Description

Either I am unlucky or I did the upgrade to 4.2.2 wrong, but in either case my k8up is sadly not working as expected at the moment.

Additional Context

The update verb is present (see the screenshot below), so I believe the upgrade to 2.7.1/4.2.2 should have worked as expected.

(screenshot: the RBAC rules of the operator's ClusterRole, showing the update verb)
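
For reference, the same check can be done from the CLI. A minimal sketch; the ServiceAccount name and namespace are assumptions about a default Helm install:

# Ask the API server whether the operator's ServiceAccount is allowed to
# update RoleBindings (SA name/namespace assumed, adjust to your install):
kubectl auth can-i update rolebindings \
  --namespace jellyfin \
  --as system:serviceaccount:k8up-system:k8up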

Logs

This line is present for every Backup created by my various Schedules:

2023-05-23T20:21:51Z ERROR k8up.operator Reconciler error {"controller": "backup.k8up.io", "controllerGroup": "k8up.io", "controllerKind": "Backup", "Backup": {"name":"backup-schedule-schoolguy-jellyfin-backup-mf24w","namespace":"jellyfin"}, "namespace": "jellyfin", "name": "backup-schedule-schoolguy-jellyfin-backup-mf24w", "reconcileID": "3430573b-bc7d-45a8-bfe0-bd250419be2e", "error": "RoleBinding.rbac.authorization.k8s.io \"pod-executor-namespaced\" is invalid: roleRef: Invalid value: rbac.RoleRef{APIGroup:\"rbac.authorization.k8s.io\", Kind:\"ClusterRole\", Name:\"k8up-executor\"}: cannot change roleRef"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
 /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
 /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
 /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:234
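
The root cause is visible in the error message: roleRef on an existing RoleBinding is immutable, so the operator cannot update the binding in place. A sketch of how to inspect the conflicting binding (the pre-upgrade roleRef value does not appear in the log, only the new name k8up-executor):

# Show the roleRef the existing RoleBinding still points at:
kubectl -n jellyfin get rolebinding pod-executor-namespaced -o jsonpath='{.roleRef}'
# Any attempt to repoint it at ClusterRole "k8up-executor" is rejected by
# the API server with the same "cannot change roleRef" error seen above;
# the binding has to be deleted and recreated instead.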

Expected Behavior

k8up does not throw errors.

Steps To Reproduce

  1. Upgrade from an older version of k8up to 2.7.1/4.2.2
  2. Have any Schedule present
  3. See schedules fail

Version of K8up

2.7.1 (app) / 4.2.2 (helmchart)

Version of Kubernetes

v1.26.4+k3s1

Distribution of Kubernetes

k3s

Kidswiss commented 1 year ago

Hi @SchoolGuy

Can you try to delete the RoleBinding pod-executor-namespaced in your namespaces?
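
A hedged one-liner for the suggested cleanup, per namespace (the operator recreates the binding with the new roleRef on its next reconcile):

# Delete the stale RoleBinding so k8up can recreate it:
kubectl -n jellyfin delete rolebinding pod-executor-namespaced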

SchoolGuy commented 1 year ago

@Kidswiss Will do once I am home again from work.

Kidswiss commented 1 year ago

It looks like the roleRef is immutable once it's set 🙈

Once you can confirm it, I'll add a point to the release notes so that people are aware this might happen. Thanks for reporting.

SchoolGuy commented 1 year ago

I deleted the RoleBinding and the Jobs are now starting again. So the fix during an upgrade is indeed to delete all the RoleBindings and let them be recreated, as sketched below.
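
Extending the per-namespace delete to the whole cluster, a sketch (field selectors on metadata.name are supported for RoleBindings):

# Remove the stale RoleBinding in every namespace that still has it, so
# the operator can recreate each one with the new roleRef:
for ns in $(kubectl get rolebindings --all-namespaces \
    --field-selector metadata.name=pod-executor-namespaced \
    -o jsonpath='{.items[*].metadata.namespace}'); do
  kubectl -n "$ns" delete rolebinding pod-executor-namespaced
done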

Kidswiss commented 1 year ago

Thanks for confirming this!

I'll add a note to the release and think about an automated solution for this problem.

SchoolGuy commented 1 year ago

Appendix: I double-checked, and you also need to delete or restart the k8up pod; the changes are not picked up dynamically. Sorry for not mentioning this earlier.

Since all Jobs then get started at the same time, this can potentially be a high-load scenario.
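
A hedged sketch of that restart; the deployment name and namespace are assumptions about a typical Helm install:

# Restart the operator so it picks up the recreated RoleBindings
# (deployment name/namespace assumed, adjust to your install):
kubectl -n k8up-system rollout restart deployment/k8up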

Kidswiss commented 1 year ago

I've opened a PR that adds Helm hooks to clean up stray resources: https://github.com/k8up-io/k8up/pull/863