kanisterio / kanister

An extensible framework for application-level data management on Kubernetes
https://kanister.io
Apache License 2.0

[BUG] pods with multiple ownerReferences are never considered healthy #3208

Closed: Alveel closed this issue 2 weeks ago.

Alveel commented 3 weeks ago

Describe the bug: Some operators add custom ownerReferences to their objects and pods (in this case, StackGres). Kanister currently recognizes only pods that have exactly one owner reference, so pods with additional owners are never matched to their workload.
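A minimal Go sketch of the behavior described above, assuming the pod filter in `pkg/kube/workload.go` requires exactly one owner reference whose UID matches the workload. The `OwnerReference` struct is a simplified, hypothetical stand-in for `metav1.OwnerReference`, and `ownedBySingle` is a hypothetical name, not Kanister's API.

```go
package main

import "fmt"

// OwnerReference is a simplified, hypothetical stand-in for
// metav1.OwnerReference from k8s.io/apimachinery.
type OwnerReference struct {
	Kind string
	Name string
	UID  string
}

// ownedBySingle models the reported behavior (assumed, not copied from
// Kanister): an object counts as owned by uid only when it has exactly
// one owner reference and that reference's UID matches.
func ownedBySingle(refs []OwnerReference, uid string) bool {
	return len(refs) == 1 && refs[0].UID == uid
}

func main() {
	sts := OwnerReference{Kind: "StatefulSet", Name: "my-statefulset", UID: "sts-uid"}
	// Hypothetical extra owner, standing in for one added by an operator.
	extra := OwnerReference{Kind: "ExtraOwner", Name: "example", UID: "extra-uid"}

	fmt.Println(ownedBySingle([]OwnerReference{sts}, "sts-uid"))        // true
	fmt.Println(ownedBySingle([]OwnerReference{sts, extra}, "sts-uid")) // false: the extra owner hides the match
}
```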

To Reproduce:

  1. Create any working StatefulSet.
  2. Edit the ownerReferences of the resulting pod(s) and add an extra entry.
  3. Create an ActionSet that targets the StatefulSet.
  4. Observe that Kanister never considers the StatefulSet healthy.
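Steps 2 to 4 can be simulated in a small Go sketch, assuming readiness is judged by counting running pods that pass a single-owner filter. Under that assumption, three running replicas that each carry an extra owner reference are all skipped, which matches the "Specified 3 replicas and only 0 are running" symptom in the error under Screenshots. The `Pod`, `OwnerReference`, `singleOwner`, and `countRunning` names are hypothetical simplifications, not Kubernetes or Kanister types.

```go
package main

import "fmt"

// OwnerReference and Pod are simplified, hypothetical stand-ins for the
// Kubernetes API types; only the fields needed here are modeled.
type OwnerReference struct {
	Kind string
	Name string
	UID  string
}

type Pod struct {
	Name   string
	Phase  string // e.g. "Running"
	Owners []OwnerReference
}

// ownerFilter decides whether a pod belongs to the workload with the given UID.
type ownerFilter func(refs []OwnerReference, uid string) bool

// singleOwner is the assumed current rule: exactly one owner, and it must match.
func singleOwner(refs []OwnerReference, uid string) bool {
	return len(refs) == 1 && refs[0].UID == uid
}

// countRunning counts running pods that the filter attributes to the workload.
func countRunning(pods []Pod, uid string, owned ownerFilter) int {
	n := 0
	for _, p := range pods {
		if p.Phase == "Running" && owned(p.Owners, uid) {
			n++
		}
	}
	return n
}

func main() {
	sts := OwnerReference{Kind: "StatefulSet", Name: "my-statefulset", UID: "sts-uid"}
	// Hypothetical entry added in step 2.
	extra := OwnerReference{Kind: "ExtraOwner", Name: "example", UID: "extra-uid"}

	var pods []Pod
	for i := 0; i < 3; i++ {
		pods = append(pods, Pod{
			Name:   fmt.Sprintf("my-statefulset-%d", i),
			Phase:  "Running",
			Owners: []OwnerReference{sts, extra},
		})
	}
	// Prints: Specified 3 replicas and only 0 are running
	fmt.Printf("Specified %d replicas and only %d are running\n",
		len(pods), countRunning(pods, "sts-uid", singleOwner))
}
```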

Expected behavior: Kanister should check every ownerReference of a pod and treat the pod as belonging to the workload if any of them matches.
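The expected behavior can be sketched in Go as an "any owner matches" check. This is a minimal illustration with a simplified, hypothetical `OwnerReference` type and a hypothetical `ownedByAny` helper, not Kanister's actual fix.

```go
package main

import "fmt"

// OwnerReference is a simplified, hypothetical stand-in for metav1.OwnerReference.
type OwnerReference struct {
	Kind string
	Name string
	UID  string
}

// ownedByAny reports whether any of the owner references points at uid,
// so extra owners added by operators no longer prevent a match.
func ownedByAny(refs []OwnerReference, uid string) bool {
	for _, r := range refs {
		if r.UID == uid {
			return true
		}
	}
	return false
}

func main() {
	refs := []OwnerReference{
		{Kind: "StatefulSet", Name: "my-statefulset", UID: "sts-uid"},
		{Kind: "ExtraOwner", Name: "example", UID: "extra-uid"}, // hypothetical extra owner
	}
	fmt.Println(ownedByAny(refs, "sts-uid")) // true
	fmt.Println(ownedByAny(nil, "sts-uid"))  // false
}
```

With this rule, a pod owned by both the StatefulSet and an additional resource is still attributed to the StatefulSet.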

Screenshots: In Kasten we get the following error:

cause: '{"cause":{"cause":{"cause":{"message":"Specified 3 replicas and only 0
  are running: could not get StatefulSet{Namespace:
  my-namespace, Name: my-statefulset}: client rate
  limiter Wait returned an error: rate: Wait(n=1) would exceed context
  deadline"},"fields":[{"name":"namespace","value":"my-namespace"},{"name":"name","value":"my-statefulset"}],"file":"kasten.io/k10/kio/exec/phases/phase/snapshot.go:426","function":"kasten.io/k10/kio/exec/phases/phase.WaitOnWorkloadReady","linenumber":426,"message":"Statefulset
  not in ready state. Retry the operation once Statefulset is
  ready"},"fields":[{"name":"workloadName","value":"my-statefulset"},{"name":"workloadNamespace","value":"my-namespace"}],"file":"kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:1128","function":"kasten.io/k10/kio/exec/phases/backup.WaitForWorkloadWithSkipWait","linenumber":1128,"message":"Error
  while waiting for workload to be ready"},"fields":[],"message":"Ignoring error
  waiting on workload to become ready"}'

Environment:
  - Kubernetes Version/Provider: OpenShift 4.14
  - Storage Provider: MinIO
  - Cluster Size (#nodes): 12
  - Data Size: any

Additional context: We are a Veeam Kasten customer and are experiencing this issue.

Relevant code:
  - https://github.com/kanisterio/kanister/blob/1708d6cb57d329cca1b932a5e4fa9c9afacfe1d2/pkg/kube/workload.go#L296-L302
  - https://github.com/kanisterio/kanister/blob/1708d6cb57d329cca1b932a5e4fa9c9afacfe1d2/pkg/kube/workload.go#L322-L329
  - https://github.com/kanisterio/kanister/blob/1708d6cb57d329cca1b932a5e4fa9c9afacfe1d2/pkg/kube/workload.go#L347-L350
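As an alternative to matching any owner, the check at the linked lines could match only the controller reference, the way `metav1.GetControllerOf` and `metav1.IsControlledBy` in k8s.io/apimachinery do. The Kubernetes API allows at most one owner reference with `controller: true`, and the StatefulSet controller sets it on the pods it creates. The sketch below uses a simplified stand-in type whose `Controller` field is a `*bool`, like the real one; `controllerOf` and `controlledBy` are hypothetical names.

```go
package main

import "fmt"

// OwnerReference is a simplified stand-in for metav1.OwnerReference; like the
// real type, Controller is a *bool that may be nil.
type OwnerReference struct {
	Kind       string
	UID        string
	Controller *bool
}

// controllerOf returns the owner reference marked as the managing controller,
// or nil if none is. Modeled on metav1.GetControllerOf.
func controllerOf(refs []OwnerReference) *OwnerReference {
	for i := range refs {
		if refs[i].Controller != nil && *refs[i].Controller {
			return &refs[i]
		}
	}
	return nil
}

// controlledBy reports whether the workload with uid is the pod's controller,
// regardless of how many other owner references the pod has.
func controlledBy(refs []OwnerReference, uid string) bool {
	c := controllerOf(refs)
	return c != nil && c.UID == uid
}

func main() {
	isController := true
	refs := []OwnerReference{
		{Kind: "ExtraOwner", UID: "extra-uid"}, // hypothetical non-controller owner
		{Kind: "StatefulSet", UID: "sts-uid", Controller: &isController},
	}
	fmt.Println(controlledBy(refs, "sts-uid"))   // true
	fmt.Println(controlledBy(refs, "extra-uid")) // false
}
```

One trade-off to weigh: matching only the controller tolerates extra owners like the ones StackGres adds, but it will not match a workload that is listed as a non-controlling owner.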

github-actions[bot] commented 3 weeks ago

Thanks for opening this issue :+1:. The team will review it shortly.

If this is a bug report, make sure to include clear instructions on how to reproduce the problem with minimal reproducible examples, where possible. If this is a security report, please review our security policy as outlined in SECURITY.md.

If you haven't already, please take a moment to review our project's Code of Conduct document.