operator-framework / operator-sdk

SDK for building Kubernetes applications. Provides high level APIs, useful abstractions, and project scaffolding.
https://sdk.operatorframework.io
Apache License 2.0

helm-operator fails to annotate some resources #6472

Closed · lufinima closed this issue 4 months ago

lufinima commented 1 year ago

Bug Report

Helm-operator fails to annotate some resources, which means that chart updates fail.

Description

I've created a helm-operator to deploy the NGINX ingress controller. The first version of this operator was created with operator-sdk version 1.24, using the following command:

operator-sdk init \
  --plugins helm \
  --helm-chart ingress-nginx \
  --helm-chart-repo https://kubernetes.github.io/ingress-nginx \
  --helm-chart-version 4.0.3 \
  --domain helm.k8s.io \
  --group charts \
  --version v1 \
  --kind NginxIngressController

Now I have updated operator-sdk to version 1.29 and bumped ingress-nginx to chart version 4.6.1:

operator-sdk init \
  --plugins helm \
  --helm-chart ingress-nginx \
  --helm-chart-repo https://kubernetes.github.io/ingress-nginx \
  --helm-chart-version 4.6.1 \
  --domain helm.k8s.io \
  --group charts \
  --version v1 \
  --kind NginxIngressController

When I try to upgrade from the first version of the operator to the second, everything seems to work except that the ingress controller never gets updated; the operator logs the following error while trying to reconcile:

failed to get candidate release: rendered manifests contain a resource that already exists. Unable to continue with update: HorizontalPodAutoscaler "nina-annotation-controller" in namespace "ingress-controller-operator" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "nina-annotation"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "ingress-controller-operator"

After investigating the issue, it seems that the operator doesn't annotate the HorizontalPodAutoscaler resource with

metadata:
  annotations:
    meta.helm.sh/release-name: nina-annotation
    meta.helm.sh/release-namespace: ingress-controller-operator

while, for example, the Deployment resource does get annotated:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: nina-annotation
    meta.helm.sh/release-namespace: ingress-controller-operator
  creationTimestamp: "2023-06-20T09:33:30Z"

I've discovered this missing-annotation issue on the HorizontalPodAutoscaler, but it might be happening with other resources as well.
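
To check whether other resources from the release are also missing the ownership metadata, something like the loop below can help (an untested sketch; the listed kinds are just the ones the ingress-nginx chart typically creates, and an empty RELEASE column means the meta.helm.sh/release-name annotation is missing):

# Show each resource's name and its Helm release-name annotation, if any
for kind in deployment service serviceaccount configmap hpa; do
  echo "== $kind =="
  kubectl get "$kind" -n ingress-controller-operator \
    -o custom-columns='NAME:.metadata.name,RELEASE:.metadata.annotations.meta\.helm\.sh/release-name'
done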

Workaround to minimize bug impact

Because this broke the operator upgrade, the only way I found to bypass the issue and upgrade both the operator and the ingress controllers correctly was to disable autoscaling in my Custom Resource before updating the controller, and only re-enable autoscaling after everything had been updated as expected.
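
For reference, the workaround looked roughly like this in my Custom Resource (only a sketch: the helm-operator CR spec mirrors the chart values, and I'm assuming the ingress-nginx chart's controller.autoscaling.enabled flag here; the names match my setup from the error above):

apiVersion: charts.helm.k8s.io/v1
kind: NginxIngressController
metadata:
  name: nina-annotation
  namespace: ingress-controller-operator
spec:
  controller:
    autoscaling:
      enabled: false   # set back to true once the upgrade has reconciled cleanly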

Environment

minikube

minikube version: v1.30.1
commit: 08896fd1dc362c097c925146c4a0d0dac715ace0

minikube setup with

minikube start --cpus 4 --driver=docker --addons ingress --addons ingress-dns --addons metrics-server --kubernetes-version=1.24.8

operator-sdk

operator-sdk version: "v1.29.0", commit: "78c564319585c0c348d1d7d9bbfeed1098fab006", kubernetes version: "1.26.0", go version: "go1.19.9", GOOS: "darwin", GOARCH: "arm64"

horis233 commented 1 year ago

We also observe the same problem when upgrading operator-sdk from v1.22 to v1.28, and it only happens sometimes. We believe it could be an issue introduced in the newer versions of operator-sdk.

I am considering whether reverting the operator-sdk version could be a fix for this problem.

openshift-bot commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

lufinima commented 1 year ago

/remove-lifecycle stale

openshift-bot commented 10 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

lufinima commented 9 months ago

/remove-lifecycle stale

openshift-bot commented 6 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 5 months ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented 4 months ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci[bot] commented 4 months ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/operator-framework/operator-sdk/issues/6472#issuecomment-2177273055):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.