operator-framework / operator-controller

A new and improved management framework for extending Kubernetes with Operators
https://operator-framework.github.io/operator-controller/
Apache License 2.0

Bundle openshift-gitops-operator fails on RBAC trying to set an ownerRef on the CRD #1168

Open itroyano opened 2 months ago

itroyano commented 2 months ago

Version: OpenShift 4.16 tech preview (enabled by feature gate)

$ oc -n openshift-gitops get bundledeployments.core.rukpak.io  openshift-gitops-operator -o yaml
apiVersion: core.rukpak.io/v1alpha2
kind: BundleDeployment
metadata:
  creationTimestamp: "2024-08-26T13:03:27Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-08-26T13:05:55Z"
  finalizers:
  - core.rukpak.io/delete-cached-bundle
  generation: 2
  name: openshift-gitops-operator
  ownerReferences:
  - apiVersion: olm.operatorframework.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: ClusterExtension
    name: openshift-gitops-operator
    uid: 4f4b94de-3197-40cc-b564-188067006883
  resourceVersion: "57645"
  uid: 74fa0df1-1cea-4771-9769-631e174cf090
spec:
  installNamespace: openshift-gitops
  provisionerClassName: core-rukpak-io-registry
  source:
    image:
      ref: registry.redhat.io/openshift-gitops-1/gitops-operator-bundle@sha256:a782f27b301fd2c06e94125cca735590a21d87cbe18cf15f06197679462bb65d
    type: image
status:
  conditions:
  - lastTransitionTime: "2024-08-26T13:03:31Z"
    message: Successfully unpacked the image Bundle
    reason: UnpackSuccessful
    status: "True"
    type: Unpacked
  - lastTransitionTime: "2024-08-26T13:03:34Z"
    message: 'cannot patch "argocds.argoproj.io" with kind CustomResourceDefinition:
      CustomResourceDefinition.apiextensions.k8s.io "argocds.argoproj.io" is invalid:
      metadata.ownerReferences: Invalid value: []v1.OwnerReference{v1.OwnerReference{APIVersion:"core.rukpak.io/v1alpha2",
      Kind:"BundleDeployment", Name:"openshift-gitops-operator", UID:"74fa0df1-1cea-4771-9769-631e174cf090",
      Controller:(*bool)(0xc01cd9c788), BlockOwnerDeletion:(*bool)(0xc01cd9c789)},
      v1.OwnerReference{APIVersion:"core.rukpak.io/v1alpha2", Kind:"BundleDeployment",
      Name:"openshift-gitops-operator", UID:"cd53d7b4-f7d3-4cca-b78f-6b128d7b4b27",
      Controller:(*bool)(0xc01cd9c78a), BlockOwnerDeletion:(*bool)(0xc01cd9c78b)}}:
      Only one reference can have Controller set to true. Found "true" in references
      for BundleDeployment/openshift-gitops-operator and BundleDeployment/openshift-gitops-operator'
    reason: InstallFailed
    status: "False"
    type: Installed
  - lastTransitionTime: "2024-08-26T13:03:34Z"
    message: Installed condition is false
    reason: InstallationStatusFalse
    status: "False"
    type: Healthy
  contentURL: https://core.openshift-rukpak.svc/bundles/openshift-gitops-operator.tgz
  observedGeneration: 2
  resolvedSource:
    image:
      ref: registry.redhat.io/openshift-gitops-1/gitops-operator-bundle@sha256:a782f27b301fd2c06e94125cca735590a21d87cbe18cf15f06197679462bb65d
    type: image
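For context, the Installed message above is a validation error from the Kubernetes API server: an object's `metadata.ownerReferences` may contain at most one entry with `controller: true`. The following is a simplified Python approximation of that rule (not the API server's actual Go code; the function name is made up for illustration), fed with the two references quoted in the message. It also shows why both sides of the error read `BundleDeployment/openshift-gitops-operator`: each reference is labelled only by kind and name, so two references to same-named parents with different UIDs look identical.

```python
from typing import Any, Dict, List

OwnerRef = Dict[str, Any]  # one metadata.ownerReferences entry, same keys as the YAML


def validate_controller_refs(refs: List[OwnerRef]) -> List[str]:
    """Simplified approximation of the API server rule: at most one
    ownerReference may set controller=true. Hypothetical helper."""
    errors: List[str] = []
    first_controller = ""
    for ref in refs:
        if not ref.get("controller"):
            continue
        current = f"{ref['kind']}/{ref['name']}"  # the UID is not part of the label
        if first_controller:
            errors.append(
                "Only one reference can have Controller set to true. "
                f'Found "true" in references for {first_controller} and {current}'
            )
        else:
            first_controller = current
    return errors


# The two references from the Installed condition: same kind and name, different UIDs.
refs = [
    {"apiVersion": "core.rukpak.io/v1alpha2", "kind": "BundleDeployment",
     "name": "openshift-gitops-operator",
     "uid": "74fa0df1-1cea-4771-9769-631e174cf090",
     "controller": True, "blockOwnerDeletion": True},
    {"apiVersion": "core.rukpak.io/v1alpha2", "kind": "BundleDeployment",
     "name": "openshift-gitops-operator",
     "uid": "cd53d7b4-f7d3-4cca-b78f-6b128d7b4b27",
     "controller": True, "blockOwnerDeletion": True},
]

for err in validate_controller_refs(refs):
    print(err)
```

Only the first UID matches this BundleDeployment's own `metadata.uid` (74fa0df1-...), which the error text alone does not reveal.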
everettraven commented 2 months ago

@itroyano, based on the error message in the Installed status condition, I suspect that argocd itself has already been installed and that its CRDs are also being managed by another controller.

It also looks like the version of OLM 1.0 shipped with the 4.16 tech preview uses RukPak and is somewhat outdated compared to the current state of this project.

everettraven commented 2 months ago

Regardless, though, the crux of this issue seems to be that multiple instances of an extension are attempting to create and manage the same resources, which will not be supported.

itroyano commented 2 months ago

Makes sense; the duplication does look weird:

Only one reference can have Controller set to true. Found "true" in references for BundleDeployment/openshift-gitops-operator and BundleDeployment/openshift-gitops-operator
everettraven commented 2 months ago

Yeah, I think long-term we need to make sure we have a more descriptive error message for this scenario.

itroyano commented 2 months ago

That one seems to be thrown by RBAC, not by us, right? We could check for duplication earlier as part of the multiple-instances epic: https://github.com/operator-framework/operator-controller/issues/736
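An earlier duplication check along these lines could inspect the target object before applying it and produce a more descriptive error than the API server's. This is a hypothetical Python sketch: the function name, the `gitops-a`/`gitops-b` extension names and the `uid-a`/`uid-b` values are made up, and a real implementation would read the live object's owner references from the cluster rather than take them as arguments. It treats an existing `controller: true` reference with a different UID as a conflict and names both parents, including UIDs.

```python
from typing import Any, Dict, List, Optional

OwnerRef = Dict[str, Any]  # same keys as metadata.ownerReferences entries


def existing_controller_conflict(obj: str, existing: List[OwnerRef],
                                 owner: OwnerRef) -> Optional[str]:
    """Hypothetical pre-flight check: return a descriptive error if `obj`
    already has a controller reference to a different parent (different
    UID) than `owner`, else None."""
    for ref in existing:
        if ref.get("controller") and ref["uid"] != owner["uid"]:
            return (
                f"{obj} is already controlled by {ref['kind']}/{ref['name']} "
                f"(uid {ref['uid']}); refusing to make "
                f"{owner['kind']}/{owner['name']} (uid {owner['uid']}) a second controller"
            )
    return None


# Made-up example: two ClusterExtensions both trying to manage the same CRD.
current = [{"apiVersion": "olm.operatorframework.io/v1alpha1", "kind": "ClusterExtension",
            "name": "gitops-a", "uid": "uid-a", "controller": True}]
incoming = {"apiVersion": "olm.operatorframework.io/v1alpha1", "kind": "ClusterExtension",
            "name": "gitops-b", "uid": "uid-b", "controller": True}

print(existing_controller_conflict("CustomResourceDefinition/argocds.argoproj.io",
                                   current, incoming))
```

Comparing by UID rather than by name means a second extension with a different name, and a parent that was deleted and re-created under the same name, are both reported.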

joelanford commented 2 months ago

@itroyano, this sounds like an issue from the helm-operator-plugins code that adds owner references to managed objects. I noticed similar duplication when I was implementing and testing the chunked release secret driver, and I think it should be fixed on both the helm-operator-plugins and operator-controller main branches.

itroyano commented 2 months ago

Can we prevent installing an operator with OLM v1 if we detect that OLM v0 has already installed it?

joelanford commented 2 months ago

I think that jumps to the conclusion that this issue is caused by duplicate installations. What I saw (and believe I fixed) was a bug in the ownerref-injecting client that made it inject two different ownerrefs for the same parent object.

And that's what this looks like as well:

Found "true" in references for BundleDeployment/openshift-gitops-operator and BundleDeployment/openshift-gitops-operator
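The explanation above points at owner-reference injection that is not idempotent. One way to avoid ever injecting two references for the same parent is to upsert instead of append: treat references with the same API group, kind and name as the same parent and replace the existing entry. The Python below is a hypothetical sketch of that idea, not the actual helm-operator-plugins or operator-controller fix (whose details are not shown here); controller-runtime's `controllerutil.SetControllerReference` follows a similar replace-if-same-object approach.

```python
from typing import Any, Dict, List

OwnerRef = Dict[str, Any]  # same keys as metadata.ownerReferences entries


def api_group(api_version: str) -> str:
    # "core.rukpak.io/v1alpha2" -> "core.rukpak.io"; core-group "v1" -> ""
    return api_version.split("/", 1)[0] if "/" in api_version else ""


def same_parent(a: OwnerRef, b: OwnerRef) -> bool:
    # Version and UID are ignored when matching (a design assumption of this
    # sketch), so re-injecting for the same parent replaces the entry
    # instead of adding a second one.
    return (api_group(a["apiVersion"]) == api_group(b["apiVersion"])
            and a["kind"] == b["kind"] and a["name"] == b["name"])


def upsert_owner_reference(refs: List[OwnerRef], new_ref: OwnerRef) -> List[OwnerRef]:
    """Return a copy of `refs` holding exactly one reference to new_ref's parent."""
    result: List[OwnerRef] = []
    inserted = False
    for ref in refs:
        if not same_parent(ref, new_ref):
            result.append(ref)
        elif not inserted:
            result.append(new_ref)  # replace the first match in place
            inserted = True
        # any further references to the same parent are dropped
    if not inserted:
        result.append(new_ref)
    return result


def bundle_deployment_ref(uid: str) -> OwnerRef:
    return {"apiVersion": "core.rukpak.io/v1alpha2", "kind": "BundleDeployment",
            "name": "openshift-gitops-operator", "uid": uid,
            "controller": True, "blockOwnerDeletion": True}


# UID that does not match this BundleDeployment's metadata.uid
other = bundle_deployment_ref("cd53d7b4-f7d3-4cca-b78f-6b128d7b4b27")
# UID from the BundleDeployment's own metadata.uid
current = bundle_deployment_ref("74fa0df1-1cea-4771-9769-631e174cf090")

refs = upsert_owner_reference([other], current)
print(refs)  # a single reference, carrying the current UID
```

Calling `upsert_owner_reference` again with the same reference returns an unchanged list, so repeated reconciles cannot accumulate a second `controller: true` entry for the same parent.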