kubevirt / hyperconverged-cluster-operator

Operator pattern for managing multi-operator products

Confirm k8s 1.14 allows namespaced CRDs to own global CRDs #129

Closed rthallisey closed 4 years ago

rthallisey commented 5 years ago

OCP 4.2 will be based on k8s 1.14, so we won't need https://github.com/kubevirt/hyperconverged-cluster-operator/pull/122 if 1.14 supports this.

rthallisey commented 5 years ago

@lveyde, try deploying the HCO on k8s 1.14, then see if the network-addons-operator gets deleted when you remove the HCO CR. See the Slack comment for more details.
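For reference, here is a minimal sketch of that check using client-go; the `kubevirt-hyperconverged` namespace, the CR name, and the operator deployment name are assumptions, so adjust them to match the actual deployment:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}

	// Delete the HyperConverged CR; namespace and name are assumptions.
	ns := "kubevirt-hyperconverged"
	hcoGVR := schema.GroupVersionResource{
		Group:    "hco.kubevirt.io",
		Version:  "v1alpha1",
		Resource: "hyperconvergeds",
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	if err := dyn.Resource(hcoGVR).Namespace(ns).Delete(
		context.TODO(), "hyperconverged-cluster", metav1.DeleteOptions{}); err != nil {
		panic(err)
	}

	// After the GC has had a chance to run, check whether the
	// network-addons-operator deployment survived.
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	_, err = cs.AppsV1().Deployments(ns).Get(
		context.TODO(), "cluster-network-addons-operator", metav1.GetOptions{})
	fmt.Println("operator deployment still present:", err == nil)
}
```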

lveyde commented 5 years ago

@rthallisey No, as far as I can tell the operator stays, but I actually expected that to be the normal behavior.

The CR is removed normally though (`networkaddonsconfigs cluster`).

I noticed something else though: after removing and then re-creating the HCO CR, I saw the following:

```
$ kubectl describe networkaddonsconfigs cluster
Name:         cluster
Namespace:
Labels:       app=hyperconverged-cluster
Annotations:
API Version:  networkaddonsoperator.network.kubevirt.io/v1alpha1
Kind:         NetworkAddonsConfig
Metadata:
  Creation Timestamp:  2019-06-17T08:51:35Z
  Generation:          1
  Owner References:
    API Version:           hco.kubevirt.io/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  HyperConverged
    Name:                  hyperconverged-cluster
    UID:                   1c71811f-90dd-11e9-8848-001a4a160205
  Resource Version:        1322571
  Self Link:               /apis/networkaddonsoperator.network.kubevirt.io/v1alpha1/networkaddonsconfigs/cluster
  UID:                     1c7375a9-90dd-11e9-8848-001a4a160205
Spec:
  Kube Mac Pool:
  Linux Bridge:
  Multus:
Status:
  Conditions:
    Last Probe Time:       2019-06-17T08:52:11Z
    Last Transition Time:  2019-06-17T08:51:35Z
    Message:               could not apply (/v1, Kind=ServiceAccount) multus/multus: could not create (/v1, Kind=ServiceAccount) multus/multus: serviceaccounts "multus" is forbidden: unable to create new content in namespace multus because it is being terminated
    Reason:                FailedToApply
    Status:                True
    Type:                  Failing
Events:
```

rthallisey commented 5 years ago

> @rthallisey No, as far as I can tell the operator stays, but I actually expected that to be the normal behavior.

The normal behavior is that if the HCO CR exists, then all component CRs exist. If the HCO CR doesn't exist, then the component CRs should not exist.

Looks like the multus namespace was deleted before everything was cleaned up. Is the network-addons-operator responsible for deleting that ns? cc @phoracek

phoracek commented 5 years ago

The owner chain is HyperConverged <- NetworkAddonsConfig <- (all network components). Because we referenced the namespaced HyperConverged from the cluster-wide NetworkAddonsConfig, the Kubernetes GC immediately removed the NetworkAddonsConfig. However, the network addons operator managed to deploy its resources before it was removed. Due to that, after the HCO recreates the NetworkAddonsConfig, it tries to set up its components again while they are still being removed from the previous run.
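To make the failure mode concrete, here is a minimal sketch (not the HCO's actual code; names are taken from the `kubectl describe` output above) of the cross-scope owner reference in question. A cluster-scoped object cannot legally reference a namespaced owner, so the GC treats the owner as unresolvable and deletes the dependent:

```go
package main

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/types"
)

// buildNetworkAddonsConfig shows the problematic ownership: the
// cluster-scoped NetworkAddonsConfig carries an ownerReference pointing
// at the namespaced HyperConverged CR. Cross-scope ownership like this is
// invalid, so the garbage collector cannot resolve the owner and deletes
// the dependent soon after it is created.
func buildNetworkAddonsConfig(hcoUID types.UID) *unstructured.Unstructured {
	isController := true
	nac := &unstructured.Unstructured{}
	nac.SetAPIVersion("networkaddonsoperator.network.kubevirt.io/v1alpha1")
	nac.SetKind("NetworkAddonsConfig")
	nac.SetName("cluster") // cluster-scoped: no namespace
	nac.SetOwnerReferences([]metav1.OwnerReference{{
		APIVersion:         "hco.kubevirt.io/v1alpha1",
		Kind:               "HyperConverged",
		Name:               "hyperconverged-cluster",
		UID:                hcoUID, // UID of the namespaced HCO CR
		Controller:         &isController,
		BlockOwnerDeletion: &isController,
	}})
	return nac
}

func main() {}
```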

(We have to implement finalizers to block until components are removed.)
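A minimal sketch of that finalizer pattern, assuming a controller-runtime reconciler; the finalizer name and the `cleanupComponents` helper are hypothetical:

```go
package main

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// networkAddonsFinalizer is a hypothetical finalizer name.
const networkAddonsFinalizer = "networkaddonsoperator.network.kubevirt.io/finalizer"

type Reconciler struct {
	client client.Client
}

// cleanupComponents is a hypothetical helper that tears down the deployed
// network components and reports whether they are all gone.
func (r *Reconciler) cleanupComponents(ctx context.Context) (bool, error) {
	// ... delete Multus, Linux bridge, KubeMacPool resources here ...
	return true, nil
}

// reconcileDeletion keeps the NetworkAddonsConfig around until all of its
// components are actually gone, so a re-created HCO CR cannot race a
// half-finished teardown.
func (r *Reconciler) reconcileDeletion(ctx context.Context, nac *unstructured.Unstructured) error {
	if nac.GetDeletionTimestamp() == nil {
		// Not being deleted: make sure the finalizer is present.
		if !controllerutil.ContainsFinalizer(nac, networkAddonsFinalizer) {
			controllerutil.AddFinalizer(nac, networkAddonsFinalizer)
			return r.client.Update(ctx, nac)
		}
		return nil
	}
	// Being deleted: only drop the finalizer once teardown has finished.
	done, err := r.cleanupComponents(ctx)
	if err != nil || !done {
		return err // requeue and retry until the components are gone
	}
	controllerutil.RemoveFinalizer(nac, networkAddonsFinalizer)
	return r.client.Update(ctx, nac)
}

func main() {}
```

With a finalizer in place, the old NetworkAddonsConfig stays in Terminating until teardown completes, so a re-created HCO CR cannot collide with a half-removed previous run.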

rthallisey commented 5 years ago

I think finalizers are part of the problem. The other half of this issue is that component operators need to report status on their CRs so the HCO can track their state.

phoracek commented 5 years ago

They are not the cause of the removal; they just add noise to the logs (the terminating namespace and all). We do report status on our CR: https://github.com/kubevirt/cluster-network-addons-operator/blob/master/pkg/apis/networkaddonsoperator/v1alpha1/networkaddonsconfig_types.go#L37
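For illustration, a hedged sketch of that status-reporting pattern; it uses the upstream `metav1.Condition` helpers rather than the operator's actual types from the file linked above, and the condition names are illustrative:

```go
package main

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NetworkAddonsConfigStatus stands in for the status block linked above;
// the real type lives in the cluster-network-addons-operator repo.
type NetworkAddonsConfigStatus struct {
	Conditions []metav1.Condition
}

// markAvailable records that the component finished deploying. The HCO can
// watch this condition to aggregate overall product state instead of
// guessing from resource existence.
func markAvailable(status *NetworkAddonsConfigStatus, observedGeneration int64) {
	meta.SetStatusCondition(&status.Conditions, metav1.Condition{
		Type:               "Available",
		Status:             metav1.ConditionTrue,
		Reason:             "DeployCompleted",
		Message:            "all network components are deployed and ready",
		ObservedGeneration: observedGeneration,
	})
}

func main() {}
```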

rthallisey commented 5 years ago

@lveyde can we close this? Were you able to verify?

kubevirt-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kubevirt-bot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kubevirt-bot commented 4 years ago

@kubevirt-bot: Closing this issue.

In response to [this](https://github.com/kubevirt/hyperconverged-cluster-operator/issues/129#issuecomment-586164559):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.