karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

cert-manager installation in control-plane can't work on the target-cluster #4416

Open icy opened 10 months ago

icy commented 10 months ago

Please provide an in-depth description of the question you have:

I installed cert-manager through the Karmada control plane. I've noticed that the Role and RoleBinding resources are not propagated onto the target cluster.

What do you think about this question?:

Environment:

icy commented 10 months ago

Please note that in my test, resources of kind ClusterRole and ClusterRoleBinding are propagated to the target without problems.

icy commented 10 months ago

As a workaround, I select all the Role resources from the original manifest and apply them to the target cluster myself. This is a really confusing situation.

$ curl -Lso - https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml \
  | k8s-select.rb kind=Role

k8s-select.rb is my script: https://gist.github.com/icy/228d7ce15b6c1fc66994a490608e6c7c

icy commented 10 months ago

And now, karmada-controller-manager continuously prints lots of lines like the ones below. I don't know whether that's expected, but it is very noisy.

es-dr3c1 Name:cert-manager-webhook-55f4b5b664 UID:9bd56db7-1c3b-400b-86bd-efcd485cff9c APIVersion:work.karmada.io/v1alpha1 ResourceVersion:72271 FieldPath:} reason="ReflectStatusSucceed"
I1213 19:19:35.272275       1 objectwatcher.go:175] Updated resource(kind=ValidatingWebhookConfiguration, /cert-manager-webhook) on cluster: dr3c1
I1213 19:19:35.272296       1 work_status_controller.go:253] reflecting ValidatingWebhookConfiguration(/cert-manager-webhook) status to Work(karmada-es-dr3c1/cert-manager-webhook-7c9bd87fcb)
I1213 19:19:35.272365       1 recorder.go:104] "events: Reflect status for object(ValidatingWebhookConfiguration//cert-manager-webhook) succeed." type="Normal" object={Kind:Work Namespace:karmada-es-dr3c1 Name:cert-manager-webhook-7c9bd87fcb UID:59a3eaf7-104c-4a31-92c5-17102ddd5842 APIVersion:work.karmada.io/v1alpha1 ResourceVersion:72874 FieldPath:} reason="ReflectStatusSucceed"
I1213 19:19:35.278690       1 objectwatcher.go:175] Updated resource(kind=ValidatingWebhookConfiguration, /cert-manager-webhook) on cluster: dr3c1
I1213 19:19:35.278710       1 work_status_controller.go:253] reflecting ValidatingWebhookConfiguration(/cert-manager-webhook) status to Work(karmada-es-dr3c1/cert-manager-webhook-7c9bd87fcb)
I1213 19:19:35.278776       1 recorder.go:104] "events: Reflect status for object(ValidatingWebhookConfiguration//cert-manager-webhook) succeed." type="Normal" object={Kind:Work Namespace:karmada-es-dr3c1 Name:cert-manager-webhook-7c9bd87fcb UID:59a3eaf7-104c-4a31-92c5-17102ddd5842 APIVersion:work.karmada.io/v1alpha1 ResourceVersion:72874 FieldPath:} reason="ReflectStatusSucceed"
I1213 19:19:35.353516       1 objectwatcher.go:175] Updated resource(kind=ValidatingWebhookConfiguration, /cert-manager-webhook) on cluster: dr3c1
I1213 19:19:35.353534       1 work_status_controller.go:253] reflecting ValidatingWebhookConfiguration(/cert-manager-webhook) status to Work(karmada-es-dr3c1/cert-manager-webhook-7c9bd87fcb)
I1213 19:19:35.353589       1 recorder.go:104] "events: Reflect status for object(ValidatingWebhookConfiguration//cert-manager-webhook) succeed." type="Normal" object={Kind:Work Namespace:karmada-es-dr3c1 Name:cert-manager-webhook-7c9bd87fcb UID:59a3eaf7-104c-4a31-92c5-17102ddd5842 APIVersion:work.karmada.io/v1alpha1 ResourceVersion:72874 FieldPath:} reason="ReflectStatusSucceed"
I1213 19:19:35.353627       1 objectwatcher.go:175] Updated resource(kind=MutatingWebhookConfiguration, /cert-manager-webhook) on cluster: dr3c1
I1213 19:19:35.353662       1 work_status_controller.go:253] reflecting MutatingWebhookConfiguration(/cert-manager-webhook) status to Work(karmada-es-dr3c1/cert-manager-webhook-55f4b5b664)
I1213 19:19:35.353730       1 recorder.go:104] "events: Reflect status for object(MutatingWebhookConfiguration//cert-manager-webhook) succeed." type="Normal" object={Kind:Work Namespace:karmada-es-dr3c1 Name:cert-manager-webhook-55f4b5b664 UID:9bd56db7-1c3b-400b-86bd-efcd485cff9c APIVersion:work.karmada.io/v1alpha1 ResourceVersion:72271 FieldPath:} reason="ReflectStatusSucceed"
I1213 19:19:35.357612       1 objectwatcher.go:175] Updated resource(kind=ValidatingWebhookConfiguration, /cert-manager-webhook) on cluster: dr3c1
I1213 19:19:35.357630       1 work_status_controller.go:253] reflecting ValidatingWebhookConfiguration(/cert-manager-webhook) status to Work(karmada-es-dr3c1/cert-manager-webhook-7c9bd87fcb)
I1213 19:19:35.357698       1 recorder.go:104] "events: Reflect status for object(ValidatingWebhookConfiguration//cert-manager-webhook) succeed." type="Normal" object={Kind:Work Namespace:karmada-es-dr3c1 Name:cert-manager-webhook-7c9bd87fcb UID:59a3eaf7-104c-4a31-92c5-17102ddd5842 APIVersion:work.karmada.io/v1alpha1 ResourceVersion:72874 FieldPath:} reason="ReflectStatusSucceed"
icy commented 10 months ago

I can now conclude that a cert-manager installation won't work on the target cluster. The cainjector fails immediately, most likely because the Karmada control plane tries to change things that are supposed to be managed by cert-manager itself:

E1213 19:24:52.194654       1 controller.go:320] "cert-manager: Reconciler error" err="Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"cert-manager-webhook\": the object has been modified; please apply your changes to the latest version and try again" controller="validatingwebhookconfiguration" controllerGroup="admissionregistration.k8s.io" controllerKind="ValidatingWebhookConfiguration" ValidatingWebhookConfiguration="cert-manager-webhook" namespace="" name="cert-manager-webhook" reconcileID=0714c6ac-1c22-4a51-9d6e-1a5f2eaa00af
I1213 19:24:52.195217       1 reconciler.go:142] "cert-manager: Updated object" kind="mutatingwebhookconfiguration" kind="mutatingwebhookconfiguration" name="cert-manager-webhook"
E1213 19:24:52.245425       1 reconciler.go:138] "cert-manager: unable to update target object with new CA data" err="Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"cert-manager-webhook\": the object has been modified; please apply your changes to the latest version and try again" kind="mutatingwebhookconfiguration" kind="mutatingwebhookconfiguration" name="cert-manager-webhook"
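The "the object has been modified" errors are Kubernetes optimistic concurrency at work: an update carries the resourceVersion the writer last read, and the API server rejects it with a 409 Conflict if someone else updated the object in between. The toy model below only illustrates that rule; it is not client-go or API server code, and `Store` and `Conflict` are made-up names. It shows cainjector's write being rejected after karmada-agent updates the same webhook configuration between cainjector's read and its write.

```python
from dataclasses import dataclass, field


class Conflict(Exception):
    """Stands in for the API server's 409 "the object has been modified" error."""


@dataclass
class Store:
    """Toy store holding one object and its resourceVersion."""

    resource_version: int = 1
    data: dict = field(default_factory=dict)

    def get(self):
        # A reader gets a copy of the object plus the version it read.
        return self.resource_version, dict(self.data)

    def update(self, read_version: int, new_data: dict) -> int:
        # Writes based on a stale read are rejected, mirroring the API server.
        if read_version != self.resource_version:
            raise Conflict("the object has been modified; please apply your "
                           "changes to the latest version and try again")
        self.data = dict(new_data)
        self.resource_version += 1
        return self.resource_version


store = Store(data={"caBundle": None})

# cainjector reads the webhook configuration and prepares new CA data...
version, obj = store.get()
obj["caBundle"] = "placeholder-ca-data"

# ...but karmada-agent writes its desired state (no caBundle) first.
agent_version, _ = store.get()
store.update(agent_version, {"caBundle": None})

# cainjector's write now carries a stale resourceVersion and is rejected.
try:
    store.update(version, obj)
except Conflict as err:
    print(f"cainjector: {err}")
```

A real controller presumably re-reads and retries after such an error, so as long as Karmada keeps rewriting the object, each retry can be overtaken again and the errors keep repeating.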
RainbowMango commented 10 months ago

I tried to reproduce this on my side and found that the Role could be propagated to the member cluster (the target cluster). However, I didn't propagate the cert-manager workload; instead, I used a simple ClusterPropagationPolicy:

apiVersion: policy.karmada.io/v1alpha1
kind: ClusterPropagationPolicy
metadata:
  name: cluster-propagation-dr3
spec:
  conflictResolution: Abort
  propagateDeps: true
  placement:
    clusterAffinity:
      clusterNames:
      - member1
    replicaScheduling:
      replicaSchedulingType: Duplicated
  resourceSelectors:
  - kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
  - kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1

most likely because the Karmada control plane tries to change things that are supposed to be managed by cert-manager itself

Probably, yes. The race will persist as long as cert-manager keeps changing configuration that is also managed by Karmada.

RainbowMango commented 10 months ago

So, you are trying to propagate cert-manager to member clusters, right?

Can you figure out which fields of the Role and RoleBinding cert-manager tries to update?

everpeace commented 10 months ago

The race will persist as long as cert-manager keeps changing configuration that is also managed by Karmada.

I'm running into this, too. How could we solve it? By the way, I posted something similar on Slack several days ago.

I can now conclude that a cert-manager installation won't work on the target cluster.

However, I expect that managing cluster add-ons (cert-manager, etc.) from a federated control plane could be a popular use case. So, should Karmada support this use case in some way? What do you think?

I propose updating the issue title to "cert-manager installation in control-plane can't work on the target-cluster"

everpeace commented 10 months ago

@RainbowMango

Can you figure out which fields of the Role and RoleBinding cert-manager tries to update?

In this case, two components keep trying to update the webhook configurations installed by cert-manager.

karmada-agent log:

...
I1213 07:54:45.751551       1 objectwatcher.go:159] Updated resource(kind=MutatingWebhookConfiguration, /cert-manager-webhook) on cluster: karmada-workload-1
I1213 07:54:45.751568       1 work_status_controller.go:237] reflecting MutatingWebhookConfiguration(/cert-manager-webhook) status to Work(karmada-es-karmada-workload-1/cert-manager-webhook-55f4b5b664)
I1213 07:54:45.751575       1 configurable.go:187] Reflect status of object: admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration /cert-manager-webhook with configurable interpreter.
I1213 07:54:45.751581       1 thirdparty.go:173] Reflect status of object: admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration /cert-manager-webhook with thirdparty configurable interpreter.
I1213 07:54:45.751585       1 default.go:128] Reflect status of object: admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration /cert-manager-webhook with build-in interpreter.
I1213 07:54:45.751845       1 default.go:72] Default interpreter is not enabled for kind "admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration" with operation "Retain".
I1213 07:54:45.751871       1 customized.go:69] Hook interpreter is not enabled for kind "admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration" with operation "Retain".
...
I1213 07:54:45.800849       1 objectwatcher.go:159] Updated resource(kind=ValidatingWebhookConfiguration, /cert-manager-webhook) on cluster: karmada-workload-1
I1213 07:54:45.800870       1 work_status_controller.go:237] reflecting ValidatingWebhookConfiguration(/cert-manager-webhook) status to Work(karmada-es-karmada-workload-1/cert-manager-webhook-7c9bd87fcb)
I1213 07:54:45.800876       1 configurable.go:187] Reflect status of object: admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration /cert-manager-webhook with configurable interpreter.
I1213 07:54:45.800883       1 thirdparty.go:173] Reflect status of object: admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration /cert-manager-webhook with thirdparty configurable interpreter.
I1213 07:54:45.800887       1 default.go:128] Reflect status of object: admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration /cert-manager-webhook with build-in interpreter.
I1213 07:54:45.801540       1 default.go:72] Default interpreter is not enabled for kind "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration" with operation "Retain".
I1213 07:54:45.801556       1 customized.go:69] Hook interpreter is not enabled for kind "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration" with operation "Retain".
...

cert-manager cainjector log:

...
E1213 07:56:57.056633       1 reconciler.go:138] "cert-manager: unable to update target object with new CA data" err="Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"cert-manager-webhook\": the object has been modified; please apply your changes to the latest version and try again" kind="validatingwebhookconfiguration" kind="validatingwebhookconfiguration" name="cert-manager-webhook"
E1213 07:56:57.056668       1 controller.go:329] "cert-manager: Reconciler error" err="Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"cert-manager-webhook\": the object has been modified; please apply your changes to the latest version and try again" controller="validatingwebhookconfiguration" controllerGroup="admissionregistration.k8s.io" controllerKind="ValidatingWebhookConfiguration" ValidatingWebhookConfiguration="cert-manager-webhook" namespace="" name="cert-manager-webhook" reconcileID="0ef30e2a-c2ff-494f-bd3d-976504283b2c"
E1213 07:56:57.056798       1 reconciler.go:138] "cert-manager: unable to update target object with new CA data" err="Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"cert-manager-webhook\": the object has been modified; please apply your changes to the latest version and try again" kind="mutatingwebhookconfiguration" kind="mutatingwebhookconfiguration" name="cert-manager-webhook"
E1213 07:56:57.056812       1 controller.go:329] "cert-manager: Reconciler error" err="Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"cert-manager-webhook\": the object has been modified; please apply your changes to the latest version and try again" controller="mutatingwebhookconfiguration" controllerGroup="admissionregistration.k8s.io" controllerKind="MutatingWebhookConfiguration" MutatingWebhookConfiguration="cert-manager-webhook" namespace="" name="cert-manager-webhook" reconcileID="263f82b7-4b73-4661-bdc9-a75af3e170af"
...
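The karmada-agent lines stating that no interpreter is enabled for the "Retain" operation hint at why the two sides never settle: with nothing to retain, each sync re-applies the object from the control plane, which has no caBundle, and cainjector then injects it again. Below is a deliberately simplified Python model of that loop; the function names are hypothetical and it is not Karmada's or cert-manager's reconcile code.

```python
# Desired object in the Karmada control plane: cert-manager's manifest ships
# the webhook configuration without a caBundle.
DESIRED = {"name": "cert-manager-webhook", "caBundle": None}


def karmada_sync(observed: dict) -> dict:
    # No Retain customization: the desired object simply replaces the observed one.
    return dict(DESIRED)


def cainjector_sync(observed: dict) -> dict:
    # cainjector fills in CA data whenever it is missing.
    if observed["caBundle"] is None:
        return {**observed, "caBundle": "placeholder-ca-data"}
    return observed


def count_writes(rounds: int):
    """Alternate both controllers; count the syncs that actually changed the object."""
    obj = dict(DESIRED)
    writes = 0
    for _ in range(rounds):
        for sync in (cainjector_sync, karmada_sync):
            new_obj = sync(obj)
            if new_obj != obj:
                writes += 1
                obj = new_obj
    return writes, obj


print(count_writes(5))  # (10, {'name': 'cert-manager-webhook', 'caBundle': None})
```

In this model every round costs two writes and the object ends each round without its CA data, which is consistent with the steady stream of "Updated resource" lines in the logs.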
icy commented 10 months ago

Hi @everpeace ,

Thanks for the suggestion; I've updated the issue title. I actually have two issues: Role/RoleBinding not being propagated, and the same one as yours. Since @RainbowMango can't reproduce the propagation issue, I will start with a different topic for it.

everpeace commented 10 months ago

I will start with a different topic.

Sorry for cutting into your issue 🙇. I appreciate your kindness 🙇.

RainbowMango commented 10 months ago

However, I expect that managing cluster add-ons (cert-manager, etc.) from a federated control plane could be a popular use case. So, should Karmada support this use case in some way? What do you think?

100% agree! Take cert-manager as an example: this kind of add-on contains lots of manifests and might not be suitable for propagation by a PropagationPolicy, since that forces users to figure out each manifest and then list them in the policy.
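As a rough illustration of that burden, the Python sketch below lists the distinct apiVersion/kind pairs in a multi-document manifest and renders them as resourceSelectors entries. It is a hypothetical helper, matches top-level lines instead of using a YAML parser, and ignores the name and namespace restrictions a real policy would probably want.

```python
import re

# Top-level (unindented) apiVersion/kind lines and the "---" document separator.
API_VERSION = re.compile(r"^apiVersion:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
KIND = re.compile(r"^kind:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
SEPARATOR = re.compile(r"^---[ \t]*$", re.MULTILINE)


def distinct_kinds(manifest: str) -> list:
    """Return the distinct (apiVersion, kind) pairs, in first-seen order."""
    seen = {}
    for doc in SEPARATOR.split(manifest):
        api, kind = API_VERSION.search(doc), KIND.search(doc)
        if api and kind:
            seen.setdefault((api.group(1), kind.group(1)), None)
    return list(seen)


def render_selectors(pairs: list) -> str:
    """Render the pairs as a resourceSelectors fragment for a (Cluster)PropagationPolicy."""
    lines = ["resourceSelectors:"]
    for api_version, kind in pairs:
        lines.append(f"- apiVersion: {api_version}")
        lines.append(f"  kind: {kind}")
    return "\n".join(lines)
```

Run over the cert-manager release manifest, this would show how many distinct kinds (CRDs, RBAC objects, Deployments, Services, webhook configurations and more) a single policy would have to enumerate.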

Here is another alternative that might help in this case: instead of propagating each manifest, propagate the Helm chart as a whole. Please refer to https://karmada.io/docs/userguide/cicd/working-with-flux; I'm looking forward to your feedback. This approach relies on fluxcd running on each member cluster.

In addition, the community hopes to provide a more convenient way to help users manage add-ons, but there is no ideal solution yet.

everpeace commented 10 months ago

Here is another alternative that might help in this case: instead of propagating each manifest, propagate the Helm chart as a whole. Please refer to https://karmada.io/docs/userguide/cicd/working-with-flux; I'm looking forward to your feedback. This approach relies on fluxcd running on each member cluster.

Thanks! Installing add-ons via a Flux-like operator would be a good idea, since the installation then happens only in the target clusters.

By the way, I found that one piece of toil remains with this method. Most add-ons provide CRDs (Issuer, Certificate, etc. in cert-manager's case), and we still need to install those CRDs in the control plane and maintain their versions, because the applications to be propagated often include custom resources provided by the add-ons.

In addition, the community hopes to provide a more convenient way to help users manage add-ons, but there is no ideal solution yet.

Cool. I will also share any ideas I come up with for doing this in a better way.

I'm not familiar with Open Cluster Management, but it seems they have an add-on controller: https://open-cluster-management.io/concepts/addon/

Karmada might follow a similar idea?

icy commented 10 months ago

Hi @RainbowMango ,

Here is another alternative that might help in this case: instead of propagating each manifest, propagate the Helm chart as a whole. Please refer to https://karmada.io/docs/userguide/cicd/working-with-flux; I'm looking forward to your feedback. This approach relies on fluxcd running on each member cluster.

It sounds like fluxcd, argocd and Karmada share the same idea: propagating resources. This is an area where all these tools overlap to some extent.

To be fair, I could solve some cluster replication issues with argocd/fluxcd alone. But it would not be easy to patch or hook into the process. For example, if you want an application to be delivered differently to different target clusters, you actually have to write some additional support for fluxcd/argocd. This is a limitation of those CD tools. I don't know whether any plugin or hook supports that (FIXME), but I think Karmada handles it more powerfully and more naturally.

Moreover, I tend to avoid fluxcd/argocd in some situations, and sometimes you simply have to avoid them. For example, installing the argocd operator with webhook support actually requires installing cert-manager beforehand (see https://argocd-operator.readthedocs.io/en/latest/install/manual/#enable-webhook-support).

Patrick0308 commented 3 months ago

In this case, two components keep trying to update the webhook configurations installed by cert-manager.

This ResourceInterpreterCustomization fixes the issue:

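apiVersion: config.karmada.io/v1alpha1
kind: ResourceInterpreterCustomization
metadata:
  name: mutatingwebhookconfiguration
spec:
  target:
    apiVersion: admissionregistration.k8s.io/v1
    kind: MutatingWebhookConfiguration
  customizations:
    retention:
      # "|" keeps newlines, so the Lua "--" comments cannot swallow the code.
      luaScript: |
        -- Keep the caBundle that cert-manager's cainjector injected on the
        -- member cluster instead of overwriting it on every sync.
        function Retain(desiredObj, observedObj)
          -- Nothing to retain if either side has no webhooks.
          if desiredObj.webhooks == nil or observedObj.webhooks == nil then
            return desiredObj
          end
          local desiredLength = #desiredObj.webhooks
          local observedLength = #observedObj.webhooks
          -- Webhooks are matched by position; both lists come from the same manifest.
          for i = 1, math.min(desiredLength, observedLength) do
            local desiredCfg = desiredObj.webhooks[i].clientConfig
            local observedCfg = observedObj.webhooks[i].clientConfig
            if desiredCfg ~= nil and observedCfg ~= nil and desiredCfg.caBundle == nil then
              desiredCfg.caBundle = observedCfg.caBundle
            end
          end
          return desiredObj
        end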
RainbowMango commented 3 months ago

Thanks @Patrick0308 for the information.

This is the recommended way to resolve conflicts in which both Karmada and a controller running on the member cluster try to update the same resource.

But it's worth noting that once you declare a retention rule for MutatingWebhookConfiguration via a ResourceInterpreterCustomization, all MutatingWebhookConfiguration resources will follow the rule specified in the luaScript.
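Because the rule applies to every MutatingWebhookConfiguration, one option, not discussed in this thread and only an assumption, is to have the script return early unless the object is one cert-manager manages, e.g. with `if desiredObj.metadata.name ~= "cert-manager-webhook" then return desiredObj end` at the top of the Lua Retain() above. The sketch below expresses that scoped logic in Python over plain dicts so it can be tested outside Karmada; `MANAGED_NAMES` and `retain` are hypothetical names, and the real rule would still be written in Lua. ValidatingWebhookConfiguration webhooks carry `clientConfig.caBundle` in the same place and also appear in the conflict logs above.

```python
import copy

# Hypothetical allow-list: only these webhook configurations keep the
# caBundle observed on the member cluster.
MANAGED_NAMES = {"cert-manager-webhook"}


def retain(desired: dict, observed: dict) -> dict:
    """Python rendition of the Lua Retain() logic, scoped by object name."""
    if desired.get("metadata", {}).get("name") not in MANAGED_NAMES:
        return desired  # other webhook configurations are applied unchanged
    result = copy.deepcopy(desired)
    desired_hooks = result.get("webhooks") or []
    observed_hooks = observed.get("webhooks") or []
    # Match webhooks by position, as the Lua script does.
    for want, have in zip(desired_hooks, observed_hooks):
        want_cfg = want.get("clientConfig")
        have_cfg = have.get("clientConfig")
        if want_cfg is None or have_cfg is None:
            continue
        if want_cfg.get("caBundle") is None and have_cfg.get("caBundle") is not None:
            want_cfg["caBundle"] = have_cfg["caBundle"]
    return result
```

Returning `desired` unchanged for other names keeps Karmada's usual behavior, where the control plane's version wins, for every other MutatingWebhookConfiguration.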