kubernetes-retired / kubefed

Kubernetes Cluster Federation
Apache License 2.0
2.5k stars · 531 forks

Kubefed support for AWS EKS running Weave #1507

Closed bernielomax closed 1 year ago

bernielomax commented 2 years ago

What would you like to be added:

Update the Helm chart to support installing Kubefed on an AWS EKS cluster that uses a third-party CNI such as Weave.

Why is this needed:

When installing Kubefed 0.9.x on an EKS cluster running Weave, the following error occurs:

Error from server (InternalError): Internal error occurred: failed calling webhook "kubefedconfigs.core.kubefed.io": Post "https://kubefed-admission-webhook.system.svc:443/validate-kubefedconfig?timeout=10s": Address is not allowed

I believe this is caused by the fact that the EKS Kubernetes control plane nodes are fully managed by AWS and are not capable of running Weave. The overlay network therefore does not extend to the control plane nodes, which breaks communication between the control plane and the pods. According to Weave's official "installing on EKS" docs, this is a known limitation:

Please note that while pods can connect to the Kubernetes API server for your cluster, the API server will not be able to connect to the pods, as the API server nodes are not connected to Weave Net (they run on a network managed by EKS).

I was able to work around the above limitation by making the changes listed below. I am hoping that a similar solution might make its way into the official project.

Note: I have hard-coded certain example values to help demonstrate the workaround. These should actually be set using Helm chart values.

  1. Add the ability to set hostNetwork: true on the following resources:

    • kubefed-controller (Deployment)
    • kubefed-admission-webhook (Deployment)
    • kubefed-xxx (Job)

    Example:

    charts/kubefed/charts/controllermanager/templates/deployments.yaml

          serviceAccountName: kubefed-controller
    +     hostNetwork: true
  2. Avoid port conflicts between Kubefed and Kubernetes (e.g. 443, 8080) when hostNetwork is enabled. This can be done by making the ports on the resources above configurable.

    Example:

    charts/kubefed/charts/controllermanager/templates/deployments.yaml

          - command:
            - /hyperfed/controller-manager
            - "--v={{ .Values.controller.logLevel }}"
    +       - "--healthz-addr=:18080"
    +       - "--metrics-addr=:19090"
          command:
          - "/hyperfed/webhook"
          - "--secure-port=18443"
          - "--cert-dir=/var/serving-cert/"
          - "--v={{ .Values.webhook.logLevel }}"
          ports:
    +     - containerPort: 18443    

    charts/kubefed/charts/controllermanager/templates/service.yaml

    spec:
      selector:
        kubefed-admission-webhook: "true"
      ports:
      - port: 443
    +   targetPort: 18443
  3. Disable the webhook's metrics and health endpoints. The controller runtime (sigs.k8s.io/controller-runtime/pkg/manager) automatically binds its own metrics and health-check endpoints to common ports such as 8080. However, these endpoints do not appear to be used anywhere (they are not referenced in the Helm chart template for the webhook deployment), so I believe they can be safely disabled.

    Example:

    cmd/webhook/app/webhook.go:

        mgr, err := manager.New(config, manager.Options{
            Port:                   port,
            CertDir:                certDir,
    +       MetricsBindAddress:     "0",
    +       HealthProbeBindAddress: "0",
        })
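
Pulling the hard-coded examples together, the chart could expose these knobs through values. The value names below (`controller.hostNetwork`, `webhook.securePort`, and friends) are hypothetical, chosen for illustration only, and are not part of the current chart:

```yaml
# charts/kubefed/values.yaml (hypothetical keys, for illustration)
controller:
  hostNetwork: true    # run on the node's network so the EKS control plane can reach it
  healthzPort: 18080   # non-default ports to avoid clashing with host processes
  metricsPort: 19090
webhook:
  hostNetwork: true
  securePort: 18443    # the Service keeps port 443; targetPort points here
```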

Hopefully folks find this document useful, and an official solution might soon be available. 🤞 I am more than happy to contribute!

/kind feature

bernielomax commented 2 years ago

It's been a while since this was first posted. Does anyone have feedback or opinions on this? I would like to get it merged into the official project to avoid maintaining a fork.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/kubefed/issues/1507#issuecomment-1426550208):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.