kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

Support exposing scalable resources to Cluster Autoscaler in ClusterClass #5442

Closed · enxebre closed this 2 years ago

enxebre commented 2 years ago

User Story

As a cluster operator, I would like to be able to enable Cluster Autoscaling for some node pools in managed topologies.

Detailed Description

ClusterClass provides a way to define a "stamp" to be used for creating many Clusters with a similar shape.

It would be great to have a way to expose scalable resources, i.e. MachineDeployments, to the Cluster Autoscaler in the ClusterClass, so it is automatically included in the generated Clusters. This has two separate parts:

  1. Expose scalable resources to autoscaling, i.e. set the min/max Cluster Autoscaler annotations on MachineDeployments.
  2. Deploy the Cluster Autoscaler, e.g. `clusterClass.spec.autoscaling: true` lets us manage a deployment running the Cluster Autoscaler (see the sketch below).
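
A purely illustrative sketch of what this could look like, assuming a hypothetical `autoscaling` field and the annotation keys used by the Cluster Autoscaler's clusterapi provider (nothing here is settled API):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: quick-start
spec:
  # (2) hypothetical field: CAPI would manage a Cluster Autoscaler deployment for generated Clusters
  autoscaling: true
  workers:
    machineDeployments:
    - class: default-worker
      template:
        metadata:
          annotations:
            # (1) min/max size annotations recognized by the autoscaler's clusterapi provider
            cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
            cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
```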

/kind feature

enxebre commented 2 years ago

This should have common design considerations with https://github.com/kubernetes-sigs/cluster-api/issues/5125

enxebre commented 2 years ago

/area topology

vincepri commented 2 years ago

/milestone Next
/assign @enxebre
to talk at community meeting

vincepri commented 2 years ago

/assign @randomvariable

elmiko commented 2 years ago

i like this idea, but considering the MHC cluster class proposal as well, it starts to make me wonder if we should have a more generic mechanism for allowing the user to add components that get installed on a per-cluster basis. we have had users ask about this type of feature in the past, and i wonder if we should provide a way for users to include arbitrary manifests (or references) in their cluster class which could be deployed after the core cluster is running.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

elmiko commented 2 years ago

i think we should keep this open, but i'm not clear on the next steps. would it be appropriate to draft a proposal for this?

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

elmiko commented 2 years ago

i still think this is a nice idea, not sure how the rest of the community feels. is this worth bringing up again at a meeting or should we consider rolling it into the lifecycle hooks stuff? (cc @enxebre )

/remove-lifecycle rotten

fabriziopandini commented 2 years ago

/lifecycle frozen

sbueringer commented 2 years ago

Regarding the above points:

  1. Is this already supported today, given that we can set labels on MDs via Cluster.spec.topology...?
  2. I'm not sure if ClusterClass should also be responsible for deploying the Cluster Autoscaler. From my understanding, the Cluster Autoscaler would be deployed once in the mgmt cluster?

elmiko commented 2 years ago

> Is this already supported today, given that we can set labels on MDs via Cluster.spec.topology...?

that's kinda what i was wondering too

> I'm not sure if ClusterClass should also be responsible for deploying the Cluster Autoscaler.

agreed, i'm not sure either. i had thought we decided /not/ to include the autoscaler while we had the discussion during the clusterclass enhancement process.

i wonder if this issue needs updating to fit with the changes we have proposed more recently?

sbueringer commented 2 years ago

Fabrizio wrote an interesting comment here: https://github.com/kubernetes-sigs/cluster-api/issues/5532#issuecomment-954807126

MaxFedotov commented 2 years ago

Hello! What is the current state of this issue? The lack of autoscaling abilities seems like a big blocker for migrating to ClusterClass and managed topologies.

elmiko commented 2 years ago

@MaxFedotov as you can tell we haven't discussed this in the past month or so. i know the group is split about whether ClusterClass should expose a way to deploy the cluster autoscaler, and we have decided not to include this functionality currently.

with that said, you could add the minimum and maximum scaling annotations to the MachineDeployments or MachineSets defined in the ClusterClass. that would at least give the autoscaler the ability to detect those scalable resources; you would just be responsible for deploying the autoscaler itself. for example:
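
a minimal sketch of that, assuming the annotations are set through the managed topology (names like `my-cluster` and `quick-start` are placeholders); metadata set here is propagated to the generated MachineDeployment, where the autoscaler picks up the min/max bounds:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  # clusterNetwork and controlPlane omitted for brevity
  topology:
    class: quick-start
    version: v1.24.0
    workers:
      machineDeployments:
      - class: default-worker
        name: md-0
        # no replicas field, so the topology controller leaves the count to the autoscaler
        metadata:
          annotations:
            cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
            cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
```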

elmiko commented 2 years ago

we had a conversation about this on slack, it has some interesting details about problems that users might find.

https://kubernetes.slack.com/archives/C8TSNPY4T/p1658302795387519

MaxFedotov commented 2 years ago

Performed some tests using CAPD. If you create a Cluster and specify replicas for the controlPlane or machineDeployments:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: capd
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    serviceDomain: cluster.local
    services:
      cidrBlocks:
      - 10.128.0.0/12
  topology:
    class: quick-start
    controlPlane:
      metadata: {}
      replicas: 1
    variables:
    - name: imageRepository
      value: k8s.gcr.io
    - name: etcdImageTag
      value: ""
    - name: coreDNSImageTag
      value: ""
    - name: podSecurityStandard
      value:
        audit: restricted
        enabled: true
        enforce: baseline
        warn: restricted
    version: v1.24.0
    workers:
      machineDeployments:
      - class: default-worker
        name: md-0
        replicas: 1
```

you won't be able to scale them using:

```shell
kubectl scale kcp capi-quickstart-75wgz --replicas=3
```

or

```shell
k scale md capi-quickstart-md-0-d498t --replicas=3
```

They will always revert to the values specified in the Cluster spec, and the following messages will appear in the capi-controller-manager logs:

```
I0720 19:09:21.173605       1 machineset_controller.go:443] "Too many replicas" controller="machineset" controllerGroup="cluster.x-k8s.io" controllerKind="MachineSet" machineSet="default/capi-quickstart-md-0-jl47c-55969c6fd8" namespace="default" name="capi-quickstart-md-0-jl47c-55969c6fd8" reconcileID=06e8abbb-8634-4b00-b780-624c337cf31c need=1 deleting=1
I0720 19:09:21.174082       1 machineset_controller.go:449] "Found delete policy" controller="machineset" controllerGroup="cluster.x-k8s.io" controllerKind="MachineSet" machineSet="default/capi-quickstart-md-0-jl47c-55969c6fd8" namespace="default" name="capi-quickstart-md-0-jl47c-55969c6fd8" reconcileID=06e8abbb-8634-4b00-b780-624c337cf31c delete-policy="Random"
I0720 19:09:21.199668       1 reconcile_state.go:56] "Reconciling state for topology owned objects" controller="topology/cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" cluster="default/capi-quickstart" namespace="default" name="capi-quickstart" reconcileID=49b8a80f-19f1-4aff-a6eb-a6dcd34fe82f
I0720 19:09:21.234217       1 machineset_controller.go:460] "Deleted machine" controller="machineset" controllerGroup="cluster.x-k8s.io" controllerKind="MachineSet" machineSet="default/capi-quickstart-md-0-jl47c-55969c6fd8" namespace="default" name="capi-quickstart-md-0-jl47c-55969c6fd8" reconcileID=06e8abbb-8634-4b00-b780-624c337cf31c machine="capi-quickstart-md-0-jl47c-55969c6fd8-vkmtb"
```

But if your Cluster spec does not include the replicas field for the controlPlane or machineDeployments:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: capi-quickstart
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    serviceDomain: cluster.local
    services:
      cidrBlocks:
      - 10.128.0.0/12
  topology:
    class: quick-start
    controlPlane:
      metadata: {}
    variables:
    - name: imageRepository
      value: k8s.gcr.io
    - name: etcdImageTag
      value: ""
    - name: coreDNSImageTag
      value: ""
    - name: podSecurityStandard
      value:
        audit: restricted
        enabled: true
        enforce: baseline
        warn: restricted
    version: v1.24.0
    workers:
      machineDeployments:
      - class: default-worker
        name: md-0

Then a KubeadmControlPlane with 1 replica and a MachineDeployment with 1 replica will be created, and the replica counts won't be managed by the topology controller, so they can be scaled by the user or by the cluster-autoscaler.
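
For completeness, a sketch of the remaining piece (deploying the autoscaler yourself, as mentioned above): the Cluster Autoscaler's clusterapi provider is pointed at the workload cluster via `--kubeconfig` and at the management cluster (where the MachineDeployments live) via `--cloud-config`; the kubeconfig paths below are placeholders:

```shell
cluster-autoscaler \
  --cloud-provider=clusterapi \
  --kubeconfig=/mnt/workload.kubeconfig \
  --cloud-config=/mnt/management.kubeconfig
```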

Many thanks to @elmiko and @killianmuldoon for helping to understand how to deal with this issue.

fabriziopandini commented 2 years ago

As per the comment above this is already possible. There is also documentation in the autoscaler: https://github.com/kubernetes/autoscaler/pull/5053, thanks to @killianmuldoon.

/close

k8s-ci-robot commented 2 years ago

@fabriziopandini: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/5442#issuecomment-1199803709):

> As per comment above this is already possible,
> There is also documentation in the autoscaler https://github.com/kubernetes/autoscaler/pull/5053 thanks to @killianmuldoon
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.