kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

cluster-autoscaler : KubeSchedulerConfiguration plugin configuration PodTopologySpread #3879

Open. Ramyak opened this issue 3 years ago

Ramyak commented 3 years ago

Which component are you using?:

component: cluster-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

The scheduler supports PodTopologySpread cluster-level default constraints as of Kubernetes release 1.18 (commit).

1. PodTopologySpread defaultConstraints at the cluster level: cluster-autoscaler does not take these cluster-level default constraints into account.

apiVersion: kubescheduler.config.k8s.io/v1alpha2
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: true
profiles:
  - pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
          defaultingType: List

Pods remain unscheduled and you get the following log message (cluster-autoscaler thinks the pod fits on an existing node and skips scale-up). Note: the Pod specs do not have topologySpreadConstraints in this case.

I0210 18:03:11.409934       1 filter_out_schedulable.go:118] Pod test-app-5b75d455c9-7gpf5 marked as unschedulable can be scheduled on node ip-172-21-145-192.ec2.internal (based on hinting). Ignoring in scale up.

2. PodTopologySpread set at the Deployment level: this works, since the Pod spec then carries topologySpreadConstraints (a fuller sketch follows the snippet).

  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app: some-app
        release-unixtime: "1611668612"
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
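
For context, a minimal sketch of the Deployment this fragment sits in (only the labels and the constraint above come from this issue; the other names and values are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app                          # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: some-app
  template:
    metadata:
      labels:
        app: some-app
        release-unixtime: "1611668612"
    spec:
      # Constraint copied from the fragment above; because it is part of the
      # pod template, every Pod created from it carries the constraint, which
      # is why cluster-autoscaler sees it in this case.
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: some-app
            release-unixtime: "1611668612"
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
      containers:
      - name: app
        image: registry.example.com/some-app:latest   # illustrative image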

Describe the solution you'd like.:

Cluster-autoscaler should consider PodTopologySpread cluster-level default constraints when it simulates scheduling pending pods.

Describe any alternative solutions you've considered.:

Additional context.:

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

Ramyak commented 3 years ago

/remove-lifecycle stale

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Ramyak commented 3 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

der-eismann commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

lawliet89 commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

der-eismann commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rohitagarwal003 commented 1 year ago

/remove-lifecycle stale /lifecycle frozen

vadasambar commented 1 year ago

This is expected behavior. The ScheduleAnyway constraint is processed during the Scoring (priority function) part of the scheduling process, not during the Filtering (predicate) part.

CA only uses the Filtering part in its simulations (the PreFilter and Filter extension points, to be precise).


ScheduleAnyway is part of the scoring phase of the PodTopologySpread plugin.

DoNotSchedule, on the other hand, is part of the filter/predicate phase of the PodTopologySpread plugin.


As long as DoNotSchedule is used, CA should respect the constraint. One problem I see with the current implementation in CA is that we do not support custom default constraints; we use the built-in defaults. If you specify a custom DoNotSchedule default constraint, CA might not respect it.
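
To make the two cases concrete, a rough sketch (API version and names are illustrative, not from this issue) contrasting a custom cluster-level default, which CA's simulation may not see, with the same constraint written into the pod spec, which is handled in the filter/predicate phase and therefore respected:

# 1) Custom DoNotSchedule default in the scheduler config: CA's simulated
#    scheduler uses the upstream defaults, so it might not pick this up.
apiVersion: kubescheduler.config.k8s.io/v1beta3   # illustrative version
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
  - name: PodTopologySpread
    args:
      defaultingType: List
      defaultConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
---
# 2) Same constraint expressed directly in the pod spec: evaluated during
#    filtering, so CA's scale-up simulation respects it.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod            # illustrative
  labels:
    app: example
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: example
  containers:
  - name: app
    image: registry.example.com/example:latest   # illustrative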

jdomag commented 1 year ago

@vadasambar You said that CA respects the default cluster constraints, but also that CA doesn't support ScheduleAnyway. Yet the default cluster constraints use ScheduleAnyway, according to the docs:

defaultConstraints:
  - maxSkew: 3
    topologyKey: "kubernetes.io/hostname"
    whenUnsatisfiable: ScheduleAnyway
  - maxSkew: 5
    topologyKey: "topology.kubernetes.io/zone"
    whenUnsatisfiable: ScheduleAnyway

Can you elaborate on this one, please?

vadasambar commented 1 year ago

@jdomag I recently wrote a blog post on this (maybe it should be part of the docs) which might answer your question. Quoting the relevant part here:

CA imports the PreFilter and Filter parts of the default scheduler code, i.e., it doesn't allow making any changes to the default behavior. Because of this, CA's simulation of the scheduler won't accurately reflect the actual scheduler running in your cluster, since your cluster/control-plane scheduler's behavior would differ from CA's simulated scheduler. This creates problems because CA's autoscaling won't accurately match the needs of your cluster. ...

CA doesn't consider preferredDuringSchedulingIgnoredDuringExecution because it is part of the Scoring phase of the NodeAffinity scheduler plugin (which comes built in). Every scheduler plugin can act on multiple extension points. NodeAffinity acts on extension points in both the Filtering and Scoring phases. The only problem is, it considers preferredDuringSchedulingIgnoredDuringExecution only in the Scoring phase (the PreScore and Score extension points, to be precise) and not in the Filtering phase. ...

Similarly, ScheduleAnyway is a part of scoring phase of the PodTopologySpread plugin

https://vadasambar.com/post/kubernetes/would-ca-consider-my-soft-constraints/
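
A concrete illustration of the NodeAffinity point (the zone values are illustrative, not from the blog post): the preferred form below only influences Scoring, so CA's PreFilter/Filter simulation ignores it, while the required form is enforced during Filtering and therefore is respected:

affinity:
  nodeAffinity:
    # Soft: handled only at the PreScore/Score extension points,
    # so CA's simulation does not take it into account.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a"]
    # Hard: enforced at the Filter extension point, so CA respects it
    # when simulating where a pending pod could go.
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b"]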

jdomag commented 1 year ago

@vadasambar thanks, this is a great article, I wish it was part of the official docs :)

vadasambar commented 1 year ago

thanks, this is a great article, I wish it was part of the official docs :)

I will try proposing that it be added to the docs at the upcoming SIG meeting (and thank you :))

jan-kantert commented 5 days ago

This hit us by surprise as well. In my opinion there should be a big red warning in the kubernetes docs: https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#cluster-level-default-constraints. Currently, this looks like a stable feature but it can cripple your application if you are unlucky. We added a PR to warn future users.