kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.

Capacity Type Distribution #757

Open bwagner5 opened 3 years ago

bwagner5 commented 3 years ago

Some application deployments would like the benefits of using Spot capacity, but would also like some stability guarantee for the application. I propose a capacity-type percentage distribution on the k8s Deployment resource. Since capacity-type is likely to be implemented at the cloud-provider level, this would also need to live at the cloud-provider layer.

For example:

apiVersion: apps/v1
kind: Deployment
metadata: 
  name: inflate
spec:
  replicas: 10
  template:
    metadata:
      labels:
        app: inflate
        node.k8s.aws/capacity-type-distribution/spot-percentage: "90"
    spec:
      containers:
      - image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
        name: inflate
        resources:
          requests:
            cpu: "100m"

The above deployment spec would result in the deployment controller creating Pods for the 10 replicas. Karpenter would register a mutating admission webhook that checks whether the pod carries this label and then inspects the deployment's existing pods to decide which capacity-type node selector to apply. The pod resource after the admission webhook would look like this:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: inflate
    pod-template-hash: 8567cd588
  name: inflate-8567cd588-bjqzf
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: inflate-8567cd588
spec:
  containers:
  - image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
    name: inflate
    resources:
      requests:
        cpu: "100m"
  schedulerName: default-scheduler
  nodeSelector:
    node.k8s.aws/capacity-type: spot

^^ duplicated 8 more times, and then:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: inflate
    pod-template-hash: 4567dc765
  name: inflate-4567dc765-asdf
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: inflate-4567dc765
spec:
  containers:
  - image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
    name: inflate
    resources:
      requests:
        cpu: "100m"
  schedulerName: default-scheduler
  nodeSelector:
    node.k8s.aws/capacity-type: on-demand
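
For reference, a minimal sketch of how such a mutating webhook could be registered is below. Note that the proposed key contains two slashes, which is not a valid label key (Kubernetes allows at most one prefix/ segment), so the sketch assumes a flattened, purely illustrative key node.k8s.aws/spot-percentage; the webhook name, service, and path are likewise placeholders rather than existing Karpenter objects:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: capacity-type-distribution               # illustrative name
webhooks:
- name: capacity-type-distribution.karpenter.sh  # placeholder, not an existing webhook
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Ignore                          # don't block pod creation if the webhook is down
  clientConfig:
    service:
      name: karpenter                            # assumes the webhook is served by the Karpenter service
      namespace: karpenter
      path: /mutate-capacity-type                # hypothetical handler path
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  # Only mutate pods that carry the (flattened) distribution label,
  # which they inherit from the Deployment's pod template.
  objectSelector:
    matchExpressions:
    - key: node.k8s.aws/spot-percentage
      operator: Exists

On each pod CREATE, the webhook handler would count the deployment's existing pods per capacity type and patch in a node.k8s.aws/capacity-type node selector of spot or on-demand so that the running ratio converges on the requested percentage.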

ellistarn commented 3 years ago

My first thought is that this webhook should be decoupled from Karpenter's core controller. Maybe something plugged into aws cloud provider once we break it apart?

bwagner5 commented 3 years ago

> My first thought is that this webhook should be decoupled from Karpenter's core controller. Maybe something plugged into aws cloud provider once we break it apart?

Yeah, I think that makes the most sense.

vinayan3 commented 2 years ago

We have one application that has been testing out ASGs with mixed purchase options (On-Demand and Spot), following this guide: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-mixed-instances-groups.html

In the future it would be great to have a mechanism to migrate this ASG to Karpenter.

rothgar commented 2 years ago

Two (maybe random) questions.

Should node.k8s.aws/capacity-type-distribution/spot-percentage be an annotation instead of a label?

Now that we'd have two separate deployments, how would that work with the HPA? How would it know which deployment to scale while keeping the overall application balanced?

aeciopires commented 2 years ago

+1

rverma-dev commented 2 years ago

We might also think about how to leverage the current topology constraints instead of introducing new annotations.

ellistarn commented 2 years ago

We've discussed expanding the topologySpreadConstraints concept to include percent-based spread. I think this is a perfect fit.
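
For context, today's topologySpreadConstraints can already spread a workload across capacity types via the capacity-type node label (written as node.k8s.aws/capacity-type earlier in this thread, karpenter.sh/capacity-type in later releases), but only roughly evenly: maxSkew bounds the difference in pod counts between domains rather than expressing a ratio, so a 90/10 split is not directly possible. A pod-spec fragment illustrating the even spread, assuming pods labeled app: inflate:

  topologySpreadConstraints:
  - maxSkew: 1                               # spot and on-demand counts may differ by at most 1 (~50/50)
    topologyKey: karpenter.sh/capacity-type  # capacity-type node label
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: inflate

A percent-based option on this constraint, as discussed above, would let the same mechanism express arbitrary splits.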

Rokeguilherme commented 2 years ago

+1

rverma-dev commented 2 years ago

Any updates, please?

himanshurajput32 commented 2 years ago

+1

ellistarn commented 2 years ago

Hey folks, just a reminder to 👍 the original issue, rather than +1 in the comments, since it's easier for us to sort issues by most upvoted.

himanshurajput32 commented 2 years ago

Any update, please?

tzneal commented 2 years ago

I've documented another method for achieving something similar at https://karpenter.sh/preview/tasks/scheduling/#on-demandspot-ratio-split that may work for some.
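
Roughly, the approach documented there (sketched here from memory, so defer to the linked page for exact syntax and the current provisioner/NodePool API) spreads pods evenly across values of a custom capacity-spread label and biases the ratio by giving the spot pool more label values than the on-demand pool, e.g. four spot values to one on-demand value for an ~80/20 split:

# Requirements fragment for the on-demand pool (owns one capacity-spread value):
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["on-demand"]
  - key: capacity-spread
    operator: In
    values: ["1"]

# Requirements fragment for the spot pool (owns four capacity-spread values):
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot"]
  - key: capacity-spread
    operator: In
    values: ["2", "3", "4", "5"]

# Workload fragment: spread evenly across the five capacity-spread values,
# which works out to roughly 80% spot / 20% on-demand.
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: capacity-spread
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: inflate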

yum-dev commented 2 years ago

👍

andredeo commented 1 year ago

+1

leeloo87 commented 1 year ago

👍

gucarreira commented 1 year ago

@tzneal This is the correct link for the On-Demand/Spot ratio split: https://karpenter.sh/preview/concepts/scheduling/#on-demandspot-ratio-split

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

James-Quigley commented 7 months ago

/remove-lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

James-Quigley commented 4 months ago

/remove-lifecycle stale

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sidewinder12s commented 1 month ago

/remove-lifecycle stale

k8s-triage-robot commented 1 week ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

sidewinder12s commented 1 week ago

/remove-lifecycle rotten