Open cnmcavoy opened 1 year ago
This issue is currently awaiting triage.
If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Thanks for the issue @cnmcavoy.
AFAIK, we already have AWSMachineList, which could be used for this purpose. Could you list out what this proposal offers in addition to what we have today, if AWSMachineList doesn't satisfy your requirements?
cc @richardcase @Skarlso @dlipovetsky to get your views on this. Thinking out loud here about whether to support such configurations: I think it would complicate things compared to what we have today for AWSMachines. But please feel free to enlighten me if you all feel otherwise.
AFAIK, we already have AWSMachineList which could be used for this purpose. Could you list out what this proposal offers in addition to what we have today, if AWSMachineList doesn't satisfy your requirements?
I'd be interested in how I can use that to answer either of these user stories. Are we talking about this resource?
$ kubectl explain awsmachinelist
KIND: AWSMachineList
VERSION: infrastructure.cluster.x-k8s.io/v1beta2
DESCRIPTION:
AWSMachineList is a list of AWSMachine
...
The context of the problem being solved in both cases is an autoscaling environment, and I'm not sure how an AWSMachineList would work there... Perhaps I need to make this issue's description clearer?
I'm really interested in supporting these use cases as well. However, to satisfy those stories:
How would scale down work? It would just pick a machine based on the current core strategies random/newest/oldest without accounting for any of the above, which is less than optimal.
Also, to enable both stories, especially the second one, I'd expect the controllers to rely on and delegate to the AWS-native resources that support this, e.g. ASG mixed instances groups with instance weighting: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-mixed-instances-groups-instance-weighting.html. Then, to abstract away that AWS-native behaviour, I question whether MachineDeployment is the best resource, since this would break many of the current and historical assumptions, i.e. Machine homogeneity, which has consequences as stated above for things like scaling down, autoscaling, MHC... In my mind this use case probably belongs to a different abstraction, e.g. AWSMachinePool or something else.
How would scale down work? It would just pick a machine based on the current core strategies random/newest/oldest without accounting for any of the above, which is less than optimal.
This is a good question. Scale-down logic for picking which Machine to terminate is driven by the MachineSet, which CAPA doesn't have any ownership of. MachineSets have a few possible DeletePolicy configurations, Random, Newest, and Oldest, which would not be aware of this underlying change and would kill Machines regardless of their order in the list of instanceDetails. I didn't consider this to be an initial blocker, because the core problem I was focused on was provisioning failures, but I think this is worth thinking about...
One naive approach would be to take advantage of the existing Machine annotation cluster.x-k8s.io/delete-machine and set it on the Machines created with a second-order instance type. This would be a simple solution we could implement entirely on the CAPA side. MachineSets currently use this annotation mostly for the Cluster Autoscaler to mark nodes for scale-down. The drawback of the existing annotation is that it supersedes the delete policy entirely, causing any Machine with the annotation set to be deleted first, in preference to whatever deletion policy is configured.
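To make that naive option concrete, here is a rough sketch, not existing CAPA code; the helper name and the usedFallbackType flag are made up for illustration:

```go
package sketch

import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// markFallbackMachineForEarlyDeletion is a hypothetical helper: when the
// AWSMachine had to fall back to a later entry in instanceDetails, mark the
// owning Machine with the existing delete-machine annotation so the MachineSet
// removes it first on the next scale-down.
func markFallbackMachineForEarlyDeletion(machine *clusterv1.Machine, usedFallbackType bool) {
	if !usedFallbackType {
		return
	}
	if machine.Annotations == nil {
		machine.Annotations = map[string]string{}
	}
	// clusterv1.DeleteMachineAnnotation is "cluster.x-k8s.io/delete-machine";
	// MachineSets honour it ahead of their configured DeletePolicy.
	machine.Annotations[clusterv1.DeleteMachineAnnotation] = "true"
}
```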
A better approach might be to expose an integer DeletionPriority field on the instanceDetails list object, so users could specify the order in which these machines should be deleted (and the value would default to the array element index). The AWSMachine controller could annotate the Machine resource with the correct order after the EC2 instance is created. However, this would necessitate changes to CAPI, so that the configured MachineSet DeletionPolicy is respected and the order is considered a dimension of that policy.
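For illustration only, a sketch of what that could look like; the InstanceDetail type and DeletionPriority field are hypothetical, and honouring the order on scale-down would still require the CAPI change described above:

```go
package sketch

// InstanceDetail is a hypothetical entry of the proposed instanceDetails list
// on AWSMachineTemplate; neither the type nor the field exists in CAPA today.
type InstanceDetail struct {
	// InstanceType is the EC2 instance type to try for this entry.
	InstanceType string `json:"instanceType"`

	// DeletionPriority orders scale-down among Machines created from this
	// template: higher values are deleted first. When unset it would default
	// to the entry's index in instanceDetails, so Machines backed by fallback
	// types are removed before those backed by the preferred type.
	// +optional
	DeletionPriority *int32 `json:"deletionPriority,omitempty"`
}
```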
I'm not sure how necessary making this decision is on a first pass, but I am curious what others think.
Also, to enable both stories, especially the second one, I'd expect the controllers to rely on and delegate to the AWS-native resources that support this, e.g. ASG
ASGs have fundamentally different behavior than MachineDeployments. I took the approach of implementing this because bringing MachinePools up to scratch with MachineDeployments is an even larger amount of work. :)
Also, I'm not sure an AWSMachineTemplate needs to describe only one exact type of EC2 machine. They are certainly implemented that way today, but I see that mostly as an artifact of how Cluster Autoscaler limits node group definitions.
MachinePools do support mixed instances in CAPA already, but we lack ways to define lifecycles around the machines they create.
The core problem I was focused on was provisioning failures, but I think this is worth thinking about...
If we scope this change to this particular and concise use case, i.e. "provisioning failures", I have no objections to the proposed implementation, as long as we explicitly document all the limitations, e.g. unhandled involuntary interruptions, scale down (fwiw I'd be in favour of not perpetuating delete-machine for new use cases and would rather think of a more atomic solution), autoscaling... so the UX impact doesn't come as a surprise to consumers of a MachineDeployment that results in a heterogeneous group of Nodes.
For anything more sophisticated, like allocation strategies (capacity-optimized, diversified, capacity-optimized-prioritized, lowest-price, price-capacity-optimized...) or percentages of On-Demand Instances and Spot Instances, I'd expect us to think about and propose a longer-term solution where we delegate decisions to the provider's native capabilities, e.g. the EC2 Fleet API, rather than reimplementing the logic in a Kubernetes controller. This might be backed by a scalable resource of some sort: a special MachineDeployment, a MachinePool, some kind of Karpenter integration, or something else...
If we scope this change to this particular and concise use case, i.e. "provisioning failures", I have no objections to the proposed implementation, as long as we explicitly document all the limitations, e.g. unhandled involuntary interruptions, scale down (fwiw I'd be in favour of not perpetuating delete-machine for new use cases and would rather think of a more atomic solution), autoscaling... so the UX impact doesn't come as a surprise to consumers of a MachineDeployment that results in a heterogeneous group of Nodes.
Sounds good, and this is already in line with the proposed implementation: it only handles capacity provisioning failures; other errors block progress in reconciliation the same way they do today. I will take some time and update the attached PR to include this documentation.
For anything more sophisticated, like allocation strategies (capacity-optimized, diversified, capacity-optimized-prioritized, lowest-price, price-capacity-optimized...) or percentages of On-Demand Instances and Spot Instances, I'd expect us to think about and propose a longer-term solution where we delegate decisions to the provider's native capabilities, e.g. the EC2 Fleet API, rather than reimplementing the logic in a Kubernetes controller. This might be backed by a scalable resource of some sort: a special MachineDeployment, a MachinePool, some kind of Karpenter integration, or something else...
Separately, I'm also interested in this, but I agree it's out of scope.
I consider MachineDeployments to be relatively low-level abstractions.
Thinking out loud, if I need to deploy 5 different instance types, I can create a MachineDeployment for each type. If I need to then scale these, as a group, according to some strategy, I can build a new controller to do that.
If there are reasons that it is impossible (or very impractical) to use multiple MachineDeployments in this way, I would like to understand them. :pray:
I have a similar view on #3203.
Thinking out loud, if I need to deploy 5 different instance types, I can create a MachineDeployment for each type. If I need to then scale these, as a group, according to some strategy, I can build a new controller to do that.
We use MachineDeployments in this pattern today, and use the Cluster Autoscaler as the controller to scale the MachineDeployments. It works, but has serious drawbacks. The Cluster Autoscaler does not get feedback from CAPA that its scale-up cannot be met, so this results in the creation of Machines which never advance past "Provisioning". Then we have to wait for the Cluster Autoscaler's max-node-provision-time to be exceeded and for it to choose another MachineDeployment to scale. This is a frankly slow process (a pending pod may not be satisfied for an hour in a realistic scenario using the default CA settings), whereas it could be nearly instant if we moved the loop into CAPA.
If there are reasons that it is impossible (or very impractical) to use multiple MachineDeployments in this way, I would like to understand them. pray
Check out User Story 2, regarding mixed spot and on-demand instances: how would I solve this with CAPA today with multiple MachineDeployments? I would be interested; I'm not aware of how we can solve this at present.
If there is a larger story around supporting this kind of use case, can you link the issues for those? I'm not married to this approach; I'm interested in solving this problem in whatever way is best for CAPA.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/lifecycle rotten
/remove-lifecycle rotten
/lifecycle stale
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/reopen /remove-lifecycle rotten
@richardcase: Reopened this issue.
/lifecycle stale
/lifecycle rotten
/kind feature
Describe the solution you'd like
Enhance the AWSMachineTemplate to support multiple instance types, specifically an ordered list of instance types and spot market options to use when creating the EC2 instance backing an AWSMachine, in order to handle provisioning failures when the desired instance type is unavailable. The AWSMachine should provision the EC2 instance with the type and spot options of the first item in the list, and only walk the list and use the other configurations if there is insufficient capacity to provision (a rough sketch of the intended behaviour follows the user stories below).
User stories:
As a user, I want to be able to create a MachineDeployment which can be satisfied with multiple instance types of the same capacity but different families. For example, I want to create a node group for m6a.32xlarge or m6i.32xlarge instances, and I want to try creating m6a instances if they are available, but if they are unavailable, use m6i's.
Also, as a user I want to be able to create a MachineDeployment which can use spot instances, but fall back to on-demand instances when the spot market is empty. For example, I want to create a node group for t3.large, but specify spot options to get a better price, and then fall back to on-demand t3.large's when the spot market is empty.
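For illustration only, a rough Go sketch of the intended fallback behaviour; InstanceDetail, SpotMaxPrice, and launchWithFallback are hypothetical names rather than existing CAPA API, and a real implementation would match on the actual EC2 InsufficientInstanceCapacity error code:

```go
package sketch

import (
	"context"
	"errors"
)

// InstanceDetail is one entry of the proposed ordered list on AWSMachineTemplate,
// covering both user stories above.
type InstanceDetail struct {
	InstanceType string  // e.g. "m6a.32xlarge", with "m6i.32xlarge" as a later entry
	SpotMaxPrice *string // non-nil requests a spot instance; nil means on-demand
}

// errInsufficientCapacity stands in for EC2's InsufficientInstanceCapacity /
// empty-spot-market failure in this sketch.
var errInsufficientCapacity = errors.New("insufficient capacity")

// launchWithFallback tries each entry in order and only advances to the next
// entry when capacity for the current one is unavailable; any other error
// blocks reconciliation exactly as it does today.
func launchWithFallback(
	ctx context.Context,
	details []InstanceDetail,
	launch func(context.Context, InstanceDetail) (string, error),
) (string, error) {
	var lastErr error
	for _, d := range details {
		id, err := launch(ctx, d)
		if err == nil {
			return id, nil
		}
		if !errors.Is(err, errInsufficientCapacity) {
			return "", err
		}
		lastErr = err
	}
	return "", lastErr
}
```

The key point is that only the insufficient-capacity case triggers the walk down the list; everything else surfaces as a normal reconciliation error.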
Anything else you would like to add:
This would enable mixed instance capacities, e.g. an m6a.2xlarge and an m6a.4xlarge in the same MachineDeployment; however, this kind of configuration is unsupported by the Cluster Autoscaler, so we may need to improve documentation to state that mixed capacities are unsupported in CAPI as well.
Environment:
- Kubernetes version (use kubectl version): v1.24
- OS (e.g. from /etc/os-release): Ubuntu 20.04