Closed Skarlso closed 3 months ago
Hey there! This hasn't been worked on yet, but many folks have expressed interest. The high level so far for a CAPI provider for Karpenter has been:
Thanks for all the info @ellistarn! I'll keep an eye open on this and try to achieve something in the meantime with what we have. :)
Hello! Is there any effort ongoing to find the best way to integrate Karpenter with CAPA as of right now?
I have been able to make Karpenter boot nodes and have them join the cluster. The part that is currently missing is somehow registering the Machines in the Management Cluster and handling upgrades (currently the only way I found was to set a low node TTL, which eventually rolls the whole cluster).
My initial thoughts on integrating into Cluster API (CAPI) are focused on how we deal w/ CAPI's idea of machines being replicas (the MachineDeployment resource is CAPI's analogy to k8s's Deployment resource). The canonical use-case for CAPI is you define a MachineDeployment and then scale it out or in via the replicas field. The spec for a Machine doesn't strictly forbid the notion of heterogeneous VM offerings/SKUs:
Where things get interesting is in the actual provider implementation of a machine (note the InfrastructureRef field in CAPI's MachineSpec, which references a provider-specific machine implementation "template" or "recipe"). Here's what AWS's CAPI provider (CAPA) looks like:
Note especially above the InstanceType field, which in its current design is meant to be a common property replicated equivalently across all AWSMachine resources in a CAPA cluster. I.e., if the value is m4.xlarge, then all of the AWSMachines (which ultimately underlie Kubernetes nodes) created during a scale out event (an increase of the replicas field of the parent CAPI MachineDeployment resource) will be running on m4.xlarge instances.
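So, what I think this means in terms of the optimal integration point: we can probably try as a first pass implementing an additional Machine spec for each CAPI provider that wishes to implement a Karpenter provisioner, in the existing provider project (for example, CAPA, CAPZ, CAPG). And that new spec would not include a "VM type" as a 1st class, source-of-truth, declarative config, but would rather include the necessary configuration inputs (VM types, pricing models, spot configuration, etc) for Karpenter to create new nodes when the replica count increases. The actual VM type chosen could then be "demoted" to a status field in the resultant, new spec (let's call it AWSKarpenterMachine).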
I think something like the above could work to best leverage the existing CAPI ecosystem and minimize the amount of net new effort to create this new solution from scratch.
cc'ing CAPI project maintainers @fabriziopandini @vincepri @sbueringer @CecileRobertMichon to get their thoughts on this, I know it's a mouthful!
So, what I think this means in terms of the optimal integration point: we can probably try as a first pass implementing an additional Machine spec for each CAPI provider that wishes to implement a Karpenter provisioner, in the existing provider project (for example, CAPA, CAPZ, CAPG). And that new spec would not include a "VM type" as a 1st class, source-of-truth, declarative config, but would rather include the necessary configuration inputs (VM types, pricing models, spot configuration, etc) for Karpenter to create new nodes when the replica count increases. The actual VM type chosen could then be "demoted" to a status field in the resultant, new spec (let's call it AWSKarpenterMachine).
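To make the shape of that idea a bit more concrete, here is a rough sketch of a machine object whose spec only carries inputs for the provisioner and whose status records the VM type that was actually picked. It is written in Python purely for illustration; the field names (allowed_instance_types, capacity_types, max_hourly_price) and the record_provisioned helper are hypothetical and do not come from CAPI, CAPA, or Karpenter.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KarpenterMachineSpec:
    """Hypothetical spec: inputs for the provisioner, no single VM type."""
    allowed_instance_types: List[str]
    capacity_types: List[str] = field(default_factory=lambda: ["on-demand"])
    max_hourly_price: Optional[float] = None


@dataclass
class KarpenterMachineStatus:
    """Hypothetical status: the VM type is observed state, not declared config."""
    instance_type: Optional[str] = None


@dataclass
class AWSKarpenterMachine:
    name: str
    spec: KarpenterMachineSpec
    status: KarpenterMachineStatus = field(default_factory=KarpenterMachineStatus)


def record_provisioned(machine: AWSKarpenterMachine, chosen: str) -> AWSKarpenterMachine:
    """Store the provisioner's choice in status, rejecting types outside the spec."""
    if chosen not in machine.spec.allowed_instance_types:
        raise ValueError(f"{chosen} is not allowed by the spec of {machine.name}")
    machine.status.instance_type = chosen
    return machine


machine = AWSKarpenterMachine(
    name="worker-0",
    spec=KarpenterMachineSpec(
        allowed_instance_types=["m4.xlarge", "m5.xlarge"],
        capacity_types=["spot", "on-demand"],
    ),
)
record_provisioned(machine, "m5.xlarge")
```

The point of the split is that two machines created from the same spec may end up with different values in status.instance_type, which the current InstanceType field on the spec side does not allow.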
I like where this is going. I wonder if we could instead of introducing an additional spec, make the VM size optional in the providers (that would be breaking but maybe v1beta2?) and then define the "necessary configuration inputs" for the node in the CAPI Machine itself, i.e. I need this much mem/cpu, max price, optimized to run this type of workload etc. (also all optional). Potentially the VM size could still exist as an explicit override that takes precedence.
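A minimal sketch of how that precedence could behave, assuming the requirements are optional fields and an explicit VM size always wins. Everything here is hypothetical: the MachineRequirements fields, the resolve_vm_size helper, the example.* size names, and the catalog numbers (which are made-up placeholders, not real prices). Picking the cheapest match is also just an assumption for the sake of the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical catalog: size name -> (vCPUs, memory in GiB, hourly price).
# All values are illustrative placeholders.
CATALOG: Dict[str, Tuple[int, int, float]] = {
    "example.small": (2, 8, 0.10),
    "example.medium": (4, 16, 0.20),
    "example.large": (8, 32, 0.40),
}


@dataclass
class MachineRequirements:
    """Hypothetical, all-optional node requirements."""
    min_vcpus: int = 0
    min_memory_gib: int = 0
    max_hourly_price: Optional[float] = None
    vm_size: Optional[str] = None  # explicit override, takes precedence


def resolve_vm_size(req: MachineRequirements,
                    catalog: Dict[str, Tuple[int, int, float]] = CATALOG) -> str:
    """Return the explicit override if set, else the cheapest size that fits."""
    if req.vm_size is not None:
        return req.vm_size
    candidates = [
        (price, name)
        for name, (vcpus, memory_gib, price) in catalog.items()
        if vcpus >= req.min_vcpus
        and memory_gib >= req.min_memory_gib
        and (req.max_hourly_price is None or price <= req.max_hourly_price)
    ]
    if not candidates:
        raise LookupError("no VM size satisfies the requirements")
    return min(candidates)[1]


by_requirements = resolve_vm_size(MachineRequirements(min_vcpus=3, min_memory_gib=12))
by_override = resolve_vm_size(MachineRequirements(min_vcpus=3, vm_size="example.large"))
```

With the placeholder catalog above, the first call resolves from the requirements and the second ignores them in favor of the explicit size.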
Also tagging @elmiko for thoughts
i like what @jackfrancis is thinking, i also think there will be some cool interactions between how Karpenter works and the CAPI MachinePool type. i'm not sure how the instance sizes will get communicated but it seems like a natural fit to me.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After inactivity, lifecycle/stale is applied
After inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle rotten
we are working towards a cluster api enhancement proposal (CAEP) from the capi karpenter feature group, perhaps we should update the doc to point users to the feature group's work?
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
i'm happy to allow this to close or keep it open for tracking, whatever folks prefer.
for anyone who is curious about the karpenter provider cluster-api, please come visit the cluster api karpenter feature group.
I am super interested to know where this goes, as it would be a blocker for folks who have already adopted Karpenter on AWS and now also on Azure to switch to CAPA/CAPI for cluster management.
for now, the best way to follow the progress is to attend our feature group meetings, or review the agenda. i try to keep notes there, and we do record the meeting.
Hello! 👋
Getting right to it, there is this doc: https://github.com/aws/karpenter/blob/main/designs/aws-launch-templates-options.md#capi-integration
This says that CAPI integration is discussed in a different doc. Can someone please point me to that doc? :) That would be awesome.
I'm trying to get integration with CAPA started with Karpenter and I was wondering what elements/objects/components CAPA could/should/shouldn't manage with Karpenter. Any help is much appreciated. Cheers!