kubecost / cluster-turndown

Automated turndown of Kubernetes clusters on specific schedules.
Apache License 2.0

Karpenter Support #65

Open pragmaticivan opened 3 months ago

pragmaticivan commented 3 months ago

Hi, will this project work with Karpenter + EKS managed pools in the same cluster?

teevans commented 3 months ago

Hey there! Yes! This should work just fine.
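For reference, turndown windows are driven by the project's `TurndownSchedule` custom resource. A minimal sketch based on the README example (double-check the apiVersion, finalizer, and field names against the version you run):

```sh
# Schedule a recurring turndown window (illustrative only; verify
# the schema against the CRD shipped with your release)
kubectl apply -f - <<EOF
apiVersion: kubecost.k8s.io/v1alpha1
kind: TurndownSchedule
metadata:
  name: example-schedule
  finalizers:
    - "finalizer.kubecost.k8s.io"
spec:
  start: 2024-04-01T00:00:00Z
  end: 2024-04-01T09:00:00Z
  repeat: daily
EOF
```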

pragmaticivan commented 3 months ago

> This is done by cordoning all nodes in the cluster (other than our new g1-small node), and then reducing the node pool sizes to 0.

Mind sharing how that would work?

I know it would be able to reduce the pool to 0 for the Karpenter controllers, but I don't see how it would be able to delete the Karpenter-managed nodes, since they are not managed by EKS.
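For context, on a plain EKS cluster the quoted behavior boils down to roughly the following if done by hand (a sketch only; the cluster and node group names are placeholders):

```sh
# 1. Cordon every node so nothing new gets scheduled
for node in $(kubectl get nodes -o name); do
  kubectl cordon "$node"
done

# 2. Scale each EKS managed node group down to zero
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --scaling-config minSize=0,maxSize=1,desiredSize=0

# Karpenter-launched instances belong to no node group, so step 2
# never touches them -- which is exactly the question here.
```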

pragmaticivan commented 3 months ago

Here's what I'm roughly expecting:

Cluster ABC in EKS

- 1 node pool for Karpenter (1 node)
- 1 node pool for CriticalAddons (CoreDNS) (1 node)
- Multiple EC2 instances managed by Karpenter (no EKS pools here)

From 8pm-5am I want:

  1. ClusterTurnDown will create a new tiny pool for its controller.
  2. ClusterTurnDown will cordon the 2 EKS pools available.
  3. ClusterTurnDown should pause the Karpenter controller.
  4. ClusterTurnDown should terminate all nodes managed by Karpenter (a rough manual sketch of steps 3-4 follows below).
  5. ClusterTurnDown should resize the 2 EKS pools to 0.
  6. Profit.
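What steps 3-4 might look like done manually (purely a sketch; assumes a recent Karpenter using NodePool/NodeClaim resources and a NodePool named `default`):

```sh
# Stop Karpenter from provisioning replacement capacity by zeroing
# the NodePool's limits (Karpenter won't create nodes past a limit)
kubectl patch nodepool default --type merge \
  -p '{"spec":{"limits":{"cpu":"0"}}}'

# Delete the NodeClaims: Karpenter drains each node and terminates
# the backing EC2 instance (its controller must stay running so the
# termination finalizers are processed)
kubectl delete nodeclaims --all
```
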
michaelmdresser commented 3 months ago

I don't believe that Cluster Turndown has been tested with EKS+Karpenter. The general turndown implementation can be found here:

I suspect that Cluster Turndown would end up fighting Karpenter, because it currently does not interact with Karpenter at all: it won't be able to pause the Karpenter controller as you suggest, so as Turndown cordons and drains nodes, Karpenter is likely to just provision new capacity for the displaced pods.

vinicius-loureiro-lacerda commented 3 months ago

One thing worth mentioning when using Karpenter + Cluster-Turndown: Cluster-Turndown apparently needs to be scheduled on EKS node groups, not on Karpenter node pools; otherwise it throws the errors below:

```
2024-04-09T17:58:47Z INF Determined to be running in a cluster. Using in-cluster K8s config.
2024-04-09T17:58:47Z DBG Recommendation service IsAvailable() GET finished endpoint=http://kubecost-release-cost-analyzer.kubecost:9090/model/savings/requestSizingV2 status=500
2024-04-09T17:58:47Z INF Kubescaler run started
2024-04-09T17:58:47Z INF Starting main loop
2024-04-09T17:58:47Z INF Found ProviderID starting with "aws" and eks nodegroup, using EKS Provider
2024-04-09T17:58:47Z WRN Failed to load valid access key from secret. Err=Failed to locate service account file: /var/keys/service-key.json
2024-04-09T17:58:47Z DBG No workloads have autoscaling enabled. Sleeping for a while.
2024-04-09T17:58:47Z INF Could not find NodeGroup '' in Cluster '<CLUSTER_NAME>' error="InvalidParameter: 1 validation error(s) found.\n- minimum field size of 1, DescribeNodegroupInput.NodegroupName.\n"
2024-04-09T17:58:47Z ERR Failed to initialize cluster provider. Components like Turndown, Continuous Cluster Sizing, and 1-click Cluster Sizing will not initialize. error="initializing cluster data: Failed to locate Clusters which have node groups containing the current instance: <INSTANCE_ID>"
2024-04-09T17:58:47Z DBG Cluster Info service IsAvailable() GET finished endpoint=http://kubecost-release-cost-analyzer.kubecost:9090/model/savings/abandonedWorkloads status=200
```

Correct me if I'm wrong, but I couldn't get it to run on Karpenter-created nodes, because Karpenter doesn't create EKS node groups.

I can also send the configuration we're using if needed, but at least for now the behavior here was:

- Running on EKS node group nodes: ✅
- Running on Karpenter nodes: ❌
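One way to express that constraint (a sketch; the `turndown` namespace, deployment name, and node group name are placeholders for wherever cluster-turndown is installed): pin the pod to a managed node group via the label EKS puts on those nodes, so the provider detection in the log above can find a node group.

```sh
# Pin cluster-turndown onto an EKS managed node group node
# ("turndown", "cluster-turndown", "my-nodegroup" are placeholders)
kubectl -n turndown patch deployment cluster-turndown --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"eks.amazonaws.com/nodegroup":"my-nodegroup"}}}}}'
```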

Smana commented 2 weeks ago

@vinicius-loureiro-lacerda Hello, I'm not sure I understand. Does this mean that if Cluster Turndown runs on an EKS node group, it is able to shut down even Karpenter nodes (e.g. all node pools)? If so, why is this issue still open? :)