Open LucaCinquini opened 6 months ago
Also needed to support the On Demand task
From Drew on Slack: "I’ve made progress on integrating Karpenter into SPS for node autoscaling. However, I’ve hit an IAM blocker involving the CreateFleet operation. It appears the CreateFleet operation is not allowed for the mcp-tenantOperator-AMI-APIG permissions boundary which Karpenter uses for its KarpenterController IAM role. I’ve manually tried the CreateFleet operation using the AWS CLI and was able to replicate the permission denied issue while assuming the mcp-tenantOperator role."
After this discussion, Mike filed a ticket with MCP support to update the permissions of the mcp-tenantOperator role.
Update: Still dealing with MCP permissions issues (surprise, surprise!)
Drew worked with MCP to solve all the IAM permissions problems. The latest version works for autoscaling nodes. Verified following Drew's instructions on Slack:
To test the autoscaling you can do the following:
Scale up a dummy demo deployment named “inflate”: kubectl scale deployment inflate --replicas 10
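A minimal sketch of how the scale-up check above could be scripted, so the same steps can be repeated by hand or from a nightly job. The helper names `scale_inflate` and `count_nodes` are hypothetical, not part of SPS or Karpenter, and the sketch assumes the "inflate" deployment lives in the namespace of the current kubectl context.

```shell
# Hypothetical helpers (names are illustrative, not from the SPS repo).

# Scale the "inflate" demo deployment to the requested replica count.
scale_inflate() {
  kubectl scale deployment inflate --replicas "$1"
}

# Print the number of nodes currently registered in the cluster.
count_nodes() {
  kubectl get nodes --no-headers | wc -l | tr -d ' '
}
```

One way to use these: record `count_nodes`, run `scale_inflate 10`, wait for the pending pods to be scheduled, and check that `count_nodes` has grown. Running `scale_inflate 0` afterwards should let Karpenter remove the extra nodes, though how quickly that happens depends on how its consolidation settings are configured, which is not stated here.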
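Note the double "//" in this line, which in a Terraform module source separates the repository address from the subdirectory inside it: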
source = "git@github.com:unity-sds/unity-cs-infra.git//terraform-unity-eks_module?ref=u-sps-24.1-beta.01"
Use Karpenter for autoscaling of nodes
Acceptance Criteria:
o Demonstrated autoscaling of k8s nodes (and pods) when a large number of jobs is submitted, and scaling back down to 0 nodes when all jobs are executed
o CI/CD pipeline for nightly test of autoscaling up and down