This project contains Ansible code that creates a baseline in an existing Kubernetes environment for use with the SAS Viya Platform, generates the manifest for an order, and can then deploy that order into the specified Kubernetes environment.
Apache License 2.0
feat: (IAC-903) Update Cluster Autoscaler for EKS 1.25 Support #399
Changes
Since the 2023.03 cadence adds support for K8s 1.25, we want to add support for that version across all the IAC projects. The usual process is to bump the `kubernetes_version` and kubectl values so that we stay within the +1/-1 range of all the versions the latest cadence supports.
When we used viya4-deployment to baseline a 1.25 EKS cluster, we ran into an issue installing the cluster-autoscaler. We needed to update it to a version that supports `PodDisruptionBudget` `policy/v1`, since `policy/v1beta1` is no longer served as of 1.25. The new default `CLUSTER_AUTOSCALER_CHART_VERSION` we use for 1.25+ clusters is 9.25.0. The autoscaler update also required changes on the viya4-iac-aws side: the cluster-autoscaler Role policy was updated to line up with the recommendations from the kubernetes/autoscaler documentation so that the autoscaler can function properly.
You can see the policy update in this PR https://github.com/sassoftware/viya4-iac-aws/pull/189, and the changes will be officially released as part of viya4-iac-aws:5.6.0.
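For context, the API change behind the chart bump looks roughly like the manifest below. This is a minimal illustrative sketch, not the manifest rendered by the cluster-autoscaler chart: the object name, namespace, label selector, and `maxUnavailable` value are all assumptions chosen for the example. The only point is the `apiVersion`, which must be `policy/v1` on 1.25+ clusters.

```yaml
# Illustrative PodDisruptionBudget using the policy/v1 API.
# A manifest that still declares "apiVersion: policy/v1beta1" is rejected by a 1.25 API server.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cluster-autoscaler-pdb        # hypothetical name
  namespace: kube-system              # assumed namespace
spec:
  maxUnavailable: 1                   # assumed value
  selector:
    matchLabels:
      app.kubernetes.io/name: cluster-autoscaler   # hypothetical label
```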
Note: for EKS clusters <1.25 we still use the `CLUSTER_AUTOSCALER_CHART_VERSION` default value of 9.9.2.
If a user created their K8s 1.25 infrastructure in AWS with version 5.5.0 of viya4-iac-aws or earlier (which does not include the Role policy updates), the troubleshooting documentation has steps to remediate this, either by using the latest version of viya4-iac-aws to update the Role policy or by taking manual steps in the AWS IAM Console to achieve the same result.
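The version-dependent default described above could be expressed along these lines. This is a hedged sketch of the selection logic only, not the code in this PR: `K8S_VERSION` is a hypothetical variable name standing in for however the playbook detects the cluster's Kubernetes version. The `version` test is Ansible's built-in version comparison.

```yaml
# Sketch only: K8S_VERSION is an assumed, illustrative variable name.
K8S_VERSION: "1.25"

# Use chart 9.25.0 on 1.25+ clusters, otherwise keep the older 9.9.2 default.
CLUSTER_AUTOSCALER_CHART_VERSION: "{{ '9.25.0' if K8S_VERSION is version('1.25', '>=') else '9.9.2' }}"
```

Assuming the variable can be overridden in the ansible vars file, as the word "default" suggests, a user could also pin the chart version explicitly by setting `CLUSTER_AUTOSCALER_CHART_VERSION` to a literal value.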
Tests
Tested the following scenarios; more details are in the internal ticket.
Note: for all these scenarios I set `V4_CFG_CAS_WORKER_COUNT: 3` in my ansible vars to ensure that the autoscaler functioned and provisioned the additional CAS nodes. I also confirmed that the unused nodes were automatically removed upon uninstall.
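For anyone reproducing the test setup, the relevant ansible vars entry would look something like the fragment below. Only the `V4_CFG_CAS_WORKER_COUNT` setting comes from the description above; the rest of the vars file is omitted, and the comments are explanatory assumptions.

```yaml
# Fragment of an ansible vars file used for the autoscaler tests.
# Requesting 3 CAS workers is intended to exceed the initially provisioned CAS capacity,
# so the cluster-autoscaler has to add nodes (and remove them again after uninstall).
V4_CFG_CAS_WORKER_COUNT: 3
```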