Closed: mway-niels closed this issue 1 year ago
@mway-niels I understand the proposal, but I don't understand the reasons behind it. Can you please clarify a few things for me?
We recently upgraded the Kubernetes version from 1.23.9 to 1.24.9. The update itself completed without any issues.
I assume that the upgrade was done by modifying the k8s version in the tf plan; please confirm or correct me if I'm wrong.
> as part of the update procedure IONOS downscaled our node pools to the value defined in `node_count`, which is the same as `auto_scaling.min_node_count` in our configuration.
From what you wrote here, I understand that the final number of nodes (after the upgrade) was the `node_count` from the tf plan, but then you wrote this:
> During an update, the node groups should keep the current `node_count` as their desired `node_count` and should not be forced to scale to `auto_scaling.min_node_count` nodes.
From this it seems that, after the update, the number of nodes was `auto_scaling.min_node_count`. It's a bit confusing; I don't understand what number of nodes you had in the end, after the update.
I'm not sure I understood what the problem is, but from what you wrote I think it's the following:
After the upgrade, the number of nodes wasn't `node_count`, as you expected, but `auto_scaling.min_node_count`, which would also explain what you wrote here: "This caused downtime for the workloads running in the cluster as they couldn't be scheduled due to missing resources (nodes)."
Please tell me if I understood correctly or not. If not, please try to clarify and add additional details, such as the expected number of nodes after the upgrade or anything else that may be useful. Maybe something like this, for example:
Configuration:

```hcl
auto_scaling {
  min_node_count = 1
  max_node_count = 4
}
node_count = 2
```
Problem

After the upgrade, the `node_count` was 1, which was unexpected.

Expected behavior

After the upgrade, the number of nodes should be `node_count = 2`, not `auto_scaling.min_node_count`.
An explanation that contains the configuration, as well as the expected vs. actual values for the number of nodes, will help me better understand what the problem is.
Apologies, let me clarify:
Our configuration (shortened for readability):
```hcl
resource "ionoscloud_k8s_cluster" "example_k8s" {
  k8s_version = "1.24.9" # Changed from 1.23.9
  maintenance_window {
    ...
  }
}

resource "ionoscloud_k8s_node_pool" "example_k8s_nodepool" {
  k8s_cluster_id = ionoscloud_k8s_cluster.example_k8s.id
  maintenance_window {
    ...
  }
  datacenter_id = var.ionoscloud_datacenter_id
  lans {
    ...
  }
  k8s_version = ionoscloud_k8s_cluster.example_k8s.k8s_version # Changed from 1.23.9
  node_count  = var.min_node_count # = 3
  auto_scaling {
    min_node_count = var.min_node_count # = 3
    max_node_count = var.max_node_count # = 7
  }
}
```
Timeline:

1. Changed `k8s_version` from `1.23.9` to `1.24.9`.
2. Ran `terraform plan` and `terraform apply`:
```
  # module.k8s_production[0].ionoscloud_k8s_cluster.example_k8s will be updated in-place
  ~ resource "ionoscloud_k8s_cluster" "example_k8s" {
        id          = "XXXXX"
      ~ k8s_version = "1.23.9" -> "1.24.9"
        # (1 unchanged attribute hidden)
        # (1 unchanged block hidden)
    }

  # module.k8s_production[0].ionoscloud_k8s_node_pool.example_k8s_nodepool[0] will be updated in-place
  ~ resource "ionoscloud_k8s_node_pool" "example_k8s_nodepool" {
        id          = "XXXXX"
      ~ k8s_version = "1.23.9" -> "1.24.9"
      ~ node_count  = 4 -> 3
        # (11 unchanged attributes hidden)
        # (4 unchanged blocks hidden)
    }

  # module.k8s_production[0].ionoscloud_k8s_node_pool.example_k8s_nodepool[1] will be updated in-place
  ~ resource "ionoscloud_k8s_node_pool" "example_k8s_nodepool" {
        id          = "XXXXX"
      ~ k8s_version = "1.23.9" -> "1.24.9"
      ~ node_count  = 5 -> 3
        # (11 unchanged attributes hidden)
        # (4 unchanged blocks hidden)
    }
```
During the subsequent `terraform apply`, the node pools are scaled down accordingly.

@mway-niels thank you! Now it is clear. The following things happen:
1. During the upgrade, the node pools are scaled down to `node_count = 3`, since this is the initial value from the tf plan.
2. Afterwards, the autoscaler scales the pools back up, but the value from the tf plan and saved in the state file is 3, so Terraform will think of this as a change that should be made.

If, at this moment, we run `terraform plan`, we will see something like:
```
  resource "ionoscloud_k8s_node_pool" "example" {
      id         = "952becd7-57f5-4133-864b-22cb17fd06c9"
      name       = "k8sNodePoolExample"
    ~ node_count = 4 # VALUE FROM THE API -> 3 # VALUE FROM TF PLAN
```
We have two solutions to avoid this:
1. Use:

```hcl
lifecycle {
  ignore_changes = [
    node_count
  ]
}
```
inside the `ionoscloud_k8s_node_pool` resource. As the name says, this will ignore changes to `node_count`; more details here. Keep in mind that, if you choose this solution, you will need to remove `ignore_changes = [node_count]` if you want to update the `node_count` value from the tf plan.
2. In the tf plan, when you modify the version, you can also modify the `node_count` value to match the new one, the one set by the autoscaler (4, in our case).
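For clarity, here is a minimal sketch (not taken verbatim from this thread) of how solution 1 would fit into the node pool resource from the configuration above, with the elided blocks left out:

```hcl
resource "ionoscloud_k8s_node_pool" "example_k8s_nodepool" {
  k8s_cluster_id = ionoscloud_k8s_cluster.example_k8s.id
  datacenter_id  = var.ionoscloud_datacenter_id
  k8s_version    = ionoscloud_k8s_cluster.example_k8s.k8s_version
  node_count     = var.min_node_count

  auto_scaling {
    min_node_count = var.min_node_count
    max_node_count = var.max_node_count
  }

  # Ignore drift on node_count, so sizes set by the autoscaler do not
  # show up as a pending change in terraform plan / apply.
  lifecycle {
    ignore_changes = [node_count]
  }
}
```

The trade-off, as noted above, is that `node_count` changes made in the tf plan are ignored too until the `lifecycle` block is removed.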
I opted to use proposed solution 1, as manual `node_count` changes are unlikely since we're using an autoscaling configuration.
Current Provider Version
Use-cases
We're running a Managed Kubernetes cluster on IONOS cloud. We recently upgraded the Kubernetes version from 1.23.9 to 1.24.9. The update itself completed without any issues.
However, as part of the update procedure IONOS downscaled our node pools to the value defined in `node_count`, which is the same as `auto_scaling.min_node_count` in our configuration. This caused downtime for the workloads running in the cluster as they couldn't be scheduled due to missing resources (nodes). Removing `node_count` isn't possible as it is a required variable.

Proposal
Update the Terraform provider implementation to make `node_count` optional if `auto_scaling` is configured. During an update, the node groups should keep the current `node_count` as their desired `node_count` and should not be forced to scale to `auto_scaling.min_node_count` nodes.
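As an illustration only (this is not syntax the provider currently accepts, since `node_count` is required today), the proposal would allow a node pool definition like:

```hcl
# Hypothetical configuration under the proposal: node_count omitted
# because auto_scaling is configured.
resource "ionoscloud_k8s_node_pool" "example_k8s_nodepool" {
  k8s_cluster_id = ionoscloud_k8s_cluster.example_k8s.id
  datacenter_id  = var.ionoscloud_datacenter_id
  k8s_version    = ionoscloud_k8s_cluster.example_k8s.k8s_version

  # With no node_count, an upgrade would leave the pool at whatever
  # size the autoscaler last set, within these bounds, instead of
  # resetting it to min_node_count.
  auto_scaling {
    min_node_count = var.min_node_count # = 3
    max_node_count = var.max_node_count # = 7
  }
}
```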