Closed bm-skutzke closed 6 months ago
Hello and thank you for raising this! I will take a look.
@bm-skutzke we will speak about the first node pool, because that is the node pool that generated the error.
Initially, you had:

```hcl
auto_scaling {
  min_node_count = 2
  max_node_count = 10
}
```

with a `node_count` of 2, and

```hcl
lifecycle {
  ignore_changes = [
    node_count
  ]
}
```

in the configuration. The auto-scaling modified the number of nodes, but only for the second node pool; the first node pool still has a `node_count` of 2.
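Put together, such a node pool might look like the following sketch. The resource type `ionoscloud_k8s_node_pool` and all names and variables here are assumptions for illustration, not taken from the reporter's actual configuration, and other required arguments (CPU, RAM, storage, etc.) are omitted for brevity:

```hcl
# Hypothetical sketch of the first node pool; resource type, names,
# and variables are placeholders.
resource "ionoscloud_k8s_node_pool" "pool_1" {
  name              = "pool-1"
  k8s_cluster_id    = var.cluster_id
  datacenter_id     = var.datacenter_id
  availability_zone = "ZONE_1"

  node_count = 2 # required, but see the lifecycle block below

  auto_scaling {
    min_node_count = 2
    max_node_count = 10
  }

  # Without this, `terraform plan` would try to revert the node count
  # whenever the auto-scaler has changed it.
  lifecycle {
    ignore_changes = [
      node_count,
    ]
  }
}
```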
Then you said that you changed `min_node_count` and `node_count` to be 3 for both node pools, but changing the `node_count` in the configuration doesn't do anything since the `ignore_changes` directive is used in the configuration, so only the change to `min_node_count` is visible to Terraform. Terraform will try to change the `min_node_count` and will send the corresponding API request, which contains only the new `min_node_count` value (3). The API then compares `min_node_count = 3` with the existing `node_count = 2` and generates this error.
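Concretely, after that change only part of the diff is visible to Terraform. A sketch, using the values from this report:

```hcl
node_count = 3 # suppressed by lifecycle.ignore_changes; the API still sees 2

auto_scaling {
  min_node_count = 3 # this change is detected and sent in the API request
  max_node_count = 10
}
```

The API then rejects the request, because the new `min_node_count` of 3 is compared against the stored `node_count` of 2.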
If you want to enforce the change and modify the `node_count`, you need to remove the `ignore_changes` directive from that specific node pool's configuration, but you already thought of this, because you wrote:
> We cannot omit the `lifecycle` rule, because this would scale down the second node pool with 4 existing nodes back to 3.
The `ignore_changes` directive should be present in both node pool resources; removing it from the first node pool resource has no effect on the second node pool resource.
So again, one option would be to remove the `ignore_changes` from the first node pool, modify the `min_node_count` and `node_count`, apply the update, and then add `ignore_changes` again so that future changes made by auto-scaling are ignored.
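That workaround could look like this sketch, applied to the first node pool only (resource type and names are hypothetical; other arguments stay unchanged):

```hcl
# Step 1: temporarily remove (or comment out) the lifecycle block on the
# first node pool, and raise both counts to 3:
resource "ionoscloud_k8s_node_pool" "pool_1" {
  # ... other arguments unchanged ...

  node_count = 3 # now actually applied, since ignore_changes is gone

  auto_scaling {
    min_node_count = 3
    max_node_count = 10
  }

  # lifecycle {
  #   ignore_changes = [node_count]
  # }
}

# Step 2: run `terraform apply`.
# Step 3: restore the lifecycle block so future auto-scaling changes
#         to node_count are ignored again.
```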
From the Terraform provider perspective, I don't see any bug here, since we are only passing the provided data to the API; the node count comparison, error generation, and auto-scaling behavior are all handled by the API.
> A `node_count` is always required, but needs to be ignored when using auto-scaling to prevent scaling down.
I agree this may be a little bit confusing, but the `node_count` is marked as required in the API, which is why we marked it as required in Terraform as well.
> When changing the `min_node_count` to a higher value, then probably missing nodes should be created.
> When changing the `min_node_count` to a lower value than existing nodes, then no existing node should be removed.
These notes are related to the auto-scaling feature, but we do not have control over it; we are just passing data to the API.
### Description
We are using K8s clusters with 2 node pools, each in a different availability zone. The configuration of both node pools is identical except for the `availability_zone` argument, of course. We initially configured auto-scaling to be `min_node_count = 2` and `max_node_count = 10`, and set `node_count` to the same value as `min_node_count`. Furthermore, we followed this part of the documentation:

> --- SNIP ---
> Be careful when using `auto_scaling` since the number of nodes can change. Because of that, when running `terraform plan`, Terraform will think that an update is required (since `node_count` from the tf plan will be different from the number of nodes set by the scheduler). To avoid that, you can use:
> --- SNAP ---
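The two node pools described above might be declared like this (a sketch; resource type, names, and values are assumptions, and other required arguments are omitted):

```hcl
# Hypothetical sketch: two otherwise-identical node pools that differ
# only in availability_zone.
resource "ionoscloud_k8s_node_pool" "pool_1" {
  name              = "pool-1"
  availability_zone = "ZONE_1"
  node_count        = 2

  auto_scaling {
    min_node_count = 2
    max_node_count = 10
  }

  lifecycle {
    ignore_changes = [node_count]
  }
}

resource "ionoscloud_k8s_node_pool" "pool_2" {
  name              = "pool-2"
  availability_zone = "ZONE_2"
  node_count        = 2

  auto_scaling {
    min_node_count = 2
    max_node_count = 10
  }

  lifecycle {
    ignore_changes = [node_count]
  }
}
```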
After a while the number of nodes in both node pools differed due to auto-scaling operations: we still had 2 nodes in the first node pool and 4 nodes in the second node pool.
To reduce this imbalance we changed `min_node_count` and `node_count` to be `3` for both node pools. When applying this change we get the following error for the first node pool:

```
Error: error creating k8s node pool: node_count cannot be lower than min_node_count
```

It looks like the number of existing nodes is taken into account instead of the configured `node_count`. We cannot omit the `lifecycle` rule, because this would scale down the second node pool with 4 existing nodes back to 3.

### Expected behavior
A new node should be created in the first node pool to match the `min_node_count` as configured. The second node pool should be left untouched.

### Environment
Terraform version:
Provider version:
OS:
### Additional Notes
Somehow the configuration is contradictory: a `node_count` is always required, but needs to be ignored when using auto-scaling to prevent scaling down. When changing the `min_node_count` to a higher value, the missing nodes should probably be created. When changing the `min_node_count` to a lower value than the number of existing nodes, no existing node should be removed. This would still allow the auto-scaler to scale down to `min_node_count` in case of unused nodes in a node pool.