mloskot opened this issue 11 months ago
AKS changed the default max surge in the October release, so that if you are on k8s > 1.28 then max surge defaults to 10% (previously it was left blank, which implied 1 under the covers).
Release 2023-10-01 · Azure/AKS (github.com)
Is there a way we could have done this better for terraform?
Just to add, we've worked around this by explicitly setting max_surge = "10%" in our terraform files, to avoid the unnecessary changes on plan/apply cycles.
EDIT: Ignore me - rookie error, I was working off the wrong duplicated file. This also resolved my issue once I modified the correct file.
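For reference, a minimal sketch of this workaround in context (the resource names, sizes and counts here are illustrative, not taken from this thread):

resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks"
  location            = "westeurope"
  resource_group_name = "example-rg"
  dns_prefix          = "example-aks"

  default_node_pool {
    name       = "default"
    vm_size    = "Standard_D2s_v3"
    node_count = 1

    # Pin max_surge explicitly so it matches the 10% default that AKS
    # now applies server-side; this keeps subsequent plans clean.
    upgrade_settings {
      max_surge = "10%"
    }
  }

  identity {
    type = "SystemAssigned"
  }
}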
Just to add, we've worked around this by explicitly setting max_surge = "10%" in our terraform files, to avoid the unnecessary changes on plan/apply cycles.
hi @aa2811, do you mind sharing how you did this? I've added this without any luck:
default_node_pool {
  upgrade_settings {
    max_surge = "10%"
  }
}
I have been temporarily avoiding this with the following.
In the resource "azurerm_kubernetes_cluster" I add a line in the lifecycle block (considering you, like me, do not change the default node pool), e.g.:
lifecycle {
ignore_changes = [
default_node_pool[0],
]
}
And for the resource "azurerm_kubernetes_cluster_node_pool" I do the same, e.g.:
lifecycle {
prevent_destroy = false
ignore_changes = [
upgrade_settings,
]
}
This is indeed a workaround to the issue.
What if I want to change this value? I should remove the lifecycle.ignore_changes metadata, apply the change and then re-apply the ignore_changes?
Our team decided to always specify the upgrade_settings.max_surge info (and this avoids useless plan and apply operations), but unfortunately this cannot be applied to Spot instance node pools, as max_surge cannot be specified there: an empty upgrade_settings block will trigger the initial scenario (where an update is seen every time), and providing a null max_surge value will trigger an error because max_surge is required by the provider.
Any suggestion?
What if I want to change this value? I should remove the lifecycle.ignore_changes metadata, apply the change and then re-apply the ignore_changes?
Yes, but if you want to change it, there is no longer a need for the ignore_changes anyway, since from that moment onward you do define a value for it (instead of the missing 10% that others are suffering from).
Any suggestion?
This issue does not apply to Spot instances since, as you already mentioned, it cannot be specified. So you are probably referring to tf code that tries to create both spot and on-demand node pools within the same logic, for which the solution would be to split it up in separate code for spot and on-demand.
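A rough sketch of what that split could look like, assuming a cluster resource named azurerm_kubernetes_cluster.example (names, sizes and counts are illustrative):

# On-demand pool: max_surge can (and should) be pinned explicitly.
resource "azurerm_kubernetes_cluster_node_pool" "regular" {
  name                  = "regular"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_D2s_v3"
  node_count            = 2

  upgrade_settings {
    max_surge = "10%"
  }
}

# Spot pool: no upgrade_settings block, since max_surge does not
# apply to Spot node pools.
resource "azurerm_kubernetes_cluster_node_pool" "spot" {
  name                  = "spot"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_D2s_v3"
  node_count            = 2
  priority              = "Spot"
  eviction_policy       = "Delete"
}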
Yes, but if you want to change it, there is no longer a need for the ignore_changes anyway, since from that moment onward you do define a value for it (instead of the missing 10% that others are suffering from).
Yeah, you're right. Sorry, I was in a rush and didn't think properly about the question asked. My bad.
This issue does not apply to Spot instances since, as you already mentioned, it cannot be specified. So you are probably referring to tf code that tries to create both spot and on-demand node pools within the same logic, for which the solution would be to split it up in separate code for spot and on-demand.
Well, kind of. I mean, we have a dedicated module for AKS node pool management, but indeed we're executing it twice. So this leads to 2 different and (already) separate resources being created. Am I wrong?
I was thinking of adding a dynamic ignore_changes block for when priority is set to Spot. This should work, right?
[EDIT] Just found an open discussion and a closed PR (https://github.com/hashicorp/terraform/pull/32608) about my idea of having a dynamic lifecycle, but it's not implemented (it throws Blocks of type "lifecycle" are not expected here.).
I think your suggested solution is the only feasible one, where we'll create 2 separate modules for Regular node pools and Spot node pools.
So, I faced the same problem. I am also in the situation of attempting to use the same module for spot and non-spot node pools.
If anyone reads this, here is a workaround. This config is valid for spot node pools:
upgrade_settings {
  max_surge = ""
}
So you can always specify upgrade_settings even for spot node pools, to work around the issue of this thread. And you can set max_surge's value dynamically to work around that issue: https://github.com/hashicorp/terraform-provider-azurerm/issues/19355
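Tying the two suggestions together, a shared module along these lines should work (the variable names, sizes and defaults are assumptions, not taken from this thread):

variable "cluster_id" {
  type = string
}

variable "priority" {
  type    = string
  default = "Regular"
}

variable "max_surge" {
  type    = string
  default = "10%"
}

resource "azurerm_kubernetes_cluster_node_pool" "pool" {
  name                  = "pool"
  kubernetes_cluster_id = var.cluster_id
  vm_size               = "Standard_D2s_v3"
  node_count            = 1
  priority              = var.priority
  eviction_policy       = var.priority == "Spot" ? "Delete" : null

  upgrade_settings {
    # Spot pools do not take a real surge value, so fall back to an
    # empty string there, as described in the comment above.
    max_surge = var.priority == "Spot" ? "" : var.max_surge
  }
}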
Is there an existing issue for this?
Community Note
Terraform Version
1.6.4
AzureRM Provider Version
3.82.0
Affected Resource(s)/Data Source(s)
azurerm_kubernetes_cluster
Terraform Configuration Files
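The original configuration was not captured in this extract; a minimal sketch that should reproduce the reported behaviour, assuming a cluster with a bare default_node_pool and no upgrade_settings block (all names and sizes are illustrative), might look like:

resource "azurerm_resource_group" "example" {
  name     = "example-rg"
  location = "westeurope"
}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "example-aks"

  default_node_pool {
    name       = "default"
    vm_size    = "Standard_D2s_v3"
    node_count = 1
    # No upgrade_settings block: AKS fills in max_surge = "10%" on its
    # side, which the provider then reports as a change on every plan.
  }

  identity {
    type = "SystemAssigned"
  }
}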
Debug Output/Panic Output
Expected Behaviour
Subsequent runs of terraform plan not reporting any changes to the upgrade_settings.max_surge property.
Actual Behaviour
I use azurerm_kubernetes_cluster to create a new cluster without specifying custom upgrade_settings in default_node_pool, and Azure calculates the default "Maximum surge" for me. Then, every time I run terraform plan, still without customised upgrade_settings in my .tf, the provider always tries to modify my cluster's upgrade_settings. This is not an expected behaviour, is it?
Steps to Reproduce
1. terraform plan
2. terraform apply
3. terraform plan
4. terraform plan
Important Factoids
No response
References
No response