koushikgongireddy opened this issue 1 month ago
@apeabody @aaron-lane @bharathkkb Can you please take a look at this issue and let me know how to overcome it?
I see this issue was previously opened and closed without resolution: https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/issues/1773
Sorry, I have not maintained these modules for years.
Hi @koushikgongireddy
Any change to the keepers will normally result in nodepool replacement. To avoid this, in some cases, it may be possible to edit your remote state with the new keeper values. Here is an example of updating a keeper value in the v24.0 upgrade guide: https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/master/docs/upgrading_to_v24.0.md#update-variant-random-id-keepers-updated
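For illustration, a minimal sketch of editing a keeper value in state via `terraform state pull`/`terraform state push` (the resource addresses and keeper keys below are placeholders; check `terraform state list` and the upgrade guide for the exact values in your setup):

```shell
# Always back up state first.
terraform state pull > state.json
cp state.json state.json.backup

# Inspect the current keepers on the module's random_id resources
# (placeholder filter; adjust for your own resource addresses).
jq '.resources[] | select(.type == "random_id") | .instances[].attributes.keepers' state.json

# Edit state.json so the keeper values match what the new module version will
# compute, bump the top-level "serial" field, then push the state back.
terraform state push state.json

# Confirm the node pool no longer shows as a forced replacement.
terraform plan
```

Whether pushing an edited state is acceptable depends on your backend and locking setup, so trying this on a dev cluster first is strongly recommended.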
v17 to v33 is a substantial jump, so be sure to review the upgrade guide for each major version in between.
Thanks for the update @apeabody
Yeah, we reviewed the changes from 17.0.0 to 30.0.0 and they don't affect our workloads too much, so we are good!
As for the keepers, is there any alternative other than updating the state file? If it's a one-off we can do it, but if it comes up version after version it's a tedious task when managing tens of clusters. Currently we have 25+ GKE clusters!
So I want to check if there is any solution other than updating the state file!
Thanks Koushik
Updating the state file should only be required when keepers are modified AND you want to avoid replacing nodepools. While keepers don't change in every major release, there have been a number of changes since v17 was released 3 years ago.
@apeabody - But I do see keeper changes from 30.0.0 to 32.0.0 as well; I added them to the description above!
So we tried upgrading one dev cluster from 30.0.0 to 32.0.0 and we are again seeing keeper changes, for example the new `enable_confidential_storage` keeper. If this repeats every version, it is a pain point for those of us managing 25+ clusters.
Hi @koushikgongireddy - Curious, is there a reason the nodepool can't be recreated, especially on a dev cluster?
Part of the challenge is that these new nodepool arguments can force re-creation at the provider level. For example, `enable_confidential_storage` has `ForceNew: true`: https://github.com/hashicorp/terraform-provider-google/blob/main/google/services/container/node_config.go#L753
@apeabody - We are fine on dev clusters, but for PROD clusters it means downtime, because the new node pool is created and the old one is deleted right away, which will cause downtime for us.
So updating the state file is an option, but doing it for 20 PROD clusters is difficult for us, so we want to see if there is any other alternative, because we are using the external module provided by GCP.
Also experiencing the issue. I'm upgrading from module v32.0.0 to v33.0.4. From the TF plan I can see the following change on the node pools. It wants to add the gcfs_config even when it's set to false:
```
  + gcfs_config {
      + enabled = false
    }
```
This is the only change at the node pool level, which makes me think this is what is causing the nodes to be recreated?
@KRASSUSS w/r/t the `gcfs_config` diff specifically, I think updating to a >= 6.4.0 or 5.44.1 provider version should resolve it, as long as you're also on the latest module version. If you're still seeing that with the latest provider and latest module version, maybe file a separate issue, but I think you should be good.
If it's a provider version where it's showing that diff but is not forcing recreation of the node pool, it should also be safe to apply, but you'll probably see a permadiff until the provider is upgraded.
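For reference, pinning the provider to one of those versions is just a `required_providers` constraint, for example:

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      # Either the 5.x backport (>= 5.44.1) or 6.x (>= 6.4.0), per the comment above.
      version = ">= 5.44.1, < 7"
    }
  }
}
```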
@koushikgongireddy you could maybe try removing the item from state (after backing up state, of course) and see if reimporting the cluster and nodepools works in your lower envs? Or make sure there aren't any config changes you have to add to the settings (e.g., formerly unsupported values that are now supported). I'm not super familiar with keepers, but it's possible that if you match the existing values in the configuration properly, you won't see a diff?
For example, the labels going from set to null makes me think there are some values that might need to be reflected in your Terraform configs?
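A rough sketch of the remove-and-reimport idea, in case it helps (back up state first; the resource address and import ID below are hypothetical and will differ depending on the module flavor and node pool names):

```shell
# Back up the current state.
terraform state pull > backup.tfstate

# Drop the node pool from state without touching the real infrastructure
# (hypothetical address; check `terraform state list`).
terraform state rm 'module.gke.google_container_node_pool.pools["my-pool"]'

# Re-import it so the upgraded module adopts the existing node pool.
terraform import 'module.gke.google_container_node_pool.pools["my-pool"]' \
  projects/my-project/locations/us-central1/clusters/my-cluster/nodePools/my-pool

# Review what still differs.
terraform plan
```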
> Part of the challenge is that these new nodepool arguments can force re-creation at the provider level (for example, `enable_confidential_storage` has `ForceNew: true`).

I'm hoping to eventually get more of the items that either don't support in-place updates, or don't work with updates at all, fixed, at least for the default nodepool case. I would imagine that `enable_confidential_storage` does need to recreate the nodepool if it is actually changing settings, but if the OP doesn't have it enabled, and doesn't want to enable it, maybe they can look at explicitly setting the value to `false` or `null`?
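A hypothetical sketch of what explicitly pinning the value could look like at the module level (the key name and whether your module version exposes it per node pool are assumptions to verify against the module's README):

```hcl
module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"
  version = "~> 33.0"

  # ... existing cluster inputs ...

  node_pools = [
    {
      name = "default-node-pool"
      # Assumed key name: explicitly match the value already set on the cluster
      # so the plan does not see a change.
      enable_confidential_storage = false
    },
  ]
}
```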
@wyardley The issue here is definitely the keepers.
If you look at the description above, the keepers are the major change in my TF plan: that changes the random_id, the random_id change changes the node pool name, and if the node pool name changes the node pool has to be recreated!
The issue is not that `enable_confidential_storage` has `ForceNew: true`.
In the lines below you can see that the keepers are changing, and that change is ultimately forcing the node pool recreation:
17.0.0 - https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/v17.0.0/modules/beta-private-cluster-update-variant/cluster.tf#L313
30.0.0 - https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/v30.0.0/modules/beta-private-cluster-update-variant/cluster.tf#L500
We tried updating the state file with the new values and it works fine, but we want to know whether that is the only option or if there are alternatives that avoid the keepers change!
Only the `*-update-variant` modules include keepers, specifically for the node pool create-before-destroy behavior: https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/tree/master/modules/beta-public-cluster-update-variant#node-pool-update-variant.
For those not making use of the node pool create-before-destroy behavior, the other modules, such as `beta-private-cluster`, do not include the keepers and would be suggested. Otherwise, to avoid the node_pool replacement with the `*-update-variant` modules when new keepers are introduced, the local state can be modified.
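For anyone following along, the update-variant mechanism is roughly the simplified sketch below (not the module's exact code): the random suffix is part of the node pool name, so any change to the keepers produces a new suffix, a new name, and therefore a create-before-destroy replacement.

```hcl
variable "cluster_name" { type = string }
variable "node_pool_name" { type = string }
variable "region" { type = string }
variable "machine_type" { type = string }
variable "disk_size_gb" { type = number }

resource "random_id" "name" {
  byte_length = 2

  # Each release may add newly supported node pool arguments here; changing
  # any keeper forces a new random suffix.
  keepers = {
    machine_type = var.machine_type
    disk_size_gb = var.disk_size_gb
  }
}

resource "google_container_node_pool" "pools" {
  # The suffix is baked into the name, so a new suffix means a new node pool...
  name     = "${var.node_pool_name}-${random_id.name.hex}"
  cluster  = var.cluster_name
  location = var.region

  node_count = 1

  node_config {
    machine_type = var.machine_type
    disk_size_gb = var.disk_size_gb
  }

  # ...which Terraform creates before destroying the old one.
  lifecycle {
    create_before_destroy = true
  }
}
```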
@apeabody Thanks for the update, we will definitely test with beta-private-cluster and update you!!
Also, can you describe the differences between the modules below in more detail? I mean, what benefits do we get with update-variant that we don't get with private-cluster?
- beta-autopilot-private-cluster
- beta-private-cluster-update-variant
- beta-private-cluster
Thanks Koushik
Hi @koushikgongireddy
All three of these are similar in that they enable `beta` features and can be configured as private clusters.
The big differences are that:
- beta-autopilot-private-cluster: creates an Autopilot cluster
- beta-private-cluster-update-variant: creates a Standard cluster, with the Node Pool Update Variant
- beta-private-cluster: creates a Standard cluster
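If you do test the plain beta-private-cluster flavor, the change is essentially the module source (sketch below, version pin assumed). Note that because that flavor names node pools without the random suffix, pools created by the update-variant module may still show as replacements, so review the plan carefully.

```hcl
module "gke" {
  # Switching flavors: drop the "-update-variant" suffix from the module path.
  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  version = "~> 33.0" # assumed pin; use whichever release you are targeting

  # ... same inputs as before, minus any variant-only settings ...
}
```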
TL;DR
Upgrading from one module version to another causes node pools to be recreated.
Expected behavior
When we upgrade GKE module versions, we are seeing breaking changes where GKE node pools are recreated.
Currently we are on the old version 17.0.0 and planning to upgrade to 30.0.0, and I see there are changes in the keepers, which causes the random_id to change and that causes the node pool to be recreated.
17.0.0 - https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/v17.0.0/modules/beta-private-cluster-update-variant/cluster.tf#L313
30.0.0 - https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/v30.0.0/modules/beta-private-cluster-update-variant/cluster.tf#L500
We also tried going from 30.0.0 to 32.0.0 and the same thing happens again, as new changes were added to the keepers:
32.0.0 - https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/v32.0.0/modules/beta-private-cluster-update-variant/cluster.tf#L591
We need help on how to upgrade to higher versions without causing the node pools to be recreated.
Observed behavior
When we run terraform plan after upgrading to 30.0.0, we see the resources below being recreated:
17.0.0 to 30.0.0
30.0.0 to 32.0.0
Terraform Configuration
Terraform Version