If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Custom TF provisioned
User Information
What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
If custom, define the set of permissions: Admin
Provider Information
What is the version of the Rancher v2 Terraform Provider in use? 3.0.2
What is the version of Terraform in use? Terraform v1.0.6
Also using terragrunt version v0.31.8
Describe the bug
When upgrading a custom RKE cluster and moving from dockershim to cri_dockerd, Terraform errors out:
rancher2_cluster.cluster: Modifying... [id=c-8nkxk]
rancher2_cluster.cluster: Still modifying... [id=c-8nkxk, 10s elapsed]
rancher2_cluster.cluster: Still modifying... [id=c-8nkxk, 20s elapsed]
rancher2_cluster.cluster: Still modifying... [id=c-8nkxk, 30s elapsed]
rancher2_cluster.cluster: Still modifying... [id=c-8nkxk, 40s elapsed]
╷
│ Error: [ERROR] waiting for cluster (c-8nkxk) to be updated: unexpected state 'error', wanted target 'active, provisioning, pending'. last error: %!s(<nil>)
│
│ with rancher2_cluster.cluster,
│ on rancher.tf line 13, in resource "rancher2_cluster" "cluster":
│ 13: resource "rancher2_cluster" "cluster" {
This only happens when upgrading a cluster from 1.23 to 1.24 or 1.25. The cluster itself finishes upgrading, as shown in the Cluster Management provisioning logs in Rancher, and eventually shows up as Active.
Upgrading the cluster again through Terraform to a higher Kubernetes version afterwards (for example 1.26.x) succeeds, and Terraform does not fail with the unexpected state 'error'.
Running a Terraform plan after the failed apply shows that Kubernetes components have been changed outside of Terraform and that the configuration already matches those changes.
To Reproduce
Create a custom RKE cluster running 1.23.10 with our usual TF process.
Change the kubernetes_version in rke_config to 1.24.10 or 1.25.9.
Run Terraform plan and apply. Terraform starts applying the plan, then errors out on the rancher2_cluster resource:
rancher2_cluster.cluster: Modifying... [id=c-8nkxk]
rancher2_cluster.cluster: Still modifying... [id=c-8nkxk, 10s elapsed]
rancher2_cluster.cluster: Still modifying... [id=c-8nkxk, 20s elapsed]
rancher2_cluster.cluster: Still modifying... [id=c-8nkxk, 30s elapsed]
rancher2_cluster.cluster: Still modifying... [id=c-8nkxk, 40s elapsed]
rancher2_cluster.cluster: Still modifying... [id=c-8nkxk, 50s elapsed]
rancher2_cluster.cluster: Still modifying... [id=c-8nkxk, 1m0s elapsed]
rancher2_cluster.cluster: Still modifying... [id=c-8nkxk, 1m10s elapsed]
rancher2_cluster.cluster: Still modifying... [id=c-8nkxk, 1m20s elapsed]
╷
│ Warning: Experimental feature "module_variable_optional_attrs" is active
│
│ on main.tf line 2, in terraform:
│ 2: experiments = [module_variable_optional_attrs]
│
│ Experimental features are subject to breaking changes in future minor or
│ patch releases, based on feedback.
│
│ If you have feedback on the design of this feature, please open a GitHub
│ issue to discuss it.
╵
╷
│ Error: [ERROR] waiting for cluster (c-8nkxk) to be updated: unexpected state 'error', wanted target 'active, provisioning, pending'. last error: %!s(<nil>)
│
│ with rancher2_cluster.cluster,
│ on rancher.tf line 13, in resource "rancher2_cluster" "cluster":
│ 13: resource "rancher2_cluster" "cluster" {
│
╵
Releasing state lock. This may take a few moments...
time=2023-07-18T15:34:27-04:00 level=error msg=1 error occurred:
* exit status 1
Actual Result
Expected Result
Upgrading from 1.23 to 1.24 or later should not error out while using the rancher2 Terraform provider.
Screenshots
Additional context
I suspect this is caused by the switch from dockershim to cri_dockerd when it is made through the rancher2 provider. Just enabling cri_dockerd through Terraform, without changing the Kubernetes version, produces the same error.
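For reference, here is a rough sketch of the kind of change that triggers the error. The actual configuration is not included in this report, so the cluster name and the full version string below are assumptions for illustration only. kubernetes_version and enable_cri_dockerd are the rke_config arguments involved.

```hcl
resource "rancher2_cluster" "cluster" {
  # Hypothetical name; the real cluster in this report has ID c-8nkxk.
  name = "example-custom-cluster"

  rke_config {
    # Originally a 1.23.10 release. Bumping to 1.24.10 (or 1.25.9) triggers the error.
    # The "-rancherN-N" suffix is an assumption; use a version string that
    # your Rancher installation actually lists.
    kubernetes_version = "v1.24.10-rancher4-1"

    # Kubernetes 1.24 removed dockershim, so Docker-based nodes need cri-dockerd.
    # Toggling only this flag is reported above to produce the same error.
    enable_cri_dockerd = true
  }
}
```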