[BUG] Scaling up nodes on downstream RKE1 cluster causes cluster (intermittently) to "hang" indefinitely

Josh-Diamond commented 1 year ago

Rancher Server Setup

Rancher version: v2.6.12-rc1
Installation option (Docker install/Helm Chart): HA Helm w/ RKE1 local and RKE v1.3.19
Proxy/Cert Details: byo-valid

Information about the Cluster

Kubernetes version: v1.24.10-rancher4-1
Cluster Type (Local/Downstream): Downstream EC2 RKE1 w/ individual roles (1 etcd, 1 cp, 1 wkr, then scaled to 3 etcd, 2 cp, 3 wkr)

User Information

What is the role of the user logged in? Admin

Provider Information

What is the version of the Rancher v2 Terraform Provider in use? 2.0.0
What is the version of Terraform in use? 0.13.7

Describe the bug

When provisioning a downstream EC2 RKE1 cluster w/ individual roles, the cluster provisions successfully. Scaling up the nodes afterward sometimes causes the cluster to hang indefinitely. This is not seen via the Rancher UI; I was only able to reproduce it when using the rancher2 provider.

To Reproduce

Fresh install of Rancher v2.6.12-rc1
Using rancher2 TFP 2.0.0, provision a downstream EC2 RKE1 cluster, v1.24.10-rancher4-1, w/ 1 etcd, 1 cp, and 1 wkr node
Once the cluster is active, scale up nodes (via TF) to 3 etcd, 2 cp, 3 wkr
Reproduced
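
The steps above can be sketched as a Terraform configuration. This is only a minimal illustration, not the reporter's actual config, which is not included in the report. It assumes the RKE1 cluster and an EC2 node template already exist and are passed in through the hypothetical variables cluster_id and node_template_id; the pool names, hostname prefixes, and count variables are likewise made up for the example.

```hcl
terraform {
  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = "2.0.0"
    }
  }
}

# Hypothetical inputs: IDs of an existing RKE1 cluster and EC2 node template.
variable "cluster_id" { type = string }
variable "node_template_id" { type = string }

# Node counts per role. Defaults match the initial 1/1/1 layout.
variable "etcd_count" {
  type    = number
  default = 1
}
variable "cp_count" {
  type    = number
  default = 1
}
variable "worker_count" {
  type    = number
  default = 1
}

# One node pool per role ("individual roles").
resource "rancher2_node_pool" "etcd" {
  cluster_id       = var.cluster_id
  name             = "etcd-pool"
  hostname_prefix  = "etcd-"
  node_template_id = var.node_template_id
  quantity         = var.etcd_count
  etcd             = true
}

resource "rancher2_node_pool" "cp" {
  cluster_id       = var.cluster_id
  name             = "cp-pool"
  hostname_prefix  = "cp-"
  node_template_id = var.node_template_id
  quantity         = var.cp_count
  control_plane    = true
}

resource "rancher2_node_pool" "worker" {
  cluster_id       = var.cluster_id
  name             = "worker-pool"
  hostname_prefix  = "wkr-"
  node_template_id = var.node_template_id
  quantity         = var.worker_count
  worker           = true
}
```

With a layout like this, the scale-up step would amount to re-applying with larger counts once the cluster is active, for example: terraform apply -var 'etcd_count=3' -var 'cp_count=2' -var 'worker_count=3' (plus the two ID variables).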

Actual Result

Cluster hangs indefinitely; the scale-up never completes

Expected Result

Cluster scales up nodes successfully

Screenshots

[Screenshot: Cluster Management]

[Screenshot: Provisioning logs]

Additional context

It's possible this affects RKE1 across multiple providers, but it was initially seen w/ EC2. I will attempt to reproduce w/ Linode and confirm shortly (in a comment below) whether that is affected as well.

Josh-Diamond commented 1 year ago

Issue seen w/ Linode as well, so this is not EC2-specific.

a-blender commented 1 year ago

@Josh-Diamond Do you only see this when provisioning clusters via TF, or also via the UI?

a-blender commented 1 year ago

I will work on reproducing this issue.

rancher / terraform-provider-rancher2