rancher / terraform-provider-rancher2

Terraform Rancher2 provider
https://www.terraform.io/docs/providers/rancher2/
Mozilla Public License 2.0

[BUG] Scaling up nodes on downstream RKE1 cluster causes cluster (intermittently) to "hang" indefinitely #1107

Open Josh-Diamond opened 1 year ago

Josh-Diamond commented 1 year ago

Rancher Server Setup

Information about the Cluster

User Information

Provider Information

Describe the bug

A downstream EC2 RKE1 cluster with individual roles (separate etcd, control plane, and worker nodes) provisions successfully. Scaling up its nodes afterwards intermittently causes the cluster to hang indefinitely. This does not happen when scaling via the Rancher UI; I was only able to reproduce it with the rancher2 provider.

To Reproduce

  1. Fresh install of Rancher v2.6.12-rc1
  2. Using rancher2 TFP 2.0.0, provision a downstream EC2 RKE1 cluster, v1.24.10-rancher4-1, w/ 1 etcd, 1 control plane, and 1 worker node
  3. Once the cluster is active, scale up the nodes (via TF) to 3 etcd, 2 control plane, and 3 worker
  4. The cluster (intermittently) hangs
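
The report does not include the Terraform configuration that was used, so the following is only a minimal sketch of what steps 2 and 3 might look like. It assumes an existing EC2 node template, and every variable and resource name in it (`rancher_api_url`, `rancher_token`, `node_template_id`, `etcd_count`, `cp_count`, `worker_count`, `repro`) is hypothetical rather than taken from the original setup.

```hcl
terraform {
  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = "2.0.0" # provider version named in the report
    }
  }
}

# Hypothetical inputs; none of these names come from the original report.
variable "rancher_api_url" { type = string }
variable "rancher_token" {
  type      = string
  sensitive = true
}
variable "node_template_id" {
  type        = string
  description = "ID of an existing RKE1 node template (EC2 in the report)"
}

# Defaults match step 2 (1 etcd, 1 control plane, 1 worker).
variable "etcd_count" {
  type    = number
  default = 1
}
variable "cp_count" {
  type    = number
  default = 1
}
variable "worker_count" {
  type    = number
  default = 1
}

provider "rancher2" {
  api_url   = var.rancher_api_url
  token_key = var.rancher_token
}

resource "rancher2_cluster" "repro" {
  name = "rke1-scale-repro"

  rke_config {
    kubernetes_version = "v1.24.10-rancher4-1"
  }
}

# One node pool per role, i.e. "individual roles".
resource "rancher2_node_pool" "etcd" {
  cluster_id       = rancher2_cluster.repro.id
  name             = "etcd"
  hostname_prefix  = "repro-etcd-"
  node_template_id = var.node_template_id
  quantity         = var.etcd_count
  etcd             = true
}

resource "rancher2_node_pool" "control_plane" {
  cluster_id       = rancher2_cluster.repro.id
  name             = "cp"
  hostname_prefix  = "repro-cp-"
  node_template_id = var.node_template_id
  quantity         = var.cp_count
  control_plane    = true
}

resource "rancher2_node_pool" "worker" {
  cluster_id       = rancher2_cluster.repro.id
  name             = "wkr"
  hostname_prefix  = "repro-wkr-"
  node_template_id = var.node_template_id
  quantity         = var.worker_count
  worker           = true
}
```

Under these assumptions, step 2 would be a plain `terraform apply`, and step 3 would be a second apply once the cluster is active, e.g. `terraform apply -var etcd_count=3 -var cp_count=2 -var worker_count=3`.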

Actual Result

The cluster hangs indefinitely and the scale-up never completes.

Expected Result

The cluster scales up to the requested node counts successfully.

Screenshots

Cluster Management

Screenshot 2023-04-19 at 11 00 04 AM

Provisioning logs

Screenshot 2023-04-19 at 10 56 58 AM

Additional context

It's possible this affects RKE1 across multiple node providers, but it was initially seen w/ EC2. I will attempt to reproduce w/ Linode and confirm in a comment below whether that is affected as well.

Josh-Diamond commented 1 year ago

Issue seen w/ Linode as well, so it is not EC2 specific.

a-blender commented 1 year ago

@Josh-Diamond Do you only see this when provisioning clusters via TF, or also via the UI?

a-blender commented 1 year ago

I will work on reproducing this issue.