rancher / quickstart


quickstart on DigitalOcean failed - RKE version is not supported #196

Closed: acwwat closed this 2 years ago

acwwat commented 2 years ago

I am using rev cfe6296 of quickstart to create a cluster on DigitalOcean with the default workload_kubernetes_version (v1.20.6-rancher1-1). It failed with the following error message:

RKE version is not supported [ <list of supported versions> ] got v1.20.6-rancher1-1
│
│   with module.rancher_common.rancher2_cluster.quickstart_workload,
│   on ..\rancher-common\rancher.tf line 25, in resource "rancher2_cluster" "quickstart_workload":
│   25: resource "rancher2_cluster" "quickstart_workload" {

For v1.20, only patch versions 5, 8, 9, 10, and 12 are supported. v1.21 and v1.22 are also available now.

For reference, the default rancher_version is v2.6.3, in case it matters.

I am not sure whether this also affects other providers, since their default value is the same (AWS, for example). Could the default please be set to a supported, stable version? Thanks.

acwwat commented 2 years ago

Strangely enough, when I ran it a second time without changing the default version, it worked, and the Rancher Admin UI shows v1.20.6-rancher1-1.

Then I tried a third time and it failed again. I noticed that the list of supported versions in the error message differs from the one in the first attempt (see the attached files for details). This seems somewhat random...

rke_supported_list_1.txt rke_supported_list_2.txt

acwwat commented 2 years ago

This seems to be related to https://github.com/rancher/terraform-provider-rancher2/issues/670, whose fix was supposedly merged, but the problem still occurs today.

acwwat commented 2 years ago

I looked at the code and noticed that the wait has a timeout of 120s. The timeout is set at the provider level, as documented here. When I set the timeout to 300s (5 min) in rancher-common/provider.tf, terraform apply from scratch succeeded 4 times in a row. Could you check whether increasing the timeout is the proper fix for this race condition?
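For illustration, the change described above amounts to setting the `timeout` argument on the `rancher2` provider block (the provider's default is "120s"). This is a minimal sketch, not the quickstart's actual provider.tf: the variables `rancher_server_url` and `rancher_admin_token` are hypothetical placeholders standing in for however the quickstart wires up the API URL and admin token.

```hcl
terraform {
  required_providers {
    rancher2 = {
      source = "rancher/rancher2"
    }
  }
}

# Hypothetical inputs, used only to keep this sketch self-contained.
variable "rancher_server_url" {
  type = string
}

variable "rancher_admin_token" {
  type      = string
  sensitive = true
}

provider "rancher2" {
  api_url   = var.rancher_server_url  # hypothetical variable
  token_key = var.rancher_admin_token # hypothetical variable

  # Raised from the provider default of "120s" so the provider's wait
  # does not give up before Rancher is ready (the value tested above).
  timeout = "300s"
}
```

Only the `timeout` line matters for this issue; the rest is scaffolding.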

bashofmann commented 2 years ago

@acwwat Thanks for taking the time to look into this and test it. With an increased timeout, I also could no longer reproduce this across multiple runs.

The timeout is increased in https://github.com/rancher/quickstart/commit/193b7a2433aaeea43a6be4f5267fcfa484f586c7.