Cannot deploy Quick Start on AWS - module.rancher_common.helm_release.cert_manager

JohnWright-GSE commented 2 years ago

Consistently getting a failure on deployment:

module.rancher_common.helm_release.cert_manager: Creating...
module.rancher_common.helm_release.cert_manager: Still creating... [10s elapsed]
module.rancher_common.helm_release.cert_manager: Still creating... [20s elapsed]
module.rancher_common.helm_release.cert_manager: Still creating... [30s elapsed]
╷
│ Error: Kubernetes cluster unreachable: Get "https://52.56.131.57:6443/version?timeout=32s": dial tcp 52.56.131.57:6443: i/o timeout
│ 
│   with module.rancher_common.helm_release.cert_manager,
│   on ../rancher-common/helm.tf line 4, in resource "helm_release" "cert_manager":
│    4: resource "helm_release" "cert_manager" {

Deployed from macOS Big Sur. Let me know if you need version information for any other components.

JohnWright-GSE commented 2 years ago

$ terraform -v
Terraform v1.1.2
on darwin_amd64
+ provider registry.terraform.io/hashicorp/aws v3.68.0
+ provider registry.terraform.io/hashicorp/helm v2.4.1
+ provider registry.terraform.io/hashicorp/local v2.1.0
+ provider registry.terraform.io/hashicorp/tls v3.1.0
+ provider registry.terraform.io/loafoe/ssh v1.0.1
+ provider registry.terraform.io/rancher/rancher2 v1.21.0

bashofmann commented 2 years ago

Do you have a security group in place in your AWS default VPC that restricts incoming traffic to port 6443?
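
One way to check this from the machine running Terraform is a plain TCP connection test against the node. The following is only a minimal sketch, assuming Python 3 is available locally; `port_reachable` is a hypothetical helper name, and the address in the usage comment is the one from the error output above. A failure here does not say where the block is: it could be a security group, or a firewall or proxy on the client side.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (address taken from the error output; replace with your own node IP):
# print(port_reachable("52.56.131.57", 6443))
```

If this returns False while the security group allows inbound 6443, the block is probably somewhere between your workstation and AWS rather than in the VPC.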

JohnWright-GSE commented 2 years ago

Oddly I get exactly the same error when trying to use GCP:

module.rancher_common.helm_release.cert_manager: Still creating... [10s elapsed]
module.rancher_common.helm_release.cert_manager: Still creating... [20s elapsed]
module.rancher_common.helm_release.cert_manager: Still creating... [30s elapsed]
╷
│ Error: Kubernetes cluster unreachable: Get "https://35.242.144.113:6443/version?timeout=32s": dial tcp 35.242.144.113:6443: i/o timeout
│ 
│   with module.rancher_common.helm_release.cert_manager,
│   on ../rancher-common/helm.tf line 4, in resource "helm_release" "cert_manager":
│    4: resource "helm_release" "cert_manager" {

JohnWright-GSE commented 2 years ago

> Do you have a security group in place in your AWS default VPC that restricts incoming traffic to port 6443?

No. Just one security group that's pretty open right now (testing purposes).

JohnWright-GSE commented 2 years ago

Resolved.

This was down to Zscaler blocking ports. After I disabled it, the deployment got past this point. However, I am now hitting another issue:

module.rancher_common.rancher2_cluster.quickstart_workload: Creating...
╷
│ Error: RKE version is not supported [v1.22.5-rancher1-1 v1.22.4-rancher1-1 v1.21.8-rancher1-1 v1.21.6-rancher1-1 v1.20.5-rancher1-1 v1.19.16-rancher1-2 v1.19.16-rancher1-1 v1.19.14-rancher1-2 v1.19.14-rancher1-1 v1.19.8-rancher1-1 v1.19.7-rancher1-1 v1.19.4-rancher1-1 v1.19.3-rancher1-2 v1.19.3-rancher1-1 v1.18.20-rancher1-1 v1.18.17-rancher1-2 v1.18.17-rancher1-1 v1.18.15-rancher1-3 v1.18.14-rancher1-1 v1.18.12-rancher1-1 v1.18.10-rancher1-2 v1.17.17-rancher1-2 v1.17.16-rancher1-1 v1.17.12-rancher1-1 v1.17.9-rancher1-1 v1.17.6-rancher2-2 v1.17.4-rancher1-1 v1.17.2-rancher1-1 v1.17.0-rancher1-2 v1.16.15-rancher1-3 v1.16.15-rancher1-1 v1.16.14-rancher1-1 v1.16.10-rancher2-1 v1.16.9-rancher1-1 v1.16.8-rancher1-1 v1.16.6-rancher1-1 v1.16.4-rancher1-1 v1.16.1-rancher1-1 v1.15.12-rancher2-7 v1.15.12-rancher2-6 v1.15.12-rancher2-3 v1.15.12-rancher2-2 v1.15.12-rancher1-1 v1.15.11-rancher1-3 v1.15.11-rancher1-2 v1.15.11-rancher1-1 v1.15.11-rancher1-0 v1.15.9-rancher1-1 v1.15.5-rancher2-2 v1.15.5-rancher1-2 v1.15.4-rancher1-2 v1.15.2-rancher1-1 v1.15.0-rancher1-1 v1.14.10-rancher1-1 v1.14.10-rancher1-0 v1.14.9-rancher1-1 v1.14.8-rancher2-1 v1.13.11-rancher1-1 v1.13.10-rancher1-2 v1.13.9-rancher1-2 v1.13.5-rancher1-3 v1.13.4-rancher1-1 v1.12.10-rancher1-2 v1.12.7-rancher1-4 v1.12.6-rancher1-2 v1.12.6-rancher1-1 v1.12.5-rancher1-1 v1.12.4-rancher1-1 v1.12.3-rancher1-1 v1.12.0-rancher1-1 v1.11.6-rancher1-1 v1.11.3-rancher1-1 v1.11.2-rancher1-2 v1.11.2-rancher1-1 v1.10.12-rancher1-1 v1.10.11-rancher1-1 v1.10.5-rancher1-1 v1.10.1-rancher2-1 v1.8.11-rancher2-1] got v1.20.6-rancher1-1
│ 
│   with module.rancher_common.rancher2_cluster.quickstart_workload,
│   on ../rancher-common/rancher.tf line 25, in resource "rancher2_cluster" "quickstart_workload":
│   25: resource "rancher2_cluster" "quickstart_workload" {

But I will investigate first and open another issue if I can't find a resolution.

Thanks!

bashofmann commented 2 years ago

This issue is tracked at https://github.com/rancher/quickstart/issues/196. As a workaround, you should be able to run terraform apply again, and it should continue where it left off.
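
If you are scripting the deployment, the re-run can be automated. This is a rough sketch only, not part of the quickstart: `run_with_retry` is a hypothetical helper that runs a command again when it exits non-zero, and it assumes the failure is transient, as described above.

```python
import subprocess
import sys
from typing import Sequence


def run_with_retry(cmd: Sequence[str], attempts: int = 2) -> int:
    """Run `cmd` up to `attempts` times and return the exit code of the last run."""
    code = 1
    for attempt in range(1, attempts + 1):
        code = subprocess.run(list(cmd)).returncode
        if code == 0:
            break
        print(f"attempt {attempt} failed with exit code {code}", file=sys.stderr)
    return code


# Example usage (assumes terraform is on PATH and you are in the provider directory):
# run_with_retry(["terraform", "apply", "-auto-approve"])
```

Because Terraform keeps state between runs, the second apply only creates the resources that failed the first time.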

Cannot deploy Quick Start on AWS - module.rancher_common.helm_release.cert_manager #199