Examples don't work in RHEL8. NGINX Backend, CoreDNS, Metrics server left in crash loop.

BrandonALXEllisSS commented 3 years ago

I seem oddly unable to make any use of this module at all. When I deploy the example TF files (quickstart and cloud-enabled) I get a wonky deployment that has the NGINX Backend, CoreDNS, and Metrics pods cycle repeatedly in a crash loop

Steps I performed:

cd into the quickstart or cloud-enabled folder
Make sure your AWS credentials are set to a govcloud account (i.e. set AWS_PROFILE)
terraform init
terraform apply -auto-approve
export KUBECONFIG=$PWD/rke2.yaml
kubectl get pods -n kube-system

Notice these 3 pods failing

Also fetching logs from these containers provided no insight as to what's happening


quickstart> kubectl logs -n kube-system rke2-coredns-rke2-coredns-6f7676fdf7-p9z7z -p
.:53
[INFO] plugin/reload: Running configuration MD5 = 7da3877dbcacfd983f39051ecafd33bd
CoreDNS-1.6.9
linux/amd64, go1.15.2b5, 17665683
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s

quickstart> kubectl logs -n kube-system rke2-ingress-nginx-default-backend-65f75d6664-nrckx -p stopping http server...

quickstart> kubectl logs -n kube-system rke2-metrics-server-5d8c549c9f-297tx -p I0826 18:51:11.018334 1 secure_serving.go:116] Serving securely on [::]:8443

BrandonALXEllisSS commented 3 years ago

EDIT: I was running v1.1.3 of this repo when making this issue. Whoops!

Regardless, the same containers still fail in the same manner. However, the logs are a bit different. Looking at the logs of all 3 containers, they seem to hang whenever they try contacting the kubernetes service. i.e.

Error: Kubernetes cluster unreachable: Get "https://10.43.0.1:443/version?timeout=32s": dial tcp 10.43.0.1:443: i/o timeout

Furthermore, it appears that this issue is tied to the latest version of the RHEL8 image. Using the latest version available in govcloud (arn:aws-us-gov:ec2:us-gov-west-1::image/ami-0ac4e06a69870e5be) as per the quickstart, I run into this error. However, if I switch it out with RHEL7, everything is fine.

drduker commented 3 years ago

Same with both examples - quickstart and cloud-enabled @joshrwolf

aleiner commented 1 year ago

Going to close this, but please open a new issue if problems persist. I have been able to use EL8 with recent versions of RKE2 without issue.

rancherfederal / rke2-aws-tf

Examples don't work in RHEL8. NGINX Backend, CoreDNS, Metrics server left in crash loop. #47