rancherfederal / rke2-aws-tf

MIT License

Examples don't work in RHEL8. NGINX Backend, CoreDNS, Metrics server left in crash loop. #47

Closed BrandonALXEllisSS closed 1 year ago

BrandonALXEllisSS commented 3 years ago

I seem oddly unable to make any use of this module at all. When I deploy the example TF files (quickstart and cloud-enabled), I get a wonky deployment in which the NGINX default backend, CoreDNS, and metrics-server pods cycle repeatedly in a crash loop.

Steps I performed:

  1. cd into the quickstart or cloud-enabled folder
  2. Make sure your AWS credentials are set to a govcloud account (i.e. set AWS_PROFILE)
  3. terraform init
  4. terraform apply -auto-approve
  5. export KUBECONFIG=$PWD/rke2.yaml
  6. kubectl get pods -n kube-system
  7. Notice these 3 pods failing.

Also, fetching logs from these containers provided no insight as to what's happening:

    quickstart> kubectl logs -n kube-system rke2-coredns-rke2-coredns-6f7676fdf7-p9z7z -p
    .:53
    [INFO] plugin/reload: Running configuration MD5 = 7da3877dbcacfd983f39051ecafd33bd
    CoreDNS-1.6.9
    linux/amd64, go1.15.2b5, 17665683
    [INFO] SIGTERM: Shutting down servers then terminating
    [INFO] plugin/health: Going into lameduck mode for 5s

    quickstart> kubectl logs -n kube-system rke2-ingress-nginx-default-backend-65f75d6664-nrckx -p
    stopping http server...

    quickstart> kubectl logs -n kube-system rke2-metrics-server-5d8c549c9f-297tx -p
    I0826 18:51:11.018334 1 secure_serving.go:116] Serving securely on [::]:8443
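For anyone trying to reproduce step 6, here is a minimal sketch of how the failing pods could be picked out of the plain-text `kubectl get pods -n kube-system` table. The helper name `crash_looping_pods` is hypothetical, and the sample table is illustrative only: the pod names are the ones from the logs above, but the READY, RESTARTS, and AGE values and the `kube-proxy-example` row are made up, since the original screenshot is not available.

```python
from __future__ import annotations


def crash_looping_pods(kubectl_output: str) -> list[str]:
    """Return names of pods whose STATUS column reads CrashLoopBackOff.

    Expects the default table printed by `kubectl get pods`
    (header row: NAME READY STATUS RESTARTS AGE).
    """
    lines = [ln for ln in kubectl_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    failing = []
    for row in lines[1:]:
        cols = row.split()
        # Skip malformed rows that are too short to hold both columns.
        if len(cols) <= max(name_idx, status_idx):
            continue
        if cols[status_idx] == "CrashLoopBackOff":
            failing.append(cols[name_idx])
    return failing


# Illustrative sample, NOT the real output from this issue.
SAMPLE = """\
NAME                                                  READY   STATUS             RESTARTS   AGE
rke2-coredns-rke2-coredns-6f7676fdf7-p9z7z            0/1     CrashLoopBackOff   7          15m
rke2-ingress-nginx-default-backend-65f75d6664-nrckx   0/1     CrashLoopBackOff   7          15m
rke2-metrics-server-5d8c549c9f-297tx                  0/1     CrashLoopBackOff   7          15m
kube-proxy-example                                    1/1     Running            0          15m
"""

if __name__ == "__main__":
    for pod in crash_looping_pods(SAMPLE):
        print(pod)
```

In practice the text would come from piping the real `kubectl get pods -n kube-system` output into the function rather than from a hard-coded sample.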

BrandonALXEllisSS commented 3 years ago

EDIT: I was running v1.1.3 of this repo when making this issue. Whoops!

Regardless, the same containers still fail in the same manner. However, the logs are a bit different. Looking at the logs of all 3 containers, they seem to hang whenever they try to contact the kubernetes service. For example:

Error: Kubernetes cluster unreachable: Get "https://10.43.0.1:443/version?timeout=32s": dial tcp 10.43.0.1:443: i/o timeout
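A small sketch that may help read this error: it pulls the dial target out of the message and checks whether it falls inside the cluster's service network. The CIDR `10.43.0.0/16` is an assumption (the RKE2 default service CIDR; the issue does not say whether it was overridden), and the helper names `dial_target` and `is_cluster_service_ip` are hypothetical. If the assumption holds, `10.43.0.1` is the first usable address of that range, i.e. the in-cluster `kubernetes` service, so the timeout would point at pod-to-service networking on the node rather than at reaching the API server from outside.

```python
import ipaddress
import re

# Assumption: RKE2's default service CIDR; not confirmed by the issue.
DEFAULT_SERVICE_CIDR = ipaddress.ip_network("10.43.0.0/16")

# Error line quoted verbatim from the issue.
ERROR = (
    'Error: Kubernetes cluster unreachable: Get '
    '"https://10.43.0.1:443/version?timeout=32s": '
    'dial tcp 10.43.0.1:443: i/o timeout'
)


def dial_target(message):
    """Extract (IPv4 address, port) from a Go 'dial tcp host:port' error, or None."""
    match = re.search(r"dial tcp (\d+\.\d+\.\d+\.\d+):(\d+)", message)
    if match is None:
        return None
    return ipaddress.ip_address(match.group(1)), int(match.group(2))


def is_cluster_service_ip(addr, cidr=DEFAULT_SERVICE_CIDR):
    """True if addr lies inside the (assumed) service CIDR."""
    return addr in cidr


if __name__ == "__main__":
    addr, port = dial_target(ERROR)
    first_host = next(iter(DEFAULT_SERVICE_CIDR.hosts()))
    print("target:", addr, "port:", port)
    print("inside service CIDR:", is_cluster_service_ip(addr))
    print("is first host of CIDR:", addr == first_host)
```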

Furthermore, it appears that this issue is tied to the latest version of the RHEL8 image. Using the latest version available in govcloud (arn:aws-us-gov:ec2:us-gov-west-1::image/ami-0ac4e06a69870e5be), as per the quickstart, I run into this error. However, if I swap in a RHEL7 image instead, everything is fine.

drduker commented 3 years ago

Same with both examples - quickstart and cloud-enabled @joshrwolf

aleiner commented 1 year ago

Going to close this, but please open a new issue if problems persist. I have been able to use EL8 with recent versions of RKE2 without issue.