rancher / rancher

Complete container management platform
http://rancher.com
Apache License 2.0

Can't deploy custom clusters using EC2 machine with AWS #45922

Open KiteAzurez opened 3 months ago

KiteAzurez commented 3 months ago

Setup

Describe the bug

I tried to create a custom cluster using an AWS EC2 instance, but with both the RKE1 and the RKE2 configuration the cluster gets stuck in the provisioning/creating phase.

RKE1 says this: Waiting for etcd, controlplane and worker nodes to be registered

RKE2 says this: Configuring bootstrap node(s) custom-68f389d00990: waiting for probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet

I have all the checkboxes selected to create etcd, control plane, and worker nodes. The instance is a t2.large, so I don't think it's an issue with the machine. There is no firewall and all ports are accessible. Docker images run fine and can be accessed from other webpages.

To Reproduce

  1. Click Create
  2. Click Custom
  3. Name your cluster, then copy and paste the generated command into the EC2 machine (for RKE2 you need the --insecure flag).

Result

No cluster will start; it stays stuck on provisioning/updating.

Expected Result

Screenshots

Additional context

I did have this working (both RKE1 and RKE2) the other day. However, I removed Rancher, nginx, and Apache from my EC2 instance because of an unrelated issue. I tried to force-remove all Docker images/builds and saw some Kubernetes components still running as containers. I removed them before installing Rancher again.
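Since the node previously hosted working clusters, it may be worth checking whether state from the earlier install is still on disk. The following is a minimal sketch that only reports what it finds and deletes nothing. The path list is an assumption based on locations commonly used by RKE1/RKE2 nodes, and `find_leftovers` is a hypothetical helper name, not a Rancher tool.

```shell
#!/bin/sh
# Assumed list of directories that RKE1/RKE2 nodes commonly leave behind.
# Adjust it to match the node; nothing here comes from the issue itself.
LEFTOVER_PATHS="/etc/rancher /var/lib/rancher /etc/kubernetes /var/lib/kubelet /var/lib/etcd /etc/cni /opt/cni /var/lib/cni"

# Print every path from the argument list that exists on disk.
find_leftovers() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Report only; word splitting of the list is intentional.
find_leftovers $LEFTOVER_PATHS
```

Any path it prints survived the removal of Rancher and could be read again by a fresh install.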

I also tried this solution, from another user with a similar issue, and it didn't work for me: https://github.com/rancher/rancher/issues/41125#issuecomment-1506620040

When I enter this:

    echo "Rotating kube-controller-manager certificate"
    rm /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.{crt,key}
    crictl rm -f $(crictl ps -q --name kube-controller-manager)

    echo "Rotating kube-scheduler certificate"
    rm /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.{crt,key}
    crictl rm -f $(crictl ps -q --name kube-scheduler)

I get this message:

rm: cannot remove '/var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt': No such file or directory

When I looked in my kube-controller-manager directory, nothing was there. Is there something I'm doing wrong?
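For reference, the rotation commands quoted above can be guarded so they do nothing when the certificate pair is absent, which is the case the `rm` error describes. This is only a sketch: `rotate_cert` is a hypothetical helper name, and the default directory is the one from the quoted commands. A missing pair may simply mean the component has not started yet on this node.

```shell
#!/bin/sh
# TLS directory taken from the commands quoted in this issue; override if needed.
TLS_DIR="${TLS_DIR:-/var/lib/rancher/rke2/server/tls}"

# Remove the cert/key pair for one component, but only if both files exist.
# Returns 1 and removes nothing when either file is missing.
rotate_cert() {
    component="$1"
    crt="$TLS_DIR/$component/$component.crt"
    key="$TLS_DIR/$component/$component.key"
    if [ ! -f "$crt" ] || [ ! -f "$key" ]; then
        echo "skip $component: no certificate pair under $TLS_DIR/$component" >&2
        return 1
    fi
    echo "Rotating $component certificate"
    rm -f "$crt" "$key"
}

# Usage (as root on the RKE2 server node); the crictl call is the one quoted above:
#   rotate_cert kube-controller-manager && crictl rm -f $(crictl ps -q --name kube-controller-manager)
#   rotate_cert kube-scheduler && crictl rm -f $(crictl ps -q --name kube-scheduler)
```

Because the container restart is chained with `&&`, it only runs when a certificate was actually removed.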

Thanks,

torchiaf commented 3 months ago

Thanks @KiteAzurez. I think this issue should be moved to https://github.com/rancher/rancher/issues since it seems to be a backend bug.