Have you looked at the pod status or error logs to see why there are no endpoints? It's hard to help without knowing what is actually going on with the pods.
Gotcha! I haven't looked at the logs so far, but I'll try to reproduce this again and grab some logs later today on my personal machine when I'm home, since I'm happy it's working fine now with k3s (never change a running system).
Alright, sorry for the wait. Here's the output from kubectl -n kube-system logs rke2-ingress-nginx-controller-4zc26:
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: nginx-1.4.1-hardened2
Build: git-452bd444e
Repository: https://github.com/rancher/ingress-nginx.git
nginx version: nginx/1.19.10
-------------------------------------------------------------------------------
W0307 20:53:36.822501 7 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0307 20:53:36.823278 7 main.go:209] "Creating API client" host="https://10.43.0.1:443"
I0307 20:53:36.849764 7 main.go:253] "Running in Kubernetes cluster" major="1" minor="24" git="v1.24.10+rke2r1" state="clean" commit="5c1d2d4295f9b4eb12bfbf6429fdf989f2ca8a02" platform="linux/amd64"
I0307 20:53:36.959132 7 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
I0307 20:53:36.977878 7 ssl.go:533] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I0307 20:53:37.020201 7 nginx.go:260] "Starting NGINX Ingress controller"
I0307 20:53:37.057882 7 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"rke2-ingress-nginx-controller", UID:"931c99f4-d21f-4aad-ae02-1c3ba69dea81", APIVersion:"v1", ResourceVersion:"1122", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap kube-system/rke2-ingress-nginx-controller
I0307 20:53:38.222033 7 nginx.go:303] "Starting NGINX process"
I0307 20:53:38.222210 7 leaderelection.go:248] attempting to acquire leader lease kube-system/ingress-controller-leader...
I0307 20:53:38.223991 7 nginx.go:323] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
I0307 20:53:38.230060 7 controller.go:168] "Configuration changes detected, backend reload required"
I0307 20:53:38.241260 7 leaderelection.go:258] successfully acquired lease kube-system/ingress-controller-leader
I0307 20:53:38.241934 7 status.go:84] "New leader elected" identity="rke2-ingress-nginx-controller-4zc26"
I0307 20:53:38.265937 7 status.go:214] "POD is not ready" pod="kube-system/rke2-ingress-nginx-controller-4zc26" node="main"
I0307 20:53:38.367269 7 controller.go:185] "Backend successfully reloaded"
I0307 20:53:38.367617 7 controller.go:196] "Initial sync, sleeping for 1 second"
I0307 20:53:38.367784 7 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"rke2-ingress-nginx-controller-4zc26", UID:"95532d48-8ee0-4291-833c-d91ee4f7e9fe", APIVersion:"v1", ResourceVersion:"1155", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
Oddly enough, it seems to start up just fine. However, there's another interesting correlation with this error, judging by what
helm install rancher rancher-stable/rancher \
--namespace cattle-system \
--set hostname="[Server IP].sslip.io" \
--set bootstrapPassword="$ServerPassword"
throws at me:
E0307 21:53:21.472133 91149 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I suspect this has something to do with the ingress, because the Rancher install tries to access it during startup? The next thing I get after this error message is the good old
Error: INSTALLATION FAILED: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://rke2-ingress-nginx-controller-admission.kube-system.svc:443/networking/v1/ingresses?timeout=10s": no endpoints available for service "rke2-ingress-nginx-controller-admission"
If you need any other log files, feel free to ask
The logs look fine, but for some reason there are no endpoints for the webhook service. Can you get the output of kubectl get pod -A -o wide and kubectl get service -A -o wide?
Sure thing! Here's the output of kubectl get pod -A -o wide:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cert-manager cert-manager-85945b75d4-z568w 1/1 Running 0 99s 10.42.0.4 ubuntu-4gb-fsn1-1 <none> <none>
cert-manager cert-manager-cainjector-7f694c4c58-4ftsq 1/1 Running 0 99s 10.42.0.7 ubuntu-4gb-fsn1-1 <none> <none>
cert-manager cert-manager-webhook-7cd8c769bb-ddtjs 1/1 Running 0 99s 10.42.0.6 ubuntu-4gb-fsn1-1 <none> <none>
kube-system cloud-controller-manager-ubuntu-4gb-fsn1-1 1/1 Running 0 112s 123.201.116.44 ubuntu-4gb-fsn1-1 <none> <none>
kube-system etcd-ubuntu-4gb-fsn1-1 1/1 Running 0 111s 123.201.116.44 ubuntu-4gb-fsn1-1 <none> <none>
kube-system helm-install-rke2-canal-9q4rk 0/1 Completed 0 100s 123.201.116.44 ubuntu-4gb-fsn1-1 <none> <none>
kube-system helm-install-rke2-coredns-bb6qv 0/1 Completed 0 100s 123.201.116.44 ubuntu-4gb-fsn1-1 <none> <none>
kube-system helm-install-rke2-ingress-nginx-m2jmk 0/1 Completed 0 100s 10.42.0.3 ubuntu-4gb-fsn1-1 <none> <none>
kube-system helm-install-rke2-metrics-server-jwwbx 0/1 Completed 0 100s 10.42.0.8 ubuntu-4gb-fsn1-1 <none> <none>
kube-system kube-apiserver-ubuntu-4gb-fsn1-1 1/1 Running 0 112s 123.201.116.44 ubuntu-4gb-fsn1-1 <none> <none>
kube-system kube-controller-manager-ubuntu-4gb-fsn1-1 1/1 Running 0 105s 123.201.116.44 ubuntu-4gb-fsn1-1 <none> <none>
kube-system kube-proxy-ubuntu-4gb-fsn1-1 1/1 Running 0 108s 123.201.116.44 ubuntu-4gb-fsn1-1 <none> <none>
kube-system kube-scheduler-ubuntu-4gb-fsn1-1 1/1 Running 0 105s 123.201.116.44 ubuntu-4gb-fsn1-1 <none> <none>
kube-system rke2-canal-65fhj 2/2 Running 0 89s 123.201.116.44 ubuntu-4gb-fsn1-1 <none> <none>
kube-system rke2-coredns-rke2-coredns-58fd75f64b-4ftgk 1/1 Running 0 90s 10.42.0.5 ubuntu-4gb-fsn1-1 <none> <none>
kube-system rke2-coredns-rke2-coredns-autoscaler-768bfc5985-p9q6d 1/1 Running 0 90s 10.42.0.9 ubuntu-4gb-fsn1-1 <none> <none>
kube-system rke2-ingress-nginx-controller-fx5vc 1/1 Running 0 48s 10.42.0.12 ubuntu-4gb-fsn1-1 <none> <none>
kube-system rke2-metrics-server-74f878b999-w92rj 1/1 Running 0 58s 10.42.0.10 ubuntu-4gb-fsn1-1 <none> <none>
And the output of kubectl get service -A -o wide:
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
cattle-system rancher ClusterIP 10.43.125.234 <none> 80/TCP,443/TCP 75s app=rancher
cert-manager cert-manager ClusterIP 10.43.232.23 <none> 9402/TCP 2m22s app.kubernetes.io/component=controller,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cert-manager
cert-manager cert-manager-webhook ClusterIP 10.43.27.97 <none> 443/TCP 2m22s app.kubernetes.io/component=webhook,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=webhook
default kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 2m31s <none>
kube-system rke2-coredns-rke2-coredns ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP 2m5s app.kubernetes.io/instance=rke2-coredns,app.kubernetes.io/name=rke2-coredns,k8s-app=kube-dns
kube-system rke2-ingress-nginx-controller-admission ClusterIP 10.43.208.50 <none> 443/TCP 83s app.kubernetes.io/component=controller,app.kubernetes.io/instance=rke2-ingress-nginx,app.kubernetes.io/name=rke2-ingress-nginx
kube-system rke2-metrics-server ClusterIP 10.43.76.197 <none> 443/TCP 93s app=rke2-metrics-server,release=rke2-metrics-server
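For what it's worth, the SELECTOR column of the admission service above can also be used to double-check that a controller pod actually matches it; a quick spot check, reusing that selector verbatim:

kubectl -n kube-system get pods -o wide -l app.kubernetes.io/component=controller,app.kubernetes.io/instance=rke2-ingress-nginx,app.kubernetes.io/name=rke2-ingress-nginx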
Are you perhaps just trying to install Rancher before all the components are done starting up? Everything looks fine now. If you try to install Rancher before nginx is done starting, you'll get errors because the webhook has already been registered but nginx isn't ready to serve it yet.
That... seems to be it. At least if I manually wait for a few seconds I can install without any errors. Thanks for clearing that up!
I'm guessing what's still confusing me is how this used to work without any wait on my end before. But since I can't pinpoint the release I used anymore, and it might very well have been due to overall machine slowness at the time, I'll close this as solved since it clears up my confusion.
Thanks again for helping me out with this!
Just in case, if you want to ensure the nginx controller has already been deployed, you can use:
while ! kubectl rollout status daemonset -n kube-system rke2-ingress-nginx-controller --timeout=60s; do sleep 2 ; done
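An alternative, assuming the helm-install job names visible in the pod listing above, is to first wait for the packaged chart's install job to complete and then for the daemonset rollout:

kubectl -n kube-system wait --for=condition=complete --timeout=120s job/helm-install-rke2-ingress-nginx
kubectl -n kube-system rollout status daemonset rke2-ingress-nginx-controller --timeout=120s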
Environmental Info:
RKE2 Version:
rke2 version v1.24.10+rke2r1 (1ccdce2571291649b9414af1f269f645c3fe4002)
go version go1.19.5 X:boringcrypto
Node(s) CPU architecture, OS, and Version:
Linux Ubuntu-2204-jammy-amd64-base 5.15.0-60-generic #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
1 server, 3 agents
Describe the bug:
Trying to set up Rancher together with RKE2 does not work for me anymore. This happens on a clean install on a Hetzner dedicated root server, where I can reproduce this behaviour 100% of the time (clean install meaning there are no other dependencies installed except for the Ubuntu minimal stuff).
EDIT: I should probably mention that it works just fine with k3s and traefik, so this seems to be rke2 related.
Steps To Reproduce:
Here's a gist of the setup script I'm currently using. I'm sure it could be improved, but it used to work just fine before. It basically boils down to installing RKE2, then cert-manager, then Rancher.
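As a rough sketch only (the exact gist isn't reproduced here; apart from the Rancher command quoted earlier in this thread, the repo URLs and flags below are assumptions about a typical setup of this kind):

# Install RKE2 (server) and start it
curl -sfL https://get.rke2.io | sudo sh -
sudo systemctl enable --now rke2-server
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml

# Install cert-manager via its Helm chart (assumed step; cert-manager pods do appear in the listing above)
helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true

# Create the target namespace and install Rancher
# (the helm install command itself is quoted from earlier in this thread)
kubectl create namespace cattle-system
helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname="[Server IP].sslip.io" \
  --set bootstrapPassword="$ServerPassword"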
Expected behavior:
I get a nice and shiny Rancher UI I can use, be it via my own subdomain or a sample DNS entry using sslip.io.
Actual behavior:
The install fails at the last step.
Additional context / logs:
INSTALLATION FAILED: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://rke2-ingress-nginx-controller-admission.kube-system.svc:443/networking/v1/ingresses?timeout=10s": no endpoints available for service "rke2-ingress-nginx-controller-admission"
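For anyone hitting the same message: the service name in the error is the one to inspect. Two checks that narrow it down (standard kubectl, nothing RKE2-specific):

kubectl -n kube-system describe service rke2-ingress-nginx-controller-admission
kubectl get validatingwebhookconfigurations

If the service's Endpoints line is empty while the webhook configuration already exists, any Ingress create or update will fail exactly as above until the controller pod becomes Ready.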