Curious, rebooting the worker node fixes the Ingress ... :open_mouth:
I have no idea why the Ingress becomes non-functional as a result of the upgrade. I also have no idea how to prevent that from happening during the upgrade, but it seems that rebooting worker nodes after the upgrade makes the issue go away.
Additional testing with a multi-worker cluster has shown that you can reboot worker nodes one at a time, waiting for each rebooted node to become Ready again. This also worked for me in a multi-master cluster.
You may not need to reboot all worker nodes. Even rebooting worker nodes not associated with the Ingress fixed things for me.
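For reference, a minimal sketch of that per-node procedure (the node name and SSH access are assumptions; draining first is optional but avoids disrupting workloads):

kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
ssh worker-1 sudo reboot
kubectl wait --for=condition=Ready node/worker-1 --timeout=10m
kubectl uncordon worker-1

Repeat for each worker node, waiting for Ready before moving on to the next.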
I've just experienced the same issue. Freshly installed RKE with a basic/default config. Everything was working fine; I could create resources using kubectl (pods, svc, ingress, etc.). rke up also created the necessary nginx-ingress-controller on my (only) worker node.
So far so good.
Then created a pod:
k run nginx --image=nginx
Exposed it via ClusterIP (and also via NodePort to ensure it works without the ingress):
k expose pod nginx --name=nginx --port=80
k expose pod nginx --name=nginx-nodeport --type=NodePort --port=80 -o yaml --dry-run=client > svc.yaml
Edited svc.yaml to include nodePort: 30080, then applied it:
k apply -f svc.yaml
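For reference, the edited svc.yaml ends up looking roughly like this (a sketch; the exact labels depend on what kubectl expose generated):

apiVersion: v1
kind: Service
metadata:
  name: nginx-nodeport
spec:
  type: NodePort
  selector:
    run: nginx            # kubectl run sets the "run" label
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30080       # the manually added fixed node port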
Check that NodePort works:
curl <WORKER IP>:30080
// OK!
Then, created the ingress:
k create ingress nginx --rule=/*=nginx:80
Check that I can reach the application using the ingress:
curl <IP or DNS of WORKER>/
// ERROR: Connection refused
I've also checked that the nginx configuration inside nginx-ingress-controller got updated.
k exec -it -n ingress-nginx nginx-ingress-controller-5564f -- bash
Found that /etc/nginx/nginx.conf has been updated with the server {} portion matching the ingress resource created earlier. Good.
Also checked that curl-ing from another Pod to the nginx-ingress-controller-5564f pod on port 80 works, and it did.
Everything seemed to be in place, yet despite the hostPort settings in the nginx-ingress-controller DaemonSet, the host ports (80 and 443 by default) were not exposed on the host/worker node.
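One way to double-check that (a sketch; the iptables chain name assumes the CNI portmap plugin, which implements hostPort via DNAT rules rather than a listening socket):

k -n ingress-nginx get ds nginx-ingress-controller -o jsonpath='{.spec.template.spec.containers[0].ports}'
// should list hostPort: 80 and hostPort: 443

sudo iptables -t nat -L CNI-HOSTPORT-DNAT -n
// run on the worker node; missing or empty rules would explain the refused connections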
After restarting the worker node hosting the ingress controller, the issue was resolved; curl-ing from outside the cluster worked. Strange.
We've just experienced the same going from RKE 1.3.8 on K8s 1.21 to RKE 1.3.12 and K8s 1.22.
Just had the same problem going from RKE v1.3.11 with K8s 1.23.6 to RKE v1.3.12 with K8s 1.23.7.
The hostPorts are created now, but weirdly they are only reachable from the node itself, not from anywhere else...
Restarting doesn't help at all.
RKE version: Upgrading from v1.3.10 to v1.3.12

Docker version: (docker version, docker info preferred)
docker info
docker version

Operating system and kernel: (cat /etc/os-release, uname -r preferred)
Using RancherOS v1.5.8
uname -r
cat /etc/os-release

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Using QEMU (qemu-system-x86_64), version 7.0.0, on Devuan GNU/Linux daedalus. The node VMs are started with

Cluster info
cluster.yml
manifest.yml
Steps to Reproduce:

Log of the RKE v1.3.10 invocation

Adjust IP addresses in cluster.yml to match your VMs. Wait for all pods to be Running or Completed after the rke up invocation. Also wait for the Ingress to get an IP address. It should get the IP address of the sole worker node.

kubectl -n welcome describe ingress

Add that IP address to your /etc/hosts with a hostname of welcome.example.org. After that the wget command returns the Nginx welcome page.

So far, so good.
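The wget command referred to here is along these lines (an assumption, since the exact command lives in the collapsed manifest above):

wget -qO- http://welcome.example.org/
// should print the Nginx welcome page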
Now upgrade to RKE v1.3.12, wait for all pods to be Running or Completed, and re-run the wget command. It should return the Nginx welcome page again.

Log of the RKE v1.3.12 invocation
Results:

The wget invocation does not return the Nginx welcome page. Instead it just sits there; I eventually killed it after about 20 minutes.

The Ingress hasn't changed except for its age, of course, which is now older than all the Pods that rke up deployed.

kubectl -n welcome describe ingress

I can access the Nginx welcome page without trouble from Pods in the cluster's default namespace. All of the following return the expected page:

If I change the Service to use a NodePort, accessing the assigned port on any of the cluster's nodes also works as expected.
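A sketch of that NodePort check (the Service name welcome is an assumption, since the manifest is collapsed above):

kubectl -n welcome patch svc welcome -p '{"spec":{"type":"NodePort"}}'
kubectl -n welcome get svc welcome -o jsonpath='{.spec.ports[0].nodePort}'
curl http://<NODE IP>:<ASSIGNED PORT>/
// works from outside the cluster, unlike the Ingress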
Removing the namespace and redeploying, like so, does not change the situation either.