rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0

Issues with k8s ingress going from 1.5.x to 1.6.x #3348

Open amartin120 opened 2 years ago

amartin120 commented 2 years ago

Actual Behavior

I'll preface this by stating that I have Traefik disabled in favor of Istio Ingress for my Rancher Desktop k8s setup because of the types of testing that I do. In RD versions 1.5.x and earlier, I have been able to access my Kubernetes applications without issue using wildcard domain names that I get via AWS Route53 and certs issued by Cert Manager. However, something changed starting with RD version 1.6, and I can no longer access my applications.

Steps to Reproduce

I'll use the fake domain of "rancher.mydomain.com" for the sake of this issue.

In RD 1.5.x and earlier, I can curl https://rancher.mydomain.com and my Istio Ingress will successfully route to my app and display the correct results. The web browser also reflects the same successful results.

In RD 1.6.x, when I curl the same as the above locally, I get

curl -vvv http://rancher.mydomain.com
*   Trying {the correct local ip}:80...
* Connected to rancher.mydomain.com {the correct local ip} port 80 (#0)
> GET / HTTP/1.1
> Host: rancher.mydomain.com
> User-Agent: curl/7.84.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

and from a lima shell

lima-rancher-desktop:/usr/local/bin$ curl https://rancher.mydomain.com
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to rancher.mydomain.com:443
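
The curl output above shows the TCP connection being established and then reset, which is a different failure from nothing listening on the port at all. A minimal sketch for telling those cases apart from the host is below. The function name `classify_http_probe` and its return labels are made up for illustration and are not part of Rancher Desktop or curl.

```python
import socket


def classify_http_probe(host: str, port: int = 80, timeout: float = 3.0) -> str:
    """Hypothetical helper: report how a plain HTTP probe to host:port fails.

    Returns one of: "refused", "timeout", "unreachable", "reset",
    "closed", or "response".
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # Nothing is listening on that address and port.
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        # Covers DNS failures and routing errors.
        return "unreachable"

    with sock:
        try:
            request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            sock.sendall(request.encode("ascii"))
            data = sock.recv(1024)
        except (ConnectionResetError, BrokenPipeError):
            # TCP connect worked, but the peer reset the connection,
            # which matches curl's "Recv failure: Connection reset by peer".
            return "reset"
        except socket.timeout:
            return "timeout"

    # An empty read means the peer closed the connection without answering.
    return "response" if data else "closed"


if __name__ == "__main__":
    # "rancher.mydomain.com" is the placeholder domain used in this issue.
    print(classify_http_probe("rancher.mydomain.com", 80))
```

A result of "reset" suggests something accepted the connection but could not hand it on to the ingress, whereas "refused" suggests the port is not bound at all.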

Result

Unable to connect to my applications like I could in 1.5.x and earlier.

Expected Behavior

Things work the same as in RD 1.5.x and earlier.

Additional Information

The only solution that I have currently is to downgrade RD to 1.5.x and factory reset; after that, everything starts working fine again. No application/ingress config is changed when upgrading or downgrading.

Rancher Desktop Version

1.6.x

Rancher Desktop K8s Version

1.23.13

Which container engine are you using?

containerd (nerdctl)

What operating system are you using?

macOS

Operating System / Build Version

Monterey 12.6.1 and Ventura 13.0

What CPU architecture are you using?

arm64 (Apple Silicon)

Linux only: what package format did you use to install Rancher Desktop?

No response

Windows User Only

No response

CaringDev commented 2 years ago

Hopefully this is not polluting the wrong issue, however: we (some of my team and I) see the same behavior with k8s disabled and using docker compose on Windows 11. Restarting the affected containers eventually resolves the problem (so far it has never taken more than 2 restarts).

krumware commented 2 years ago

This is also an issue on WSL with Rancher Desktop 1.6+: ports 80 and 443 are no longer bound by default.

Adding my comments from slack here:

I just want to confirm before I open a GH issue. It does appear that in 1.6.x on windows, port 80 and 443 no longer make it to Traefik. Is this expected?

  1. enable Traefik in RD
  2. deploy ingress for mysite.localhost
  3. access http://mysite.localhost/ (success)
  4. upgrade to 1.6.2
  5. verify Traefik enabled, and WSL enabled, etc
  6. deploy or verify ingress for mysite.localhost
  7. access http://mysite.localhost/ (fail, "This site can't be reached")

Another test:

  1. enable Traefik in RD
  2. access http://anything.localhost/ - observe Traefik 404 page
  3. upgrade to 1.6.2
  4. verify Traefik enabled, and WSL enabled, etc
  5. access http://anything.localhost/ (fail, "This site can't be reached")
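
The browser message "This site can't be reached" does not say whether name resolution or the port binding is at fault. A small sketch that sidesteps name resolution is below: it connects to the loopback address directly and sends the ingress hostname in the Host header. The helper name `ingress_status` is hypothetical, and the defaults assume the ingress is expected on 127.0.0.1:80, as in the steps above.

```python
import http.client
from typing import Optional


def ingress_status(hostname: str, address: str = "127.0.0.1",
                   port: int = 80, timeout: float = 3.0) -> Optional[int]:
    """Hypothetical helper: return the HTTP status for GET / or None.

    The request goes to `address` directly, with `hostname` sent only as
    the Host header, so DNS resolution of `hostname` is not involved.
    """
    conn = http.client.HTTPConnection(address, port, timeout=timeout)
    try:
        conn.request("GET", "/", headers={"Host": hostname})
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Refused, reset, timed out, or an unparsable reply.
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    for name in ("mysite.localhost", "anything.localhost"):
        print(name, ingress_status(name))
```

By the steps above, a 404 for anything.localhost would mean Traefik is still reachable on port 80, while None would point at the port no longer being forwarded to the cluster.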

I previously thought this was related to the Windows 11 22H2 upgrade, but I verified this issue on a different machine without the upgrade.

markbaumgarten commented 1 year ago

Here is my kubectl get event

14m Normal NodeHasNoDiskPressure node/int208 Node int208 status is now: NodeHasNoDiskPressure
14m Normal NodeHasSufficientPID node/int208 Node int208 status is now: NodeHasSufficientPID
14m Normal NodeNotReady node/int208 Node int208 status is now: NodeNotReady
14m Normal RegisteredNode node/int208 Node int208 event: Registered Node int208 in Controller
13m Normal NodeAllocatableEnforced node/int208 Updated Node Allocatable limit across pods
13m Normal NodeReady node/int208 Node int208 status is now: NodeReady
4m52s Normal Starting node/int208
4m52s Warning listen tcp4 :32535: bind: address already in use node/int208 can't open port "nodePort for kube-system/traefik:web" (:32535/tcp4), skipping it
4m52s Warning listen tcp4 :30161: bind: address already in use node/int208 can't open port "nodePort for kube-system/traefik:websecure" (:30161/tcp4), skipping it
4m48s Normal Starting node/int208 Starting kubelet.
4m48s Warning InvalidDiskCapacity node/int208 invalid capacity 0 on image filesystem
4m48s Normal NodeAllocatableEnforced node/int208 Updated Node Allocatable limit across pods
4m48s Normal NodeHasSufficientMemory node/int208 Node int208 status is now: NodeHasSufficientMemory
4m48s Normal NodeHasNoDiskPressure node/int208 Node int208 status is now: NodeHasNoDiskPressure
4m48s Normal NodeHasSufficientPID node/int208 Node int208 status is now: NodeHasSufficientPID
4m48s Warning Rebooted node/int208 Node int208 has been rebooted, boot id: d0b003fe-a03a-4130-b39f-e48af6217785
4m42s Normal RegisteredNode node/int208 Node int208 event: Registered Node int208 in Controller

amartin120 commented 1 year ago

Still not working with RD 1.7.0. Staying on 1.5.x continues to work just fine.

krumware commented 1 year ago

Could this be connected to anything in https://github.com/rancher-sandbox/rancher-desktop-host-resolver?

amartin120 commented 1 year ago

I think that I've resolved this, for myself at least. My local network was forwarding ports 80 and 443 to my local IP (i.e., my en0), and this was perfectly fine when I was running RD 1.5.x and lower. However, on RD 1.6 and 1.7, if I adjust my port forwarding rules to route 80 and 443 to the lima-0 IP, I'm back up and running.

krumware commented 1 year ago

Can you by chance provide a quick example of that?

(I'm guessing kubectl port-forward --address lima-0 --namespace kube-system service/traefik 443:443)

amartin120 commented 1 year ago

For me it was all about my local network. I'm using AWS Route53 for my testing domain, and that has an A record pointing to my external-facing IP. My local network router had port forwarding rules for incoming (80/443) traffic to route to the local IP (i.e., 192.168.1.x) of the MacBook that I'm running RD on. Despite being slightly overkill for a typical local test environment, the version of lima used in RD 1.5 and earlier was able to resolve that just fine within the VM.

However, starting with RD 1.6, the above setup stopped resolving within the lima VM. So what I ended up doing is adjusting my actual network router's port forwarding rules to send 80/443 traffic straight to the lima-0 VM IP (which is probably what the Traefik service LoadBalancer in your cluster is mapped to).

Does that help?