tailscale / tailscale

The easiest, most secure way to use WireGuard and 2FA.
https://tailscale.com
BSD 3-Clause "New" or "Revised" License

k8s operator: tailscale ingress sometimes tries to connect to 127.0.0.1 instead of ClusterIP, fails with "netstack: could not connect to local server at ..." #12079

Open garymm opened 1 month ago

garymm commented 1 month ago

What is the issue?

I'm really not sure how to reproduce this, but I've seen it a couple of times. Restarting the tailscale ingress pods seems to fix it.

I set up two services (docker-registry and headlamp) of type ClusterIP, both listening on port 80. Each service has a tailscale ingress. This is a test cluster with only one node, so everything is on the same node.
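
For context, the Ingress side is roughly this (reconstructed from memory, so it may not match what I actually applied verbatim; the headlamp one is analogous):

# roughly what I applied for the docker-registry side
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: docker-registry-ingress
  namespace: docker-registry
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: docker-registry
      port:
        number: 80
  tls:
    - hosts:
        - berkeley-staging-docker
EOF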

When trying to connect, I see errors like this in the tailscale pod:

2024/05/09 22:55:03 Accept: TCP{100.115.199.49:52551 > 100.77.184.113:80} 64 tcp ok
2024/05/09 22:55:03 netstack: could not connect to local server at 127.0.0.1:80: dial tcp 127.0.0.1:80: connect: connection refused
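
In case it's useful for triage, the backend can be checked independently of the proxy with plain kubectl (the ClusterIP here is the docker-registry one from the kubectl get svc output below):

# does the backend Service still have ready endpoints?
kubectl get endpoints -n docker-registry docker-registry
# is the ClusterIP reachable from inside the cluster?
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -sI http://10.233.22.50:80/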

Restarting the tailscale ingress pod seems to fix the issue.

I'm not a kubernetes expert, but it seems suspicious that tailscale is trying to connect to the service on 127.0.0.1:80 rather than using the ClusterIP.

# kubectl get svc -A
NAMESPACE         NAME                               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
default           kubernetes                         ClusterIP   10.233.0.1     <none>        443/TCP                  30m
docker-registry   docker-registry                    ClusterIP   10.233.22.50   <none>        80/TCP                   17m
headlamp          headlamp                           ClusterIP   10.233.49.34   <none>        80/TCP                   27m
kube-system       coredns                            ClusterIP   10.233.0.3     <none>        53/UDP,53/TCP,9153/TCP   29m
tailscale         ts-docker-registry-ingress-4w69x   ClusterIP   None           <none>        <none>                   16m
tailscale         ts-headlamp-ingress-6mzrv          ClusterIP   None           <none>        <none>                   17m
# kubectl get ingress -A
NAMESPACE         NAME                      CLASS       HOSTS   ADDRESS                                      PORTS     AGE
docker-registry   docker-registry-ingress   tailscale   *       berkeley-staging-docker.taila1eba.ts.net     80, 443   36m
headlamp          headlamp-ingress          tailscale   *       berkeley-staging-headlamp.taila1eba.ts.net   80, 443   36m

I am able to connect to both services simultaneously using kubectl port-forward, so I'm pretty sure this is not an inherent limitation of my kubernetes setup.
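
Concretely, that check was along these lines (local ports are arbitrary):

# forward each Service locally, in the background (or in separate terminals)
kubectl port-forward -n docker-registry svc/docker-registry 8080:80 &
kubectl port-forward -n headlamp svc/headlamp 8081:80 &
# both respond even while the tailscale ingress is failing
curl -sI http://127.0.0.1:8080/
curl -sI http://127.0.0.1:8081/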

Steps to reproduce

No response

Are there any recent changes that introduced the issue?

No response

OS

Linux

OS version

kubernetes

Tailscale version

1.62.1

Other software

calico CNI

Bug report

BUG-893cd6c8f00a44fda54bd05672e58e601316cc21d2892024305ff3afa3f4c675-20240509231821Z-5c7d259ff6d31521

garymm commented 1 month ago

I'm seeing this again. I tried upgrading to 1.64.2 (latest helm chart) and deleting all the pods, and this time I can't figure out a way to fix it.
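
For reference, the upgrade was roughly this (release name and Helm repo follow Tailscale's operator docs; yours may differ):

# Tailscale's Helm repo, added once
helm repo add tailscale https://pkgs.tailscale.com/helmcharts
helm repo update
# upgrade the operator to the latest chart version
helm upgrade tailscale-operator tailscale/tailscale-operator --namespace tailscale
# then delete the proxy pods so they get recreated on the new version
kubectl delete pod -n tailscale --all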

garymm commented 1 month ago

Restarting the kubernetes host seems to have fixed it, at least for now.