tailscale / tailscale

The easiest, most secure way to use WireGuard and 2FA.
https://tailscale.com
BSD 3-Clause "New" or "Revised" License

tailscale-operator ingress can't hit intra-cluster service #9141

Closed jtschelling closed 1 year ago

jtschelling commented 1 year ago

What is the issue?

i'm running the tailscale-operator on a kubernetes cluster, and attempting to use the new ingressclass introduced recently. here's my example manifest i'm using to test:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress
  namespace: tailscale-test
spec:
  ingressClassName: tailscale
  rules:
    - http:
        paths:
          - path: /
            pathType: Exact
            backend:
              service:
                name: nginx-service
                port:
                  name: web-service
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: tailscale-test
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: nginx
  ports:
    - name: web-service
      protocol: TCP
      port: 80
      targetPort: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: tailscale-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nginx
    spec:
      containers:
      - name: web
        image: nginx
        ports:
        - containerPort: 80
          name: web

the ingress comes up fine and i get a tailscale-test-nginx-ingress-ingress machine added to my tailnet in the admin console. I can run tailscale ping $INGRESSTAILNETIP fine, here's the output:

pong from tailscale-test-nginx-ingress-ingress ($INGRESSTAILNETIP) via DERP(ord) in 12ms
pong from tailscale-test-nginx-ingress-ingress ($INGRESSTAILNETIP) via DERP(ord) in 16ms
pong from tailscale-test-nginx-ingress-ingress ($INGRESSTAILNETIP) via DERP(ord) in 27ms
pong from tailscale-test-nginx-ingress-ingress ($INGRESSTAILNETIP) via $LANIP:58208 in 9ms

when i try to curl http://$INGRESSTAILNETIP i get:

curl: (7) Failed to connect to $INGRESSTAILNETIP port 80 after 37 ms: Couldn't connect to server

there's a pod ts-nginx-ingress-58pz9-0 that gets created in the tailscale namespace that the operator is running in. when i look at the logs in that pod after running the curl command i see:

tailscale 2023/08/29 20:35:39 Accept: TCP{$MYLAPTOPTAILNETIP:57929 > $INGRESSTAILNETIP:80} 64 tcp ok
tailscale 2023/08/29 20:35:39 [unexpected] localbackend: got TCP conn without TCP config for port 80; from $MYLAPTOPTAILNETIP:57929
tailscale 2023/08/29 20:35:39 netstack: could not connect to local server at 127.0.0.1:80: dial tcp 127.0.0.1:80: connect: connection refused

I wondered if the pod was unable to reach the namespace the test container was running in, so i used netshoot to run a debug container in the same pod as ts-nginx-ingress-58pz9-0, but i can curl the nginx service fine from there:

 ts-nginx-ingress-58pz9-0  ~  curl nginx-service.tailscale-test.svc.cluster.local
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
.......
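
For reference, the name used in that curl follows the usual `<service>.<namespace>.svc.<cluster-domain>` pattern. A minimal sketch of building it, assuming the cluster uses the default `cluster.local` domain (`svc_dns_name` is a hypothetical helper, not part of kubectl or tailscale):

```shell
#!/bin/sh
# Hypothetical helper: build the in-cluster DNS name for a Service.
# Assumption: the cluster domain defaults to "cluster.local".
svc_dns_name() {
  # $1 = service name, $2 = namespace, $3 = cluster domain (optional)
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# Service and namespace taken from the manifest in this issue.
svc_dns_name nginx-service tailscale-test
```

This prints the same name that was curled from the debug container above.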

i'm wondering if it's something with iptables? i'm using a debian based distro on a raspberry pi (ARM) called dietpi. i see entries in my iptables rules for tailscale but i'm not quite sure what i'm looking for. i can provide more info if i get some guidance.

Steps to reproduce

No response

Are there any recent changes that introduced the issue?

No response

OS

Linux, macOS

OS version

No response

Tailscale version

unstable-v1.49.121

Other software

i'm using cilium for my cni. i'm monitoring with hubble, i can see the incoming connection from my laptop on the tailnet to the pod that gets created by the ingress in the tailscale namespace, but there's no outgoing connection to my nginx service so i don't think cilium is blocking that outgoing connection.

Bug report

BUG-4aeb98ad042df268512fdfef314601e58f045bf66424dcabadb2e2e87f0cbba2-20230829205453Z-20c4680ff940f803

jtschelling commented 1 year ago

not really a "bug" i don't think, but i wasn't sure of the best place to post this. tried the irc channel but don't think it's very active

maisem commented 1 year ago

@jtschelling ingress only creates services reachable over 443, so can you try https://fqdn? if you do kubectl get ingress it should print out the fqdn

jtschelling commented 1 year ago

hm, still getting the same log messages in the ts-nginx-ingress-58pz9-0 pod in the tailscale namespace. i also tried editing my service definition to listen on 443 instead of 80, with the same result

kctl describe ingress
Name:             nginx-ingress
Labels:           <none>
Namespace:        tailscale-test
Address:          tailscale-test-nginx-ingress-ingress.tailXXXXX.ts.net
Ingress Class:    tailscale
Default backend:  <default>
Rules:
  Host        Path  Backends
  ----        ----  --------
  *
              /   nginx-service:web-service (10.0.0.33:80,10.0.0.63:80)
Annotations:  <none>
Events:       <none>

curl:

$ curl https://tailscale-test-nginx-ingress-ingress.tail08b90.ts.net/
curl: (7) Failed to connect to tailscale-test-nginx-ingress-ingress.tailXXXXX.ts.net port 443 after 215 ms: Couldn't connect to server

logs from the ts-nginx-ingress-58pz9-0 pod:

tailscale 2023/08/29 21:13:16 Accept: TCP{$MYLAPTOPTAILNETIP:58809 > $INGRESSTAILNETIP:443} 64 tcp ok
tailscale 2023/08/29 21:13:16 [unexpected] localbackend: got TCP conn without TCP config for port 443; from $MYLAPTOPTAILNETIP:58809
tailscale 2023/08/29 21:13:16 netstack: could not connect to local server at 127.0.0.1:443: dial tcp 127.0.0.1:443: connect: connection refused

maisem commented 1 year ago

i applied the exact config you put in https://github.com/tailscale/tailscale/issues/9141#issue-1872459665 and it worked for me.

I ran

➜  ~ curl https://tailscale-test-nginx-ingress-ingress.tail-scale.ts.net
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Can you paste the output of

kubectl -n tailscale get secret -l tailscale.com/parent-resource=nginx-ingress,tailscale.com/parent-resource-ns=tailscale-test,tailscale.com/parent-resource-type=ingress -o json | jq '.items[].data."serve-config"' -r | base64 -d

jtschelling commented 1 year ago

$ kubectl -n tailscale get secret -l tailscale.com/parent-resource=nginx-ingress,tailscale.com/parent-resource-ns=tailscale-test,tailscale.com/parent-resource-type=ingress -o json | jq '.items[].data."serve-config"' -r | base64 -d
{"TCP":{"443":{"HTTPS":true}},"Web":{"${TS_CERT_DOMAIN}:443":{"Handlers":{"/":{"Proxy":"http://10.43.204.137:80/"}}}}}

$ kctl get service nginx-service
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
nginx-service   ClusterIP   10.43.204.137   <none>        443/TCP   3h

maisem commented 1 year ago

By changing the service to port 443, it no longer matches the output you pasted in the first comment here.

Can you please paste the latest svc and ingress yaml? Can you also paste the latest operator and container logs?

jtschelling commented 1 year ago

sorry was playing around with it too much trying different things. i went back to the exact config in my original comment, the one that worked for you here https://github.com/tailscale/tailscale/issues/9141#issuecomment-1698239195

ingress yaml:

$ kctl get ingress nginx-ingress -oyaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.k8s.io/v1","kind":"Ingress","metadata":{"annotations":{},"name":"nginx-ingress","namespace":"tailscale-test"},"spec":{"ingressClassName":"tailscale","rules":[{"http":{"paths":[{"backend":{"service":{"name":"nginx-service","port":{"name":"web-service"}}},"path":"/","pathType":"Exact"}]}}]}}
  creationTimestamp: "2023-08-30T16:53:51Z"
  finalizers:
  - tailscale.com/finalizer
  generation: 1
  name: nginx-ingress
  namespace: tailscale-test
  resourceVersion: "175382"
  uid: c6e3fbea-6033-4dad-ae55-a905922bd607
spec:
  ingressClassName: tailscale
  rules:
  - http:
      paths:
      - backend:
          service:
            name: nginx-service
            port:
              name: web-service
        path: /
        pathType: Exact
status:
  loadBalancer:
    ingress:
    - hostname: tailscale-test-nginx-ingress-ingress.tail08b90.ts.net
      ports:
      - port: 443
        protocol: TCP

service yaml:

$ kctl get service nginx-service -oyaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"nginx-service","namespace":"tailscale-test"},"spec":{"ports":[{"name":"web-service","port":80,"protocol":"TCP","targetPort":"web"}],"selector":{"app.kubernetes.io/name":"nginx"},"type":"ClusterIP"}}
  creationTimestamp: "2023-08-30T16:53:51Z"
  name: nginx-service
  namespace: tailscale-test
  resourceVersion: "175302"
  uid: 2aa134b9-9745-4be3-8e83-b745129c0def
spec:
  clusterIP: 10.43.238.10
  clusterIPs:
  - 10.43.238.10
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: web-service
    port: 80
    protocol: TCP
    targetPort: web
  selector:
    app.kubernetes.io/name: nginx
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

here are the entire logs from the ts-nginx-ingress-2chnk-0 pod (running in the tailscale namespace alongside the operator pod), from startup after applying the ingress manifest through to the error i get after running curl https://tailscale-test-nginx-ingress-ingress.tail08b90.ts.net. it's the same netstack: could not connect to local server ...... error i was getting earlier:

boot: 2023/08/30 16:58:21 No authkey found in kube secret and TS_AUTHKEY not provided, login will be interactive if needed.
boot: 2023/08/30 16:58:21 Starting tailscaled
boot: 2023/08/30 16:58:21 Waiting for tailscaled socket
2023/08/30 16:58:21 logtail started
2023/08/30 16:58:21 Program starting: v1.49.121-t3451b89e5, Go 1.21.0: []string{"tailscaled", "--socket=/tmp/tailscaled.sock", "--state=kube:ts-nginx-ingress-2chnk-0", "--statedir=/tmp", "--tun=userspace-networking"}
2023/08/30 16:58:21 LogID: 2eaf67dd8b24335c674c3fe2a39446c74a82d97d6241f8fc50005481f87b6e60
2023/08/30 16:58:21 logpolicy: using system state directory "/var/lib/tailscale"
logpolicy.ConfigFromFile /var/lib/tailscale/tailscaled.log.conf: open /var/lib/tailscale/tailscaled.log.conf: no such file or directory
logpolicy.Config.Validate for /var/lib/tailscale/tailscaled.log.conf: config is nil
2023/08/30 16:58:21 wgengine.NewUserspaceEngine(tun "userspace-networking") ...
2023/08/30 16:58:21 dns: using dns.noopManager
2023/08/30 16:58:21 link state: interfaces.State{defaultRoute=eth0 ifs={eth0:[10.0.0.117/32]} v4=true v6=false}
2023/08/30 16:58:21 magicsock: [warning] failed to force-set UDP read buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2023/08/30 16:58:21 magicsock: [warning] failed to force-set UDP write buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2023/08/30 16:58:21 magicsock: [warning] failed to force-set UDP read buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2023/08/30 16:58:21 magicsock: [warning] failed to force-set UDP write buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2023/08/30 16:58:21 magicsock: disco key = d:a2de66d727a75820
2023/08/30 16:58:21 Creating WireGuard device...
2023/08/30 16:58:21 Bringing WireGuard device up...
2023/08/30 16:58:21 Bringing router up...
2023/08/30 16:58:21 Clearing router settings...
2023/08/30 16:58:21 Starting network monitor...
2023/08/30 16:58:21 Engine created.
2023/08/30 16:58:21 pm: using backend prefs for "profile-b083": Prefs{ra=false dns=false want=true routes=[] nf=on host="tailscale-test-nginx-ingress-ingress" Persist{lm=, o=, n=[iXeAS] u="tailscale-test-nginx-ingress-ingress.tail08b90.ts.net"}}
2023/08/30 16:58:21 logpolicy: using system state directory "/var/lib/tailscale"
2023/08/30 16:58:21 got LocalBackend in 94ms
2023/08/30 16:58:21 Start
2023/08/30 16:58:21 Backend: logs: be:2eaf67dd8b24335c674c3fe2a39446c74a82d97d6241f8fc50005481f87b6e60 fe:
2023/08/30 16:58:21 control: client.Login(false, 0)
2023/08/30 16:58:21 health("overall"): error: not in map poll
2023/08/30 16:58:21 control: doLogin(regen=false, hasUrl=false)
boot: 2023/08/30 16:58:21 tailscaled in state "NoState", waiting
2023/08/30 16:58:21 control: control server key from https://controlplane.tailscale.com: ts2021=[fSeS+], legacy=[nlFWp]
2023/08/30 16:58:21 control: RegisterReq: onode= node=[iXeAS] fup=false nks=false
2023/08/30 16:58:21 control: creating new noise client
2023/08/30 16:58:22 control: RegisterReq: got response; nodeKeyExpired=false, machineAuthorized=true; authURL=false
2023/08/30 16:58:22 control: netmap: got new dial plan from control
2023/08/30 16:58:22 active login: tailscale-test-nginx-ingress-ingress.tail08b90.ts.net
2023/08/30 16:58:22 Switching ipn state NoState -> Starting (WantRunning=true, nm=true)
2023/08/30 16:58:22 magicsock: SetPrivateKey called (init)
2023/08/30 16:58:22 wgengine: Reconfig: configuring userspace WireGuard config (with 0/4 peers)
2023/08/30 16:58:22 wgengine: Reconfig: configuring router
2023/08/30 16:58:22 wgengine: Reconfig: configuring DNS
2023/08/30 16:58:22 dns: Set: {DefaultResolvers:[] Routes:{} SearchDomains:[] Hosts:5}
2023/08/30 16:58:22 dns: Resolvercfg: {Routes:{} Hosts:5 LocalDomains:[]}
2023/08/30 16:58:22 dns: OScfg: {Nameservers:[] SearchDomains:[] MatchDomains:[] Hosts:[]}
2023/08/30 16:58:22 peerapi: serving on http://100.90.151.139:46081
2023/08/30 16:58:22 peerapi: serving on http://[fd7a:115c:a1e0:ab12:4843:cd96:625a:978b]:46081
boot: 2023/08/30 16:58:22 tailscaled in state "Starting", waiting
2023/08/30 16:58:22 magicsock: home is now derp-12 (ord)
2023/08/30 16:58:22 magicsock: adding connection to derp-12 for home-keep-alive
2023/08/30 16:58:22 magicsock: 1 active derp conns: derp-12=cr0s,wr0s
2023/08/30 16:58:22 Switching ipn state Starting -> Running (WantRunning=true, nm=true)
boot: 2023/08/30 16:58:22 Running 'tailscale set'
2023/08/30 16:58:22 derphttp.Client.Connect: connecting to derp-12 (ord)
2023/08/30 16:58:22 control: NetInfo: NetInfo{varies=false hairpin=false ipv6=false ipv6os=true udp=true icmpv4=false derp=#12 portmap= link="" firewallmode=""}
2023/08/30 16:58:22 magicsock: endpoints changed: 38.124.108.74:46861 (stun), 10.0.0.117:46861 (local)
2023/08/30 16:58:22 magicsock: derp-12 connected; connGen=1
2023/08/30 16:58:22 health("overall"): ok
boot: 2023/08/30 16:58:22 Deleting authkey from kube secret
boot: 2023/08/30 16:58:22 Startup complete, waiting for shutdown signal
2023/08/30 16:58:27 wgengine: idle peer [qMKBw] now active, reconfiguring WireGuard
2023/08/30 16:58:27 wgengine: Reconfig: configuring userspace WireGuard config (with 1/4 peers)
2023/08/30 16:58:27 magicsock: disco: node [qMKBw] d:b9b57e3beabb2e62 now using 192.168.1.88:41641
2023/08/30 16:58:27 Accept: TCP{100.106.218.33:63703 > 100.90.151.139:443} 64 tcp ok
2023/08/30 16:58:27 [unexpected] localbackend: got TCP conn without TCP config for port 443; from 100.106.218.33:63703
2023/08/30 16:58:27 netstack: could not connect to local server at 127.0.0.1:443: dial tcp 127.0.0.1:443: connect: connection refused

i get this in the operator logs on first apply of the ingress manifest, but i'm a little confused by it because the ingress seems to show up correctly on the machines page of the tailnet admin ui:

{"level":"info","ts":"2023-08-30T16:53:51Z","logger":"ingress-reconciler","msg":"exposing ingress over tailscale","ingress-ns":"tailscale-test","ingress-name":"nginx-ingress"}
{"level":"error","ts":"2023-08-30T16:53:56Z","msg":"Reconciler error","controller":"ingress","controllerGroup":"networking.k8s.io","controllerKind":"Ingress","Ingress":{"name":"nginx-ingress","namespace":"tailscale-test"},"namespace":"tailscale-test","name":"nginx-ingress","reconcileID":"1aaf55ea-3955-4130-9c72-a88e8a40c618","error":"failed to update ingress status: Operation cannot be fulfilled on ingresses.networking.k8s.io \"nginx-ingress\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226"}
{"level":"error","ts":"2023-08-30T16:53:57Z","msg":"Reconciler error","controller":"ingress","controllerGroup":"networking.k8s.io","controllerKind":"Ingress","Ingress":{"name":"nginx-ingress","namespace":"tailscale-test"},"namespace":"tailscale-test","name":"nginx-ingress","reconcileID":"f2125c00-ec43-41b1-859d-74ef8aa99c37","error":"failed to provision: failed to create or get API key secret: Operation cannot be fulfilled on secrets \"ts-nginx-ingress-2chnk-0\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226"}
{"level":"error","ts":"2023-08-30T16:53:57Z","msg":"Reconciler error","controller":"ingress","controllerGroup":"networking.k8s.io","controllerKind":"Ingress","Ingress":{"name":"nginx-ingress","namespace":"tailscale-test"},"namespace":"tailscale-test","name":"nginx-ingress","reconcileID":"247f19e8-dc8e-4df4-9c12-8f2294b285a1","error":"failed to update ingress status: Operation cannot be fulfilled on ingresses.networking.k8s.io \"nginx-ingress\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226"}

and here's the secret request you asked for in https://github.com/tailscale/tailscale/issues/9141#issuecomment-1698239195 again:

$ kubectl -n tailscale get secret -l tailscale.com/parent-resource=nginx-ingress,tailscale.com/parent-resource-ns=tailscale-test,tailscale.com/parent-resource-type=ingress -o json | jq '.items[].data."serve-config"' -r | base64 -d

{"TCP":{"443":{"HTTPS":true}},"Web":{"${TS_CERT_DOMAIN}:443":{"Handlers":{"/":{"Proxy":"http://10.43.238.10:80/"}}}}}
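
As a side check, the `Proxy` target in that serve-config can be compared against the Service's ClusterIP and port from the YAML above. A rough sketch, assuming the decoded JSON is already in a shell variable (`proxy_targets` is a hypothetical helper; it uses grep/sed so it runs without jq, though `jq -r '.Web[].Handlers[].Proxy'` should be the more robust choice where jq is available):

```shell
#!/bin/sh
# Decoded serve-config, copied verbatim from the output above.
# Single quotes keep ${TS_CERT_DOMAIN} literal.
serve_config='{"TCP":{"443":{"HTTPS":true}},"Web":{"${TS_CERT_DOMAIN}:443":{"Handlers":{"/":{"Proxy":"http://10.43.238.10:80/"}}}}}'

# Hypothetical helper: print every "Proxy" target found in a serve-config.
# This is plain text matching, not a real JSON parser.
proxy_targets() {
  printf '%s\n' "$1" | grep -o '"Proxy":"[^"]*"' | sed 's/^"Proxy":"//; s/"$//'
}

proxy_targets "$serve_config"
```

For this config it prints `http://10.43.238.10:80/`, which matches the `clusterIP: 10.43.238.10` and `port: 80` in the service yaml, so the backend mapping itself looks consistent.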

i can tailscale ping from my laptop fine, and this is from my laptop

$ tailscale ping 100.90.151.139
pong from tailscale-test-nginx-ingress-ingress (100.90.151.139) via 192.168.1.234:7850 in 10ms
$ tailscale ip -4
100.106.218.33
$ tailscale status
100.106.218.33  jts-macbook-pro      jt@          macOS   -
100.107.12.100  ip-10-0-101-235      tagged-devices linux   -
100.70.39.26    rpi-tailscale-operator tagged-devices linux   -
100.93.232.87   rpi01                tagged-devices linux   -
100.90.151.139  tailscale-test-nginx-ingress-ingress tagged-devices linux   idle, tx 552 rx 344

maisem commented 1 year ago

oh! I see the problem, you need to enable HTTPS on your tailnet for ingress to work. I'll add an event for when it is not enabled

jtschelling commented 1 year ago

omg.....thank you worked immediately after enabling! i've been having great success with tailscale in general so thanks for all the work!
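
For anyone hitting the same thing: in this thread the telltale sign was the `got TCP conn without TCP config for port 443` line in the proxy pod logs while HTTPS was still disabled for the tailnet. A small sketch for spotting that line in log text (`has_missing_tcp_config` is a hypothetical helper, and other causes of the same message are not ruled out here):

```shell
#!/bin/sh
# Hypothetical helper: succeed if the log text on stdin contains the
# "no TCP config" message seen in this thread for the given port.
has_missing_tcp_config() {
  # $1 = port number to look for
  grep -q "got TCP conn without TCP config for port $1;"
}

# Sample line copied from the pod logs earlier in this issue.
sample_log='2023/08/30 16:58:27 [unexpected] localbackend: got TCP conn without TCP config for port 443; from 100.106.218.33:63703'

if printf '%s\n' "$sample_log" | has_missing_tcp_config 443; then
  echo "symptom present: check that HTTPS is enabled for the tailnet"
fi
```

Against a live cluster, the same function could be fed from `kubectl -n tailscale logs ts-nginx-ingress-2chnk-0` (the pod name from this issue; yours will differ).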