squat / kilo

Kilo is a multi-cloud network overlay built on WireGuard and designed for Kubernetes (k8s + wg = kg)
https://kilo.squat.ai
Apache License 2.0

`kubectl logs` fails for pods on kilo nodes behind NAT #272

Closed: brianthelion closed this issue 1 year ago

brianthelion commented 2 years ago

No errors on pod deployment; the pods report READY and are, indeed, running when inspected directly on the node. The WireGuard interface kilo0 is up and working correctly.

The issue only seems to affect kubectl logs and kubectl exec.

The Node and Master operating systems are both Ubuntu 21.10. The Kubernetes distribution is kubeadm with flannel. The Master has a public IP; the Node does not because it is behind NAT.

kubectl apply -f https://raw.githubusercontent.com/squat/kilo/main/manifests/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/squat/kilo/main/manifests/kilo-kubeadm-flannel.yaml

Annotations applied to Nodes:

kilo.squat.ai/location=${LOCATION}
kilo.squat.ai/persistent-keepalive=5

Annotations applied to Master:

kilo.squat.ai/location=master
kilo.squat.ai/force-endpoint=${PUBLIC_IP}:51820
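
For reference, a minimal sketch of how these would be applied with kubectl annotate (node names are placeholders):

kubectl annotate node <node-name> \
  kilo.squat.ai/location=${LOCATION} \
  kilo.squat.ai/persistent-keepalive=5

kubectl annotate node <master-name> \
  kilo.squat.ai/location=master \
  kilo.squat.ai/force-endpoint=${PUBLIC_IP}:51820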

And finally, kubectl logs {pod} and kubectl exec both report:

Error from server: Get "https://192.168.1.26:10250/containerLogs/default/{pod}": dial tcp 192.168.1.26:10250: i/o timeout

where 192.168.1.26 is the IP provided by the router, not the one provided by Kilo.

Is some additional configuration needed?
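
For context, kubectl logs and kubectl exec go through the API server, which dials the node's reported InternalIP; a generic way to check which address that is (the node name is a placeholder):

kubectl get nodes -o wide
kubectl get node <node-name> -o jsonpath='{.status.addresses}'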

brianthelion commented 2 years ago

One additional tidbit: I'm allocating and annotating the Node objects by hand before the kubelet joins the control plane (as opposed to letting the control plane do it automatically at join time).

leonnicolas commented 2 years ago

Can you ping pods on the nodes behind NAT from another location in the cluster? In other words, is the pod network working?
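
One generic way to check (pod names and the pod IP are placeholders; assumes the image ships ping):

kubectl get pods -o wide
kubectl exec -it <pod-in-another-location> -- ping <pod-ip-on-nat-node>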

brianthelion commented 2 years ago

@leonnicolas To test, I set up your adjacency service on a NodePort as follows:

kind: DaemonSet
apiVersion: apps/v1
metadata: 
  name: adjacency
  labels:
    app.kubernetes.io/name: adjacency
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: adjacency
  template:
    metadata:
      labels:
        app.kubernetes.io/name: adjacency
    spec:
      containers:
      - name: adjacency
        image: kiloio/adjacency
        args: 
        - --listen-address=:8080
        - --srv=_http._tcp.adjacency
        ports:
        - name: http
          containerPort: 8080
        livenessProbe:
          httpGet:
            path: /ping
            port: http
          failureThreshold: 2
          periodSeconds: 5
        startupProbe:
          httpGet:
            path: /ping
            port: http
          failureThreshold: 2
          periodSeconds: 5
---
kind: Service
apiVersion: v1
metadata:
  name: adjacency
  labels:
    app.kubernetes.io/name: adjacency
spec: 
  type: NodePort
  ports:
  - name: http
    port: 8080
    targetPort: http
    protocol: TCP
    nodePort: 30163
  selector:
    app.kubernetes.io/name: adjacency
#  clusterIP: None
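
This was applied and checked roughly as follows (the manifest file name is assumed):

kubectl apply -f adjacency.yaml
kubectl get pods -o wide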

curl ${PUBLIC_IP}:30163?format=json times out. This is presumably due to the timeout on the pods themselves, so I checked the adjacency pod logs:

$ kubectl get pods
NAME               READY   STATUS    RESTARTS   AGE
adjacency-2bqx2    1/1     Running   0          46s
adjacency-9cccf    1/1     Running   0          46s

$ kubectl logs adjacency-9cccf
2022/02/11 15:02:49 using timeout 10s, using probe timeout 2s
2022/02/11 15:02:49 listening on :8080

$ kubectl logs adjacency-2bqx2
Error from server: Get "https://192.168.1.26:10250/containerLogs/default/adjacency-2bqx2/adjacency": dial tcp 192.168.1.26:10250: i/o timeout

squat commented 2 years ago

@brianthelion I agree with @leonnicolas: it seems like cross-location networking is broken in general, not only logs. Can you please share the annotations on the "master" node? I'd like to see if that node has a "discovered-endpoints" annotation that holds the endpoint for the NATed node.

brianthelion commented 2 years ago

@squat @leonnicolas

$ kubectl get node godnode -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"6a:68:27:74:eb:55"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 172.31.22.109
    kilo.squat.ai/discovered-endpoints: '{"kBzR7wVzofsxVQmcxeHNhq46ELILmsQcQvfp13EttwM=":{"IP":"174.58.68.199","Port":51820,"Zone":""},"kCyzlykdef9T1ieHS98a4v10SFU70WjmAuagM51KOEs=":{"IP":"71.233.150.67","Port":51820,"Zone":""}}'
    kilo.squat.ai/endpoint: 54.164.48.95:51820
    kilo.squat.ai/force-endpoint: 54.164.48.95:51820
    kilo.squat.ai/granularity: location
    kilo.squat.ai/internal-ip: 172.31.22.109/20
    kilo.squat.ai/key: f5Q8OunSYZIZ886PXwKpmgurPi5ecnqEpdv+UW0Dowc=
    kilo.squat.ai/last-seen: "1644593215"
    kilo.squat.ai/location: godnode
    kilo.squat.ai/persistent-keepalive: "5"
    kilo.squat.ai/wireguard-ip: 10.4.0.3/16
    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2022-02-10T23:38:35Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: godnode
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: ""
    node-role.kubernetes.io/master: ""
    node.kubernetes.io/exclude-from-external-load-balancers: ""
  name: godnode
  resourceVersion: "79939"
  uid: 13d6206f-52e7-45f5-88a1-626a5dd94710

brianthelion commented 2 years ago

Interestingly, pod-to-pod ping seems to work fine:

$ kubectl get pod adjacency-2bqx2 -o yaml | fgrep podIP:
  podIP: 10.244.3.6

$ kubectl exec -it adjacency-9cccf -- sh 
/ # ping 10.244.3.6
PING 10.244.3.6 (10.244.3.6): 56 data bytes
64 bytes from 10.244.3.6: seq=0 ttl=62 time=41.018 ms
64 bytes from 10.244.3.6: seq=1 ttl=62 time=39.190 ms
64 bytes from 10.244.3.6: seq=2 ttl=62 time=43.771 ms
64 bytes from 10.244.3.6: seq=3 ttl=62 time=39.341 ms
64 bytes from 10.244.3.6: seq=9 ttl=62 time=39.391 ms
64 bytes from 10.244.3.6: seq=10 ttl=62 time=39.253 ms
64 bytes from 10.244.3.6: seq=11 ttl=62 time=39.734 ms
64 bytes from 10.244.3.6: seq=12 ttl=62 time=41.157 ms
64 bytes from 10.244.3.6: seq=13 ttl=62 time=52.166 ms
64 bytes from 10.244.3.6: seq=14 ttl=62 time=40.461 ms
64 bytes from 10.244.3.6: seq=15 ttl=62 time=39.335 ms
64 bytes from 10.244.3.6: seq=16 ttl=62 time=38.885 ms
64 bytes from 10.244.3.6: seq=17 ttl=62 time=40.956 ms
64 bytes from 10.244.3.6: seq=18 ttl=62 time=39.101 ms
64 bytes from 10.244.3.6: seq=19 ttl=62 time=40.095 ms
64 bytes from 10.244.3.6: seq=20 ttl=62 time=41.950 ms
^C
--- 10.244.3.6 ping statistics ---
21 packets transmitted, 16 packets received, 23% packet loss
round-trip min/avg/max = 38.885/40.987/52.166 ms
/ # 

There's packet loss and a bit of jitter, but nothing catastrophic.

brianthelion commented 2 years ago

And, indeed, if I log directly into the NATted node's console and use docker exec to get into the adjacency-2bqx2 container, I can ping adjacency-9cccf at its podIP.
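
Roughly (the container ID and the target pod IP are placeholders):

# On the NATted node itself:
docker ps | grep adjacency
docker exec -it <adjacency-container-id> ping <podIP-of-adjacency-9cccf>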

brianthelion commented 2 years ago

Further consideration after a Slack discussion: the Master routing table does not contain an entry for 192.168.1.26:

$ ip route show
default via 172.31.16.1 dev ens5 proto dhcp src 172.31.22.109 metric 100 
10.4.0.0/16 dev kilo0 proto kernel scope link src 10.4.0.3 
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1 
10.244.3.0/24 via 10.4.0.2 dev kilo0 proto static onlink 
10.244.4.0/24 via 10.4.0.1 dev kilo0 proto static onlink 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
172.31.16.0/20 dev ens5 proto kernel scope link src 172.31.22.109 
172.31.16.1 dev ens5 proto dhcp scope link src 172.31.22.109 metric 100 

Adding a static route solves the problem:

$ sudo ip route add 192.168.1.26 via 10.4.0.2 dev kilo0

$ kubectl logs adjacency-9cccf
2022/02/11 15:02:49 using timeout 10s, using probe timeout 2s
2022/02/11 15:02:49 listening on :8080

$ kubectl logs adjacency-2bqx2
2022/02/11 15:02:53 using timeout 10s, using probe timeout 2s
2022/02/11 15:02:53 listening on :8080

Are there additional annotations necessary to get Kilo to update the routing table automatically?
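
As an aside, one way to inspect the topology and WireGuard configuration Kilo intends to apply is the kgctl CLI, assuming it is installed and pointed at the cluster's kubeconfig:

kgctl graph > topology.dot          # the mesh Kilo has computed, in Graphviz DOT format
kgctl showconf node <node-name>     # the WireGuard config Kilo would apply on that node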

brianthelion commented 2 years ago

More observations:

  1. Adding a third Node causes the routing table on the Master to get flushed, thereby removing my manual entry and putting kubectl logs {pod} back into the timeout error mode on the NATted nodes.
  2. Adding a third Node also kills my SSH connection to the Master, presumably because Kilo is doing something to the iptables rules.
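
A generic way to see what has actually been programmed on the host (nothing beyond standard iproute2/iptables tooling is assumed):

# Routes over the WireGuard interface:
ip route show dev kilo0

# Any iptables rules Kilo has installed (its chains are typically prefixed with KILO):
sudo iptables-save | grep -i kilo
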
squat commented 2 years ago

It seems like the problem is that Kilo is discovering the wrong IP as the internal IP/interface on all of the nodes. Kilo is finding IPs like 172.31.22.109/20, but the private IP you wanted is 192.168.1.26; this is why kubectl logs is not working. Presumably, pod-to-pod networking using host ports should also not work as a result (when cross-location).
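
If Kilo really were picking the wrong interface, one option would be to pin the internal IP explicitly with the kilo.squat.ai/force-internal-ip annotation from Kilo's annotation docs (a sketch; the node name and CIDR are placeholders):

kubectl annotate node <node-name> kilo.squat.ai/force-internal-ip=<desired-ip>/<prefix>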

brianthelion commented 2 years ago

@squat No, 172.31.22.109 is the (correct) private IP of the Master VM; its public address is in the 54.x.y.z range. As is typical with cloud services, the VM doesn't have any knowledge of its own public IP.

$ ip -br addr show
lo               UNKNOWN        127.0.0.1/8 ::1/128 
ens5             UP             172.31.22.109/20 fe80::8da:66ff:feb8:4807/64 
docker0          UP             172.17.0.1/16 fe80::42:64ff:fe02:eb24/64 
...
flannel.1        UNKNOWN        10.244.0.0/32 fe80::6868:27ff:fe74:eb55/64 
cni0             UP             10.244.0.1/24 fe80::7492:67ff:fe84:5df3/64 
kilo0            UNKNOWN        10.4.0.2/16 
...

Meanwhile on the "problem" Node:

$ ip -br addr show
lo               UNKNOWN        127.0.0.1/8 ::1/128 
dlan0            UP             fe80::dea6:32ff:fe20:7203/64 
wlan0            UP             192.168.1.26/24 fe80::dea6:32ff:fe20:7204/64 
docker0          DOWN           172.17.0.1/16 
kilo0            UNKNOWN        10.4.0.1/16 
flannel.1        UNKNOWN        10.244.3.0/32 fe80::5c1a:41ff:fe64:403b/64 
cni0             UP             10.244.3.1/24 fe80::5c2e:f0ff:fe67:eb9a/64 
...
squat commented 2 years ago

Ok great, thanks for that 👍 can you please share the annotations on the problem node?

brianthelion commented 2 years ago

@squat

$ kubectl get node 60a0eb2257af77b3 -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"5e:1a:41:64:40:3b"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 192.168.1.26
    kilo.squat.ai/discovered-endpoints: '{"f5Q8OunSYZIZ886PXwKpmgurPi5ecnqEpdv+UW0Dowc=":{"IP":"54.164.48.95","Port":51820,"Zone":""}}'
    kilo.squat.ai/endpoint: 192.168.1.26:51820
    kilo.squat.ai/granularity: location
    kilo.squat.ai/internal-ip: 192.168.1.26/24
    kilo.squat.ai/key: kBzR7wVzofsxVQmcxeHNhq46ELILmsQcQvfp13EttwM=
    kilo.squat.ai/last-seen: "1644676967"
    kilo.squat.ai/location: brf0.com/lake
    kilo.squat.ai/persistent-keepalive: "5"
    kilo.squat.ai/wireguard-ip: 10.4.0.1/16
    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2022-02-10T23:50:04Z"
  labels:
    beta.kubernetes.io/arch: arm64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: arm64
    kubernetes.io/hostname: 60a0eb2257af77b3
    kubernetes.io/os: linux
  name: 60a0eb2257af77b3
spec:
  podCIDR: 10.244.3.0/24
  podCIDRs:
  - 10.244.3.0/24
status:
  addresses:
  - address: 192.168.1.26
    type: InternalIP
  - address: 60a0eb2257af77b3
    type: Hostname
...

brianthelion commented 2 years ago

@squat @leonnicolas Any thoughts on this? It seems to be very reproducible for us by:

  1. Master: kubeadm reset --force.
  2. Master: kubeadm init --config=.
  3. Master: Apply flannel and kilo-flannel.
  4. Master: Apply kilo annotations.
  5. Master: kubeadm token create --print-join-command.
  6. Problem node: kubeadm reset --force.
  7. Problem node: kubeadm join with the bootstrap token.
  8. Problem node: Apply kilo annotations.
  9. Client: kubectl logs targeting the problem node.
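
Condensed into shell, that is roughly (file names, node/pod names, and the join command are placeholders; the Kilo manifest URLs are the ones from the first comment):

# Master:
sudo kubeadm reset --force
sudo kubeadm init --config=<kubeadm-config.yaml>
kubectl apply -f <kube-flannel.yml>
kubectl apply -f https://raw.githubusercontent.com/squat/kilo/main/manifests/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/squat/kilo/main/manifests/kilo-kubeadm-flannel.yaml
kubectl annotate node <master> kilo.squat.ai/location=master kilo.squat.ai/force-endpoint=${PUBLIC_IP}:51820
kubeadm token create --print-join-command

# Problem node:
sudo kubeadm reset --force
sudo kubeadm join <output of the join command above>
kubectl annotate node <node> kilo.squat.ai/location=${LOCATION} kilo.squat.ai/persistent-keepalive=5

# Client:
kubectl logs <pod on the problem node>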

Could it be that some Kilo state isn't being flushed by kubeadm reset --force?

squat commented 2 years ago

@brianthelion Can you please share some details about what the topology looks like? Which nodes are co-located? Which ones are expected to communicate over the VPN, etc.? Also, could you please try running the latest Kilo commit? We recently merged some fixes that addressed problems with cross-location communication in certain scenarios: squat/kilo:d95e590f5c24900d4688b67dc92ccd0298948006

hhstu commented 2 years ago

> @brianthelion Can you please share some details about what the topology looks like? Which nodes are co-located? Which ones are expected to communicate over the VPN, etc.? Also, could you please try running the latest Kilo commit? We recently merged some fixes that addressed problems with cross-location communication in certain scenarios: squat/kilo:d95e590f5c24900d4688b67dc92ccd0298948006

@squat Can you point me to the commit squat/kilo:d95e590f5c24900d4688b67dc92ccd0298948006? I can't find it.

squat commented 2 years ago

Hi @hhstu this was the PR in question: https://github.com/squat/kilo/pull/285

Images with the :latest tag or a commit tag like :d95e590f5c24900d4688b67dc92ccd0298948006 [0] should contain this patch.

0: https://hub.docker.com/layers/kilo/squat/kilo/d95e590f5c24900d4688b67dc92ccd0298948006/images/sha256-0738723b57968c2415a005f695cddaad44a88789dbd5a7ced6ba0459bb4fdcde?context=explore
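
To try it, something along these lines should work, assuming the DaemonSet and container from the kilo-kubeadm-flannel manifest are both named kilo in kube-system:

kubectl -n kube-system set image daemonset/kilo kilo=squat/kilo:d95e590f5c24900d4688b67dc92ccd0298948006
kubectl -n kube-system rollout status daemonset/kilo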

squat commented 2 years ago

Hi @hhstu, please try the latest Kilo release, 0.5.0, and let me know if you are able to make progress :) We have recently fixed several bugs and improved support for nodes behind NAT.

hhstu commented 2 years ago

@squat I use this project to make kubectl logs work: https://github.com/hhstu/kube-node-dns

It resolves the node name to the IP of wg0.

stv0g commented 1 year ago

This is a duplicate of #189. I think we should close this issue.

squat commented 1 year ago

Thanks @stv0g :) closing