
KubeSpan should not announce 'externalIP' addresses as local to the node #7126

Closed: smira closed this issue 1 year ago

smira commented 1 year ago

Reported by @steverfrancis, GCP VM reports externalIP as routable, but it's not assigned to the VM, so the traffic gets dropped.
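One way to see this from the node itself (a sketch, assuming talosctl access to the GCP node used in the reproduction below, and that the addresses resource lists what is actually configured on the node's links):

talosctl --talosconfig=./talosconfig -n 34.172.164.165 -e 34.172.164.165 get addresses
# The GCP public IP is expected to be absent here: only the private 10.128.0.x
# address is configured on the NIC, while the external IP is handled by GCP's
# one-to-one NAT outside the VM.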

smira commented 1 year ago

I can't reproduce, waiting for more info from Steve.

steverfrancis commented 1 year ago

To reproduce: create a GCP machine (I use region and zone us-central1, the default), type e2-medium (4GB). The boot disk is a custom image, Talos 1.3.7 (I uploaded the GCP image and made a custom image from it). The firewall is set to allow all traffic inbound, on all ports.

Create the CP machine. Also create a machine in Vultr (again in Chicago, 4GB of memory, Talos 1.3.7 image).

I have a quick script to automate the steps:

more ~/createcluster.sh
#!/bin/bash
# $1 - control plane node public IP, $2 - worker node public IP
echo talosctl gen config mycluster5 "https://$1:6443"
talosctl gen config mycluster5 "https://$1:6443" --with-kubespan
#sed 's/sda/vda/' controlplane.yaml > ./tmp.foo ; mv ./tmp.foo ./controlplane.yaml
talosctl apply-config --insecure --nodes "$1" -e "$1" --file ./controlplane.yaml
read -p "When reached waiting for bootstrap... Press enter to continue"
talosctl bootstrap --nodes "$1" -e "$1" --talosconfig=./talosconfig
talosctl health --nodes "$1" -e "$1" --talosconfig=./talosconfig
# replace the install disk name for the worker (vda instead of sda)
sed 's/sda/vda/' worker.yaml > ./tmp.foo ; mv ./tmp.foo ./worker.yaml
talosctl apply-config --insecure --nodes "$2" -e "$2" -f worker.yaml

~/createcluster.sh 34.172.164.165 207.148.10.224

The control plane is created (seemingly) happily; the worker is created, but it loops and reboots with "unable to read client-cert /var/lib/kubelet/pki/kubelet-client-current.pem".

Asking the worker for its view of KubeSpan peers:

talosctl --talosconfig=./talosconfig -n 207.148.10.224 -e 34.172.164.165 get kubespanpeerspecs -o yaml
node: 207.148.10.224
metadata:
    namespace: kubespan
    type: KubeSpanPeerSpecs.kubespan.talos.dev
    id: VLSiU0XMbEOnCx313disAn280PkClUyttIzrZ0DzanA=
    version: 1
    owner: kubespan.PeerSpecController
    phase: running
    created: 2023-04-25T20:52:10Z
    updated: 2023-04-25T20:52:10Z
spec:
    address: fd0e:abd0:86c5:3302:4001:aff:fe80:22
    allowedIPs:
        - 10.128.0.34/32
        - fd0e:abd0:86c5:3302:4001:aff:fe80:22/128
    endpoints:
        - 10.128.0.34:51820
        - 34.172.164.165:51820
    label: cp1

and

talosctl --talosconfig=./talosconfig -n 207.148.10.224 -e 34.172.164.165 get kubespanpeerstatuses -o yaml
node: 207.148.10.224
metadata:
    namespace: kubespan
    type: KubeSpanPeerStatuses.kubespan.talos.dev
    id: VLSiU0XMbEOnCx313disAn280PkClUyttIzrZ0DzanA=
    version: 9
    owner: kubespan.ManagerController
    phase: running
    created: 2023-04-25T20:52:10Z
    updated: 2023-04-25T20:56:08Z
spec:
    endpoint: 34.172.164.165:51820
    label: cp1
    state: up
    receiveBytes: 42424
    transmitBytes: 98972
    lastHandshakeTime: 2023-04-25T20:54:12.668256405Z
    lastUsedEndpoint: 10.128.0.34:51820
    lastEndpointChange: 2023-04-25T20:52:10.570074212Z
steverfrancis commented 1 year ago

OK, to confuse matters more: I can join a worker successfully from my QEMU environment, using the exact same worker.yaml file, to that same cluster. So here is all the KubeSpan info from the 3 different nodes. From the CP:

talosctl --talosconfig=./talosconfig -n 34.172.164.165 -e 34.172.164.165 get kubespanidentities -o yaml
node: 34.172.164.165
metadata:
    namespace: kubespan
    type: KubeSpanIdentities.kubespan.talos.dev
    id: local
    version: 1
    owner: kubespan.IdentityController
    phase: running
    created: 2023-04-25T20:42:06Z
    updated: 2023-04-25T20:42:06Z
spec:
    address: fd0e:abd0:86c5:3302:4001:aff:fe80:22/128
    subnet: fd0e:abd0:86c5:3302::/64
    privateKey: QIbmNTYH1A9RHdkB3bNd4+Tq/8tqRG5kKP21sCy6yW0=
    publicKey: VLSiU0XMbEOnCx313disAn280PkClUyttIzrZ0DzanA=
talosctl --talosconfig=./talosconfig -n 34.172.164.165 -e 34.172.164.165 get kubespanpeerspecs -o yaml
node: 34.172.164.165
metadata:
    namespace: kubespan
    type: KubeSpanPeerSpecs.kubespan.talos.dev
    id: RqYPzj6DwK4213+ra3hC8roP3ibpU1vAIHDcRMHIwUw=
    version: 3
    owner: kubespan.PeerSpecController
    phase: running
    created: 2023-04-25T20:45:53Z
    updated: 2023-04-25T20:52:11Z
spec:
    address: fd0e:abd0:86c5:3302:5400:4ff:fe69:c6e1
    allowedIPs:
        - 207.148.10.224/32
        - 2001:19f0:5c01:e65:5400:4ff:fe69:c6e1/128
        - fd0e:abd0:86c5:3302:5400:4ff:fe69:c6e1/128
    endpoints:
        - 207.148.10.224:51820
        - '[2001:19f0:5c01:e65:5400:4ff:fe69:c6e1]:51820'
    label: talos-d5z-q20
---
node: 34.172.164.165
metadata:
    namespace: kubespan
    type: KubeSpanPeerSpecs.kubespan.talos.dev
    id: m1tVu5Jq1pO66eArggx6sHqnXwJZQrqlUiHlmnr7+Sc=
    version: 4
    owner: kubespan.PeerSpecController
    phase: running
    created: 2023-04-25T21:00:07Z
    updated: 2023-04-25T21:00:10Z
spec:
    address: fd0e:abd0:86c5:3302:7821:e8ff:fe2e:a068
    allowedIPs:
        - 192.168.64.32/32
        - fd0e:abd0:86c5:3302:7821:e8ff:fe2e:a068/128
        - fd2a:59c8:2c5f:e2bc:7821:e8ff:fe2e:a068/128
    endpoints:
        - 192.168.64.32:51820
        - 98.97.60.64:51820
        - '[fd2a:59c8:2c5f:e2bc:7821:e8ff:fe2e:a068]:51820'
        - 98.97.60.64:20797
    label: talos-3ku-r07
talosctl --talosconfig=./talosconfig -n 34.172.164.165 -e 34.172.164.165 get kubespanpeerstatuses -o yaml
node: 34.172.164.165
metadata:
    namespace: kubespan
    type: KubeSpanPeerStatuses.kubespan.talos.dev
    id: RqYPzj6DwK4213+ra3hC8roP3ibpU1vAIHDcRMHIwUw=
    version: 52
    owner: kubespan.ManagerController
    phase: running
    created: 2023-04-25T20:45:53Z
    updated: 2023-04-25T21:09:05Z
spec:
    endpoint: 207.148.10.224:51820
    label: talos-d5z-q20
    state: up
    receiveBytes: 345488
    transmitBytes: 202048
    lastHandshakeTime: 2023-04-25T21:08:56.369018534Z
    lastUsedEndpoint: 207.148.10.224:51820
    lastEndpointChange: 2023-04-25T20:45:53.883001571Z
---
node: 34.172.164.165
metadata:
    namespace: kubespan
    type: KubeSpanPeerStatuses.kubespan.talos.dev
    id: m1tVu5Jq1pO66eArggx6sHqnXwJZQrqlUiHlmnr7+Sc=
    version: 20
    owner: kubespan.ManagerController
    phase: running
    created: 2023-04-25T21:00:07Z
    updated: 2023-04-25T21:09:05Z
spec:
    endpoint: 98.97.60.64:20797
    label: talos-3ku-r07
    state: up
    receiveBytes: 153492
    transmitBytes: 86888
    lastHandshakeTime: 2023-04-25T21:08:59.276731883Z
    lastUsedEndpoint: 98.97.60.64:51820
    lastEndpointChange: 2023-04-25T21:00:35.757643317Z
talosctl --talosconfig=./talosconfig -n 34.172.164.165 -e 34.172.164.165 get KubeSpanEndpoints -o yaml
node: 34.172.164.165
metadata:
    namespace: kubespan
    type: KubeSpanEndpoints.kubespan.talos.dev
    id: RqYPzj6DwK4213+ra3hC8roP3ibpU1vAIHDcRMHIwUw=
    version: 1
    owner: kubespan.EndpointController
    phase: running
    created: 2023-04-25T20:46:05Z
    updated: 2023-04-25T20:46:05Z
spec:
    affiliateID: dyyN94Brr2v9rPHcfoxB0h0YHOHI4eNSKEskSdb5gNa
    endpoint: 207.148.10.224:51820
---
node: 34.172.164.165
metadata:
    namespace: kubespan
    type: KubeSpanEndpoints.kubespan.talos.dev
    id: m1tVu5Jq1pO66eArggx6sHqnXwJZQrqlUiHlmnr7+Sc=
    version: 1
    owner: kubespan.EndpointController
    phase: running
    created: 2023-04-25T21:01:05Z
    updated: 2023-04-25T21:01:05Z
spec:
    affiliateID: PBMObXq0pLZCHP1hXv70nfJwL3voAXuqDV7rddoAzrd
    endpoint: 98.97.60.64:20797

From the Vultr worker that cannot successfully join:

talosctl --talosconfig=./talosconfig -n 207.148.10.224 -e 34.172.164.165 get KubeSpanIdentities -o yaml
node: 207.148.10.224
metadata:
    namespace: kubespan
    type: KubeSpanIdentities.kubespan.talos.dev
    id: local
    version: 2
    owner: kubespan.IdentityController
    phase: running
    created: 2023-04-25T21:10:50Z
    updated: 2023-04-25T21:10:50Z
spec:
    address: fd0e:abd0:86c5:3302:5400:4ff:fe69:c6e1/128
    subnet: fd0e:abd0:86c5:3302::/64
    privateKey: YAuNKGn5zS0BhvFLtttUfs0Sb859UMzOyOdDpWzGl1w=
    publicKey: RqYPzj6DwK4213+ra3hC8roP3ibpU1vAIHDcRMHIwUw=
 talosctl --talosconfig=./talosconfig -n 207.148.10.224 -e 34.172.164.165 get KubeSpanPeerSpecs -o yaml 
node: 207.148.10.224
metadata:
    namespace: kubespan
    type: KubeSpanPeerSpecs.kubespan.talos.dev
    id: VLSiU0XMbEOnCx313disAn280PkClUyttIzrZ0DzanA=
    version: 1
    owner: kubespan.PeerSpecController
    phase: running
    created: 2023-04-25T21:10:57Z
    updated: 2023-04-25T21:10:57Z
spec:
    address: fd0e:abd0:86c5:3302:4001:aff:fe80:22
    allowedIPs:
        - 10.128.0.34/32
        - fd0e:abd0:86c5:3302:4001:aff:fe80:22/128
    endpoints:
        - 10.128.0.34:51820
        - 34.172.164.165:51820
    label: cp1
---
node: 207.148.10.224
metadata:
    namespace: kubespan
    type: KubeSpanPeerSpecs.kubespan.talos.dev
    id: m1tVu5Jq1pO66eArggx6sHqnXwJZQrqlUiHlmnr7+Sc=
    version: 1
    owner: kubespan.PeerSpecController
    phase: running
    created: 2023-04-25T21:10:57Z
    updated: 2023-04-25T21:10:57Z
spec:
    address: fd0e:abd0:86c5:3302:7821:e8ff:fe2e:a068
    allowedIPs:
        - 192.168.64.32/32
        - fd0e:abd0:86c5:3302:7821:e8ff:fe2e:a068/128
        - fd2a:59c8:2c5f:e2bc:7821:e8ff:fe2e:a068/128
    endpoints:
        - 192.168.64.32:51820
        - 98.97.60.64:51820
        - '[fd2a:59c8:2c5f:e2bc:7821:e8ff:fe2e:a068]:51820'
        - 98.97.60.64:20797
    label: talos-3ku-r07
talosctl --talosconfig=./talosconfig -n 207.148.10.224 -e 34.172.164.165 get KubeSpanPeerStatuses -o yaml
node: 207.148.10.224
metadata:
    namespace: kubespan
    type: KubeSpanPeerStatuses.kubespan.talos.dev
    id: VLSiU0XMbEOnCx313disAn280PkClUyttIzrZ0DzanA=
    version: 4
    owner: kubespan.ManagerController
    phase: running
    created: 2023-04-25T21:10:57Z
    updated: 2023-04-25T21:11:50Z
spec:
    endpoint: 34.172.164.165:51820
    label: cp1
    state: up
    receiveBytes: 23268
    transmitBytes: 145268
    lastHandshakeTime: 2023-04-25T21:11:01.861191993Z
    lastUsedEndpoint: 10.128.0.34:51820
    lastEndpointChange: 2023-04-25T21:10:57.943104301Z
---
node: 207.148.10.224
metadata:
    namespace: kubespan
    type: KubeSpanPeerStatuses.kubespan.talos.dev
    id: m1tVu5Jq1pO66eArggx6sHqnXwJZQrqlUiHlmnr7+Sc=
    version: 4
    owner: kubespan.ManagerController
    phase: running
    created: 2023-04-25T21:10:57Z
    updated: 2023-04-25T21:11:50Z
spec:
    endpoint: 98.97.60.64:20797
    label: talos-3ku-r07
    state: up
    receiveBytes: 212
    transmitBytes: 864
    lastHandshakeTime: 2023-04-25T21:11:22.612473121Z
    lastUsedEndpoint: 98.97.60.64:51820
    lastEndpointChange: 2023-04-25T21:11:20.617404565Z
talosctl --talosconfig=./talosconfig -n 207.148.10.224 -e 34.172.164.165 get KubeSpanEndpoints -o yaml   
node: 207.148.10.224
metadata:
    namespace: kubespan
    type: KubeSpanEndpoints.kubespan.talos.dev
    id: VLSiU0XMbEOnCx313disAn280PkClUyttIzrZ0DzanA=
    version: 1
    owner: kubespan.EndpointController
    phase: running
    created: 2023-04-25T21:11:20Z
    updated: 2023-04-25T21:11:20Z
spec:
    affiliateID: vRHCfbX8scf32xYA6zVxkkk2zGFPZR70cjrqu1i5ntx
    endpoint: 34.172.164.165:51820
---
node: 207.148.10.224
metadata:
    namespace: kubespan
    type: KubeSpanEndpoints.kubespan.talos.dev
    id: m1tVu5Jq1pO66eArggx6sHqnXwJZQrqlUiHlmnr7+Sc=
    version: 1
    owner: kubespan.EndpointController
    phase: running
    created: 2023-04-25T21:11:50Z
    updated: 2023-04-25T21:11:50Z
spec:
    affiliateID: PBMObXq0pLZCHP1hXv70nfJwL3voAXuqDV7rddoAzrd
    endpoint: 98.97.60.64:20797

And finally from the QEMU machine (which is behind more firewalls, but joins the cluster fine):

 talosctl --talosconfig=./talosconfig -n 192.168.64.32 -e 34.172.164.165 get KubeSpanIdentities -o yaml
node: 192.168.64.32
metadata:
    namespace: kubespan
    type: KubeSpanIdentities.kubespan.talos.dev
    id: local
    version: 1
    owner: kubespan.IdentityController
    phase: running
    created: 2023-04-25T21:00:07Z
    updated: 2023-04-25T21:00:07Z
spec:
    address: fd0e:abd0:86c5:3302:7821:e8ff:fe2e:a068/128
    subnet: fd0e:abd0:86c5:3302::/64
    privateKey: uLCRemKvgTSMBu9oxFuqetxcq+tlzlKmeOLiIBc+MnE=
    publicKey: m1tVu5Jq1pO66eArggx6sHqnXwJZQrqlUiHlmnr7+Sc=
 talosctl --talosconfig=./talosconfig -n 192.168.64.32 -e 34.172.164.165 get KubeSpanPeerSpecs -o yaml 
node: 192.168.64.32
metadata:
    namespace: kubespan
    type: KubeSpanPeerSpecs.kubespan.talos.dev
    id: RqYPzj6DwK4213+ra3hC8roP3ibpU1vAIHDcRMHIwUw=
    version: 1
    owner: kubespan.PeerSpecController
    phase: running
    created: 2023-04-25T21:00:08Z
    updated: 2023-04-25T21:00:08Z
spec:
    address: fd0e:abd0:86c5:3302:5400:4ff:fe69:c6e1
    allowedIPs:
        - 207.148.10.224/32
        - 2001:19f0:5c01:e65:5400:4ff:fe69:c6e1/128
        - fd0e:abd0:86c5:3302:5400:4ff:fe69:c6e1/128
    endpoints:
        - 207.148.10.224:51820
        - '[2001:19f0:5c01:e65:5400:4ff:fe69:c6e1]:51820'
    label: talos-d5z-q20
---
node: 192.168.64.32
metadata:
    namespace: kubespan
    type: KubeSpanPeerSpecs.kubespan.talos.dev
    id: VLSiU0XMbEOnCx313disAn280PkClUyttIzrZ0DzanA=
    version: 1
    owner: kubespan.PeerSpecController
    phase: running
    created: 2023-04-25T21:00:08Z
    updated: 2023-04-25T21:00:08Z
spec:
    address: fd0e:abd0:86c5:3302:4001:aff:fe80:22
    allowedIPs:
        - 10.128.0.34/32
        - fd0e:abd0:86c5:3302:4001:aff:fe80:22/128
    endpoints:
        - 10.128.0.34:51820
        - 34.172.164.165:51820
    label: cp1
talosctl --talosconfig=./talosconfig -n 192.168.64.32 -e 34.172.164.165 get KubeSpanPeerStatuses -o yaml
node: 192.168.64.32
metadata:
    namespace: kubespan
    type: KubeSpanPeerStatuses.kubespan.talos.dev
    id: RqYPzj6DwK4213+ra3hC8roP3ibpU1vAIHDcRMHIwUw=
    version: 30
    owner: kubespan.ManagerController
    phase: running
    created: 2023-04-25T21:00:08Z
    updated: 2023-04-25T21:14:37Z
spec:
    endpoint: 207.148.10.224:51820
    label: talos-d5z-q20
    state: up
    receiveBytes: 932
    transmitBytes: 2684
    lastHandshakeTime: 2023-04-25T21:13:35.685424626Z
    lastUsedEndpoint: 207.148.10.224:51820
    lastEndpointChange: 2023-04-25T21:00:08.243702418Z
---
node: 192.168.64.32
metadata:
    namespace: kubespan
    type: KubeSpanPeerStatuses.kubespan.talos.dev
    id: VLSiU0XMbEOnCx313disAn280PkClUyttIzrZ0DzanA=
    version: 30
    owner: kubespan.ManagerController
    phase: running
    created: 2023-04-25T21:00:08Z
    updated: 2023-04-25T21:14:37Z
spec:
    endpoint: 34.172.164.165:51820
    label: cp1
    state: up
    receiveBytes: 113208
    transmitBytes: 312904
    lastHandshakeTime: 2023-04-25T21:13:00.880998522Z
    lastUsedEndpoint: 34.172.164.165:51820
    lastEndpointChange: 2023-04-25T21:00:37.3356344Z
talosctl --talosconfig=./talosconfig -n 192.168.64.32 -e 34.172.164.165 get KubeSpanEndpoints -o yaml   
node: 192.168.64.32
metadata:
    namespace: kubespan
    type: KubeSpanEndpoints.kubespan.talos.dev
    id: RqYPzj6DwK4213+ra3hC8roP3ibpU1vAIHDcRMHIwUw=
    version: 1
    owner: kubespan.EndpointController
    phase: running
    created: 2023-04-25T21:00:37Z
    updated: 2023-04-25T21:00:37Z
spec:
    affiliateID: dyyN94Brr2v9rPHcfoxB0h0YHOHI4eNSKEskSdb5gNa
    endpoint: 207.148.10.224:51820
---
node: 192.168.64.32
metadata:
    namespace: kubespan
    type: KubeSpanEndpoints.kubespan.talos.dev
    id: VLSiU0XMbEOnCx313disAn280PkClUyttIzrZ0DzanA=
    version: 1
    owner: kubespan.EndpointController
    phase: running
    created: 2023-04-25T21:01:07Z
    updated: 2023-04-25T21:01:07Z
spec:
    affiliateID: vRHCfbX8scf32xYA6zVxkkk2zGFPZR70cjrqu1i5ntx
    endpoint: 34.172.164.165:51820
smira commented 1 year ago

I don't see 34.172.164.165 (which is the GCP external IP, I believe) anywhere in the allowedIPs, which means that traffic towards 34.172.164.165 is not routed over KubeSpan; that in turn means the problem is not what I first thought.
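A quick way to confirm this from the worker, reusing the command above (a sketch; the grep window is arbitrary):

# The GCP public IP should not show up in any peer's allowedIPs, so traffic
# addressed to 34.172.164.165 bypasses the KubeSpan interface entirely.
talosctl --talosconfig=./talosconfig -n 207.148.10.224 -e 34.172.164.165 get kubespanpeerspecs -o yaml | grep -A4 allowedIPs
# Expected: only 10.128.0.34/32 and the fd0e: ULA address for the cp1 peer.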

smira commented 1 year ago

So the problem is the following:

When KubeSpan is enabled, traffic between the nodes is routed over KubeSpan. As the Google public address is not actually assigned to the machine, it's not announced over KubeSpan.

The GCP VM external IP is used as the cluster control plane endpoint.

The problem is asymmetric routing:

  1. The Vultr worker tries to contact the control plane endpoint (the GCP Public IP).
  2. A packet leaves the Vultr VM as: src - Vultr Public IP, dst - GCP Public IP.
  3. As the GCP Public IP is not announced over KubeSpan, the packet goes out over the public Internet.
  4. The packet reaches Google infra, where it gets rewritten into: Vultr Public IP -> GCP Private IP.
  5. The packet is received by the GCP VM, and the GCP VM sends the response: GCP Private IP -> Vultr Public IP.
  6. As the Vultr Public IP is announced over KubeSpan, the response packet goes over KubeSpan.
  7. The packet reaches the Vultr VM as GCP Private IP -> Vultr Public IP, which doesn't match the src/dst of the first packet, so it doesn't get attached to the TCP connection.
  8. The Vultr VM never sees a proper response to the initial packet, so the connection times out (at the TCP level).

What we can do:

  1. Use the GCP private IP as the control plane endpoint for the cluster; as the private IP is KubeSpan'd, it would work (see the sketch after this list).
  2. Use the Vultr VM as the control plane, and GCP as a worker.
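A minimal sketch of option 1 (assuming the GCP private IP 10.128.0.34 seen in the outputs above; cluster name and flags reused from the reproduction script):

# Regenerate the machine configs with the GCP private IP as the control plane
# endpoint, so API traffic from the other nodes is carried over KubeSpan.
talosctl gen config mycluster5 https://10.128.0.34:6443 --with-kubespan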

What we can't do:

  1. Announce the GCP Public IP over KubeSpan - this will break things, as the IP is not assigned to the VM; if we try to assign it, it will break GCP networking.

Why it works for worker nodes in the home environment:

In the home environment, the QEMU VM announces its private IP over KubeSpan. The packet goes out from the QEMU VM to the GCP Public IP over the public Internet in the same way. On the way out, the packet gets rewritten by NAT(s) so that its source IP becomes the public IP of your home router. By the time it reaches GCP, it no longer carries the original private QEMU IP, so on the way back from GCP the response also goes over the public Internet and successfully reaches the QEMU VM.
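This is also visible in the peer specs above: the home worker (talos-3ku-r07) announces only its private and ULA addresses, while the NATed public IP 98.97.60.64 appears only as an endpoint, never in allowedIPs, so replies addressed to it are never captured by KubeSpan. A sketch, reusing the command from above:

talosctl --talosconfig=./talosconfig -n 34.172.164.165 -e 34.172.164.165 get kubespanpeerspecs -o yaml | grep -A4 allowedIPs
# 98.97.60.64 does not appear in any allowedIPs list, so return traffic to the
# home router's public IP goes back over the public Internet.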

In this case we can't do much except for documenting the issue. AFAIK only AWS and GCP map public IPs this way.

steverfrancis commented 1 year ago

Thanks for the research and write-up. Makes sense. To paraphrase: the issue is that Google rewrites the public IP to the private IP, so the Google machine replies from the private IP over KubeSpan, but that wasn't the address the worker attempted to connect to. The QEMU machine is immune to this issue, as the GCP node doesn't see its packet arriving from a KubeSpan address, so the reply goes back over the public Internet.

So this issue will only occur when the K8s endpoint is an IP address that is not part of KubeSpan, but is an address that is forwarded on to a KubeSpan-announced address of a control plane node without changing the source address. In such a case, the control plane has no way to determine whether the packet was originally sent to the private (KubeSpan) address or to the public IP address. If the source of the packet is a KubeSpan member, the reply will be KubeSpan-encapsulated, and thus not translated back to the public IP, and so the control plane will reply to the session with the wrong address.

Correct me if I'm wrong...

smira commented 1 year ago

Yes, you're right. One more workaround for this case is to add a load balancer in front of the GCP VM. This way, traffic coming to the GCP VM's Kubernetes API server will appear as coming from the LB, not the Vultr Public IP.
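A minimal sketch of that variant (the load balancer address is a hypothetical placeholder; cluster name and flags reused from the reproduction script):

# Point the cluster endpoint at a load balancer in front of the GCP VM, so the
# API server sees client connections as coming from the LB rather than from the
# peer nodes' public IPs. <LB_ADDRESS> is a placeholder, not a real address.
talosctl gen config mycluster5 https://<LB_ADDRESS>:6443 --with-kubespan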