stevefan1999-personal opened this issue 4 months ago
I think the reason KubeSpan's advertiseKubernetesNetworks does not work together with Cilium's nativeRouting is that KubeSpan tries to discover the pod IPs it should route from the host's network interface list (ip a, basically). With Cilium, the pod IPs are not assigned directly to an interface, so KubeSpan would instead need to look up the node's specific podCIDR and route the entire (/24 by default) network.
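To illustrate (a rough sketch assuming a default Cilium install; exact output and names will differ per cluster):

# What KubeSpan sees on the node: cilium_host typically only carries /32 addresses
ip -4 addr show dev cilium_host

# What actually needs to be routed: the node's pod CIDR from the Node object
kubectl get node <node-name> -o jsonpath='{.spec.podCIDR}'   # e.g. 10.244.3.0/24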
I ended up implementing this, not via KubeSpan but via a Tailscale extension, though I had tried it over KubeSpan before that too.
KubeSpan takes care of node-to-node traffic, while CNI should take care of pod-to-pod traffic (and convert it to node-to-node traffic). Native routing makes sense when nodes are directly connected, so no need to use KubeSpan.
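For context, the machine config being discussed looks roughly like this (a sketch of the relevant KubeSpan fields; check the Talos docs for your version):

machine:
  network:
    kubespan:
      enabled: true
      # the option this issue is about
      advertiseKubernetesNetworks: true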
KubeSpan is definitely needed if you run behind NAT gateways across multiple clouds as a hybrid solution :)
It's simply way too expensive to run a VPN gateway and manually plan out the networking for that. We could just leverage KubeSpan's ability to carry pod traffic node-to-node and thus cloud-to-cloud.
Sure, the CNI would still take care of pod-to-pod address allocation and routing (the usual ip route add <pod/cidr> via <kubespan-ip> dev kubespan stuff), but routing that traffic node-to-node over KubeSpan is denied because the pod CIDR is not allowed in the WireGuard config.
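Concretely, the situation looks something like this (a sketch with placeholder addresses, following the interface name used above):

# Route installed so pod traffic destined for a remote node goes over KubeSpan
ip route add <remote-pod-cidr> via <remote-kubespan-ip> dev kubespan

# WireGuard only accepts/forwards traffic for prefixes listed in the peer's AllowedIPs:
wg show kubespan allowed-ips
# If <remote-pod-cidr> is missing from that list, the pod traffic is dropped.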
Sorry for the (kinda) off-topic comment, but @Preisschild you mentioned:
I ended up implementing this, not via KubeSpan but via a Tailscale extension, though I had tried it over KubeSpan before that too.
This is something I'm trying to achieve as well: using Tailscale for node-to-node communication and then disabling the Cilium tunnel. Would you mind sharing how you managed to configure this?
@mentos1386 basically I just configured Tailscale to route each node's .spec.podCIDRs and its host IPs with advertise-routes. This runs on all nodes via a Talos extension (rough sketch below).
You can contact me on Slack if you need more information, but I hope I can make this extension public in the future.
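The core of it is just advertising the right routes from each node, roughly along these lines (a simplified sketch, not the actual extension code; node name and addresses are placeholders):

# Look up this node's pod CIDR, then advertise it and the host IP as subnet routes
POD_CIDR=$(kubectl get node <node-name> -o jsonpath='{.spec.podCIDRs[0]}')   # e.g. 10.244.3.0/24
HOST_IP=<node-host-ip>/32
tailscale up --advertise-routes="${POD_CIDR},${HOST_IP}"

The other nodes then need to accept those routes (tailscale up --accept-routes), after which the Cilium tunnel can be disabled in favour of native routing.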
We have a strong interest in this. We run in a hybrid cloud setup with control plane nodes on Azure and physical servers elsewhere, so we leverage KubeSpan for this. Cilium native routing reduces the VXLAN+WireGuard encapsulation overhead between pods across nodes.
After digging a bit into the issue, I figured the simplest way to solve this problem was a daemonset that extracts the main Cilium pod IP address on cilium_host and adds a secondary IP address with the mask size of the whole node's pod CIDR, for instance /24.
This, in combination with advertiseKubernetesNetworks enabled, ensures that each node's pod CIDR is added to the AllowedIPs of the corresponding KubeSpan peer.
Here's the daemonset that solves the problem:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cilium-host-node-cidr
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cilium-host-node-cidr
  template:
    metadata:
      name: cilium-host-node-cidr
      labels:
        app: cilium-host-node-cidr
    spec:
      hostNetwork: true
      tolerations:
        - key: "node-role.kubernetes.io/master"
          operator: Exists
        - key: "node-role.kubernetes.io/control-plane"
          operator: Exists
      containers:
        - name: cilium-host-node-cidr
          image: alpine
          imagePullPolicy: Always
          command:
            - /bin/sh
            - -c
            - |
              apk update
              apk add iproute2
              handle_error() {
                echo "$1"
                sleep "$SLEEP_TIME"
              }
              echo "Watching cilium_host IP addresses..."
              while :; do
                # Extract all IPv4 addresses from cilium_host
                ip_addresses=$(ip -4 addr show dev cilium_host | grep inet | awk '{print $2}')
                # Check if any of the IP addresses match the NODE_CIDR_MASK_SIZE
                echo "$ip_addresses" | grep -q "/${NODE_CIDR_MASK_SIZE}" || {
                  # Extract the /32 IP address if NODE_CIDR_MASK_SIZE was not found
                  pod_ip=$(echo "$ip_addresses" | grep "/32" | cut -d/ -f1)
                  if [ -z "$pod_ip" ]; then
                    handle_error "Couldn't extract cilium pod IP address from cilium_host interface"
                    continue
                  fi
                  # Add secondary IP address with the proper NODE_CIDR_MASK_SIZE
                  echo "cilium_host IP is $pod_ip"
                  ip addr add "${pod_ip}/${NODE_CIDR_MASK_SIZE}" dev cilium_host
                  echo "Added new cilium_host IP address with mask /${NODE_CIDR_MASK_SIZE}"
                  ip addr show dev cilium_host
                }
                sleep "$SLEEP_TIME"
              done
          env:
            # The node cidr mask size (IPv4) to allocate pod IPs
            - name: NODE_CIDR_MASK_SIZE
              value: "24"
            - name: SLEEP_TIME
              value: "30"
          securityContext:
            capabilities:
              add: ["NET_ADMIN"]
Ideally, I think Talos Linux should probably handle this natively somehow.
In the meantime, feel free to deploy the daemonset above with your Cilium native routing setup.
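For example, to deploy and sanity-check it (the file name is just illustrative):

kubectl apply -f cilium-host-node-cidr.yaml
kubectl -n kube-system rollout status daemonset/cilium-host-node-cidr
# then check on a node that cilium_host now also carries the /NODE_CIDR_MASK_SIZE address
ip -4 addr show dev cilium_host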
Cilium's native routing feature relies on plain L3 routing rather than tunnel encapsulation. Since we already have L3 connectivity through KubeSpan (thanks to WireGuard), another layer of encapsulation is somewhat meaningless and only contributes to further MTU reduction. As such, I think we should have something like advertiseKubernetesNetworks: true but without actually routing the packets automatically, so we could still let Cilium or another CNI handle it.
I've done this manually in the past by manipulating the WireGuard config with AllowedIPs = <node IP>, <pod CIDR> and Table = off on every node. I think KubeSpan does this already; all we need to do is make sure the eBPF and nftables rules in KubeSpan can correctly handle this traffic.
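For reference, the manual version looks roughly like this in a wg-quick-style config (a sketch with placeholder keys and addresses, not KubeSpan's actual generated config):

[Interface]
PrivateKey = <this-node-private-key>
# Table = off stops wg-quick from installing routes for AllowedIPs automatically;
# routing is left to the CNI instead
Table = off

[Peer]
PublicKey = <peer-node-public-key>
Endpoint = <peer-endpoint>:51820
# Allow both the peer's node address and its pod CIDR so pod traffic is not dropped
AllowedIPs = <peer-node-ip>/32, <peer-pod-cidr>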