metallb / metallb

A network load-balancer implementation for Kubernetes using standard routing protocols
https://metallb.universe.tf
Apache License 2.0

MetalLB L2 does not work when using a linux bridge #2445

Closed ThorbenJ closed 2 months ago

ThorbenJ commented 2 months ago

MetalLB Version

0.14.5

Deployment method

Manifests

Main CNI

bridge

Kubernetes Version

v1.30.2+k3s1

Cluster Distribution

k3s

Describe the bug

I am using the CNI bridge plugin (https://www.cni.dev/plugins/current/main/bridge/) connected to a Linux (6.8.11) VLAN-aware bridge. In normal operation the MetalLB-managed service IP can be reached from the host OS of other k8s nodes, but not from outside the cluster, not even from the LAN router. However, when I put the bridge on the node currently assigned for L2 advertisements into promisc mode, it starts to work! I found this out by accident when I attached tcpdump to the bridge, and verified it with ip link set bridge promisc on afterwards. Interestingly, I see the L2 advertisements on the router, but no packets come back unless the bridge is in promisc mode.

I don't think it's a good idea to put all bridges on all nodes into promisc mode. I tried turning various things off with ethtool, no luck.
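
For anyone reproducing this, here is a minimal sketch of how the promiscuity state of the bridge could be checked without attaching tcpdump. It assumes the iproute2 `ip -d link show` output format, which includes a `promiscuity N` counter. The helper name `promisc_count` is made up for illustration.

```shell
# Hypothetical helper: extract the promiscuity counter from the output of
# `ip -d link show dev <iface>` read on stdin.
# Prints the counter (0 means not promiscuous), or nothing if it is absent.
promisc_count() {
    sed -n 's/.*promiscuity \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example usage on a node (frontbr is the bridge from this report):
#   ip -d link show dev frontbr | promisc_count
#   ip link set dev frontbr promisc on    # the accidental workaround
#   ip link set dev frontbr promisc off   # revert
```

Reading the counter rather than toggling the flag keeps the check side-effect free.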

To Reproduce

  1. Set up the bridge interface
  2. Install a K3s cluster (disable traefik/servicelb and flannel) with the CNI bridge plugin
  3. Install MetalLB
  4. Edit the kubernetes service in the default namespace from type ClusterIP to LoadBalancer and annotate it with a static IP
  5. Confirm an L2 advertisement has been assigned to a node
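
Steps 3 to 5 can be sketched as a dry-run script. This is an approximation, not the exact commands used for this report: the manifest URL is the one from the MetalLB installation docs for v0.14.5, and the `run` and `reproduce` helpers are hypothetical names. Commands are only printed unless `DRY_RUN=0` is set.

```shell
# Dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else printf '+ %s\n' "$*"; fi
}

reproduce() {
    lb_ip="${1:-172.17.43.1}"
    # Step 2 is assumed done: k3s installed with traefik, servicelb and
    # flannel disabled (--disable traefik --disable servicelb --flannel-backend=none).
    # Step 3: install MetalLB from the release manifest.
    run kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.5/config/manifests/metallb-native.yaml
    # Step 4: switch the default/kubernetes service to LoadBalancer and pin its IP.
    run kubectl -n default patch service kubernetes -p '{"spec":{"type":"LoadBalancer"}}'
    run kubectl -n default annotate service kubernetes "metallb.universe.tf/loadBalancerIPs=${lb_ip}"
    # Step 5: the Events section shows which node announces the IP.
    run kubectl -n default describe service kubernetes
}

# Example: reproduce 172.17.43.1
```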

From another node: curl -k https://172.17.43.1/api -> works

From the router (OpenWrt) and beyond: curl -k https://172.17.43.1/api -> only works when the bridge is in promisc mode

The service:

Name:                     kubernetes
Namespace:                default
Labels:                   component=apiserver
                          provider=kubernetes
Annotations:              metallb.universe.tf/ip-allocated-from-pool: front-pool
                          metallb.universe.tf/loadBalancerIPs: 172.17.43.1
Selector:                 <none>
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       240.42.0.1
IPs:                      240.42.0.1
LoadBalancer Ingress:     172.17.43.1
Port:                     https  443/TCP
TargetPort:               6443/TCP
NodePort:                 https  31665/TCP
Endpoints:                172.17.42.201:6443,172.17.42.202:6443,172.17.42.203:6443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason        Age                From             Message
  ----    ------        ----               ----             -------
  Normal  nodeAssigned  18m (x2 over 78m)  metallb-speaker  announcing from node "worker04" with protocol "layer2"
  Normal  nodeAssigned  17m                metallb-speaker  announcing from node "worker05" with protocol "layer2"
  Normal  nodeAssigned  50s (x3 over 17m)  metallb-speaker  announcing from node "worker04" with protocol "layer2"

MetalLB config:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: front-pool
  namespace: metallb-system
spec:
  addresses:
  - 172.17.43.0/24
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2advertise-all
  namespace: metallb-system

/etc/network/interfaces (snippet):

auto enP3p49s0
iface enP3p49s0 inet manual
    bridge-vids 1 43

auto frontbr
iface frontbr inet static
    bridge-ports enP3p49s0
    bridge-vlan-aware yes
    bridge-vids 1 40-50
    bridge-fd 2
    bridge-stp on
    address 172.17.42.201/23
    gateway 172.17.42.1

auto frontbr.43
iface frontbr.43 inet static
    address 240.40.201.1/16

/etc/cni/net.d/10-front-bridge-cni.conf

{
    "cniVersion": "1.0.0",
    "name": "frontbr",
    "type": "bridge",
    "bridge": "frontbr",
    "vlan": 43,
    "isDefaultGateway": false,
    "isGateway": false,
    "ipMasq": false,
    "ipam": {
        "type": "host-local",
        "ranges": [
            [{
                "subnet": "240.40.0.0/16",
                "rangeStart": "240.40.201.10",
                "reageEnd": "240.40.201.254",
                "gateway": "240.40.201.1"
            }]
        ],
        "routes": [
            { "dst": "0.0.0.0/0" }
        ]
    }
}

The native VLAN is the host/node network, a /23: the first /24 is for hosts and the second /24 is for MetalLB. VLAN 43 connects pods across all nodes.

The private class E pod/svc network may not leave the host:

-A POSTROUTING -s 240.0.0.0/4 ! -d 240.0.0.0/4 -j MASQUERADE
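
To make the match condition of that rule concrete, here is a small sketch that mimics it in plain shell: a packet is masqueraded when its source is inside 240.0.0.0/4 (first octet 240 to 255) and its destination is not. The function names are hypothetical, and this only models the address match, not iptables itself.

```shell
# True if the IPv4 address falls in 240.0.0.0/4 (first octet 240..255).
in_class_e() {
    first_octet="${1%%.*}"
    [ "$first_octet" -ge 240 ] && [ "$first_octet" -le 255 ]
}

# usage: would_masquerade SRC DST
# Mirrors: -s 240.0.0.0/4 ! -d 240.0.0.0/4 -j MASQUERADE
would_masquerade() {
    in_class_e "$1" && ! in_class_e "$2"
}

# Example: a pod (240.40.201.10) talking to the LAN router (172.17.42.1)
#   would_masquerade 240.40.201.10 172.17.42.1 && echo "masqueraded"
```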

Expected Behavior

The service should always be reachable, even when the bridge is not in promisc mode.

Additional Context

I searched the docs for "bridge" and looked at issues that mention "bridge", but found nothing that appeared to apply or help. I also tried explicitly setting the L2 advertisement interface, once to the bridge and once to the physical NIC.
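
For reference, restricting the advertisement to one interface might look like the sketch below. The exact manifest used in those attempts is not shown in this report, so this is an assumption based on the `interfaces` and `ipAddressPools` fields of the L2Advertisement spec. The `l2adv_manifest` helper is a hypothetical name.

```shell
# Print an L2Advertisement limited to one interface.
# usage: l2adv_manifest IFACE
l2adv_manifest() {
    iface="$1"
    cat <<EOF
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2advertise-all
  namespace: metallb-system
spec:
  ipAddressPools:
  - front-pool
  interfaces:
  - ${iface}
EOF
}

# Example (bridge first, then the physical NIC):
#   l2adv_manifest frontbr   | kubectl apply -f -
#   l2adv_manifest enP3p49s0 | kubectl apply -f -
```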

(Honesty: I only tried the most recently tagged MetalLB, as noted above, not from master)

I've read and agree with the following


ThorbenJ commented 2 months ago

metallb_report.tgz

fedepaol commented 2 months ago

Seems a dupe of https://github.com/metallb/metallb/issues/253 , well described in https://github.com/metallb/metallb/issues/253#issuecomment-1098300839

Closing as not fixable in MetalLB itself, see also https://github.com/metallb/metallb/issues/535

ThorbenJ commented 2 months ago

EDIT: While this works, enabling hairpin appears to lead to network storms, so unfortunately this is not a complete solution.

I've been looking at this and here is my solution: on the physical (metal) NIC where you expect connections to arrive, enable hairpin mode (allows frames to leave through the port they entered on) and proxy_arp (allows the port to answer ARP on behalf of other MACs):

ip link set dev enP3p49s0 type bridge_slave  proxy_arp on hairpin on

Before:

curl -k https://172.17.43.1/
curl: (7) Failed to connect to 172.17.43.1 port 443 after 3032 ms: Couldn't connect to server

After:

curl -k https://172.17.43.1/
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
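
Since hairpin mode turned out to cause network storms, a quick way to switch both port flags back off may be useful. The sketch below wraps the same `ip link ... type bridge_slave` command in a dry-run helper. `run` and `set_port_flags` are hypothetical names, and nothing is executed unless `DRY_RUN=0` is set.

```shell
# Dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else printf '+ %s\n' "$*"; fi
}

# usage: set_port_flags IFACE on|off
# Sets proxy_arp and hairpin together on a bridge port.
set_port_flags() {
    case "$2" in
        on|off) ;;
        *) echo "state must be 'on' or 'off'" >&2; return 1 ;;
    esac
    run ip link set dev "$1" type bridge_slave proxy_arp "$2" hairpin "$2"
}

# Example:
#   DRY_RUN=0 set_port_flags enP3p49s0 off   # revert the workaround
#   bridge -d link show dev enP3p49s0        # inspect the resulting port flags
```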

Adding this comment for others to find, maybe the docs could be amended?