projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Failure in Pod connectivity when an additional IP address on the primary interface in the same subnet is added and removed #8739

Open svallala opened 5 months ago

svallala commented 5 months ago

Felix incorrectly removes the directly connected route when it detects that an IP address has been deleted, even if other addresses in the same subnet remain on the interface.

This is causing critical failures in the field.

Expected Behavior

Pod connectivity should not break

I have a 3-node IPv6 Kubernetes cluster with a VIP managed by Keepalived. When things are stable, the routing table is intact: the pod subnets for the other nodes have the next hop correctly set to the node IP address.

VIP - fd74:ca9b:3a09:868c:10:9:121:136
Primary Node IP - fd74:ca9b:3a09:868c:10:9:61:181

[root@hypervvm-61-181 ~]# ip addr show dev br0
38: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 00:15:5d:14:24:2a brd ff:ff:ff:ff:ff:ff
    inet6 **fd74:ca9b:3a09:868c:10:9:121:136/64** scope global deprecated nodad
       valid_lft forever preferred_lft 0sec
    inet6 **fd74:ca9b:3a09:868c:10:9:61:181/64** scope global
       valid_lft forever preferred_lft forever

Routing Table on the Host

[root@hypervvm-61-181 ~]# ip -6 route | grep fd74:ca9b:3a09:868c:
fd74:ca9b:3a09:868c:10:9:124:4d00/122 via **fd74:ca9b:3a09:868c:10:9:61:182** dev br0 proto bird metric 1024 pref medium
fd74:ca9b:3a09:868c:10:9:124:4d40/122 via **fd74:ca9b:3a09:868c:10:9:61:183** dev br0 proto bird metric 1024 pref medium
..
fd74:ca9b:3a09:868c::/64 dev br0 proto kernel metric 256 pref medium
default via fd74:ca9b:3a09:868c::1 dev br0 metric 1 pref medium

BIRD routing table in the Calico Pod

[root@hypervvm-61-181 /]# birdcl6
BIRD v0.3.3+birdv1.6.8 ready.
bird> show route
....
fd74:ca9b:3a09:868c:10:9:124:4d40/122 via **fd74:ca9b:3a09:868c:10:9:61:183** on br0 [Mesh_fd74_ca9b_3a09_868c_10_9_61_183 21:09:17] * (100/0) [i]
fd74:ca9b:3a09:868c:10:9:124:4d00/122 via **fd74:ca9b:3a09:868c:10:9:61:182** on br0 [Mesh_fd74_ca9b_3a09_868c_10_9_61_182 21:09:16] * (100/0) [i]
**fd74:ca9b:3a09:868c::/64 dev br0 [direct1 21:09:15] * (240)**

If the VIP now moves to a different node, the directly connected route disappears from the BIRD routing table in the Calico pod. Because of this, the pod subnet routes are configured with the default gateway as the next hop, which breaks pod connectivity.
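Why losing the connected route changes the next hop can be sketched as a longest-prefix-match lookup. This is a simplified model, not BIRD's actual code: it assumes the BGP next hop is resolved recursively against a table that holds the default route and, when present, the connected /64. The function name `resolve_next_hop` and the table layout are hypothetical.

```python
import ipaddress

def resolve_next_hop(table, bgp_next_hop):
    """Resolve a BGP next hop by longest-prefix match over `table`.

    `table` maps a prefix string to a gateway string, or to None for a
    directly connected (on-link) prefix. Hypothetical model, not BIRD code.
    """
    target = ipaddress.ip_address(bgp_next_hop)
    best = None
    for prefix, gateway in table.items():
        net = ipaddress.ip_network(prefix)
        if target in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    if best is None:
        return None  # no covering route: next hop is unreachable
    # On-link prefix: the peer itself is the next hop; otherwise use the gateway.
    return bgp_next_hop if best[1] is None else best[1]

PEER = "fd74:ca9b:3a09:868c:10:9:61:183"
with_direct = {
    "::/0": "fd74:ca9b:3a09:868c::1",
    "fd74:ca9b:3a09:868c::/64": None,  # connected route present
}
without_direct = {"::/0": "fd74:ca9b:3a09:868c::1"}  # connected route lost

print(resolve_next_hop(with_direct, PEER))     # resolves to the peer address
print(resolve_next_hop(without_direct, PEER))  # falls back to the default gateway
```

With the connected route in the table the peer is treated as on-link; without it, the only covering route is the default, which matches the next hop seen in the broken state.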

VIP is now moved to a different node

[root@hypervvm-61-181 ~]# ip addr show dev br0
38: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 00:15:5d:14:24:2a brd ff:ff:ff:ff:ff:ff
    inet6 fd74:ca9b:3a09:868c:10:9:61:181/64 scope global
       valid_lft forever preferred_lft forever

Host Routing Table - Note that the next hop for the pod subnets is configured as the default gateway

[root@hypervvm-61-181 ~]# ip -6 route | grep fd74:ca9b:3a09:868c:
fd74:ca9b:3a09:868c:10:9:124:4d00/122 via **fd74:ca9b:3a09:868c::1** dev br0 proto bird metric 1024 pref medium
fd74:ca9b:3a09:868c:10:9:124:4d40/122 via **fd74:ca9b:3a09:868c::1** dev br0 proto bird metric 1024 pref medium
....
fd74:ca9b:3a09:868c::/64 dev br0 proto kernel metric 256 pref medium
default via fd74:ca9b:3a09:868c::1 dev br0 metric 1 pref medium
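The symptom in the routing table above can be checked mechanically. The following is a minimal sketch that scans plain `ip -6 route` text for `proto bird` routes whose next hop equals the default gateway; the helper name `routes_via_default_gateway` is hypothetical, and the sample text is the output above with the highlighting removed.

```python
def routes_via_default_gateway(ip_route_output, proto="bird"):
    """Return prefixes of `proto` routes whose next hop is the default gateway."""
    default_gw = None
    candidates = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if "via" not in fields:
            continue  # on-link route or blank line
        via = fields[fields.index("via") + 1]
        if fields[0] == "default":
            default_gw = via
        elif "proto" in fields and fields[fields.index("proto") + 1] == proto:
            candidates.append((fields[0], via))
    return [prefix for prefix, via in candidates if via == default_gw]

SAMPLE = """\
fd74:ca9b:3a09:868c:10:9:124:4d00/122 via fd74:ca9b:3a09:868c::1 dev br0 proto bird metric 1024 pref medium
fd74:ca9b:3a09:868c:10:9:124:4d40/122 via fd74:ca9b:3a09:868c::1 dev br0 proto bird metric 1024 pref medium
fd74:ca9b:3a09:868c::/64 dev br0 proto kernel metric 256 pref medium
default via fd74:ca9b:3a09:868c::1 dev br0 metric 1 pref medium
"""

for prefix in routes_via_default_gateway(SAMPLE):
    print("suspicious pod route:", prefix)
```

On a healthy node, where the pod routes point at the peer node addresses, the function returns an empty list.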

BIRD Routing table in the Calico Pod - The directly connected route for subnet fd74:ca9b:3a09:868c::/64 is missing

bird> show route
....
fd74:ca9b:3a09:868c:10:9:124:4d40/122 via **fd74:ca9b:3a09:868c::1** on br0 [Mesh_fd74_ca9b_3a09_868c_10_9_61_183 21:09:18 from fd74:ca9b:3a09:868c:10:9:61:183] * (100/?) [i]
fd74:ca9b:3a09:868c:10:9:124:4d00/122 via **fd74:ca9b:3a09:868c::1** on br0 [Mesh_fd74_ca9b_3a09_868c_10_9_61_182 21:09:17 from fd74:ca9b:3a09:868c:10:9:61:182] * (100/?) [i]

I am able to reproduce the issue even without VIP movement: I just have to add an additional IP address in the same subnet to the primary interface and then remove it. It looks like when Felix detects that an IP address has been removed, it incorrectly removes the directly connected route entry even if additional IP addresses in the same subnet remain on the interface.
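The expected behaviour can be stated as a small rule: a connected prefix should stay as long as at least one address on the interface still falls inside it. The sketch below only models that rule using the addresses from this report; `connected_prefixes` is a hypothetical helper, not code from Felix or BIRD.

```python
import ipaddress

def connected_prefixes(addresses):
    """Prefixes that should keep a directly connected route.

    `addresses` is the list of CIDR addresses currently on the interface.
    """
    return {ipaddress.ip_interface(a).network for a in addresses}

before = [
    "fd74:ca9b:3a09:868c:10:9:121:136/64",  # VIP
    "fd74:ca9b:3a09:868c:10:9:61:181/64",   # primary node IP
]
after = ["fd74:ca9b:3a09:868c:10:9:61:181/64"]  # VIP removed

# Prefixes that lost their last address; only these should be withdrawn.
withdrawn = connected_prefixes(before) - connected_prefixes(after)
print(withdrawn)
```

Because the primary address still covers the /64, the set of prefixes to withdraw is empty and the connected route should remain.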

Your Environment

caseydavenport commented 5 months ago

This does seem strange, especially considering the interface still has an IP address within that subnet even after the VIP is removed. Seems likely to be related to BIRD's / BGP next hop calculation rather than Felix though.

Perhaps worth looking into whether or not the remote nodes have changed the next hop address on the advertised BGP routes as well, in case it's a peer issue rather than an issue with the local route resolution.

svallala commented 5 months ago

@caseydavenport yes, you are right, it seems to be an issue in BIRD. The peers are not impacted; it's only the local route resolution. For now, as a workaround, I am adding a static route for the subnet.

nelljerram commented 5 months ago

Our BIRD fork is based on an upstream BIRD version (v1.6.8) that is now a little old, and it's possible that this has been fixed in upstream BIRD since v1.6.8. If an interested party would like to investigate that and identify the relevant change (if there is one), we could certainly look at cherry-picking that to our fork.

abasitt commented 4 months ago

@nelljerram thank you for pointing out the possible bug. It is indeed a bug in v1.6.8. Below is the result from v1.6.8:

bash status.sh 

Initial BIRD Routes
────────────────────────────────────────
BIRD 1.6.8 ready.
::/0               via fd00:1::1 on eth0 [kernel1 08:58:31] * (10)
fd00:1::/64        dev eth0 [direct1 08:58:31] * (240)
fd00:10::/64       dev eth0 [static1 08:58:31] * (200)
fd00:11::/64       via fe80::42:c0ff:fea8:2002 on eth0 [bgp1 08:58:39 from fd00:1::3] * (100/0) [AS65002i]

Routes After IP Addition
────────────────────────────────────────
BIRD 1.6.8 ready.
::/0               via fd00:1::1 on eth0 [kernel1 08:58:31] * (10)
fd00:1::/64        dev eth0 [direct1 08:58:31] * (240)
fd00:10::/64       dev eth0 [static1 08:58:31] * (200)
fd00:11::/64       via fe80::42:c0ff:fea8:2002 on eth0 [bgp1 08:58:39 from fd00:1::3] * (100/0) [AS65002i]

Routes After IP Deletion
────────────────────────────────────────
BIRD 1.6.8 ready.
::/0               via fd00:1::1 on eth0 [kernel1 08:58:30] * (10)
fd00:10::/64       dev eth0 [static1 08:58:30] * (200)
fd00:11::/64       via fd00:1::1 on eth0 [bgp1 08:58:38 from fd00:1::3] * (100/?) [AS65002i]
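To make the change between the snapshots above explicit, two `show route` listings can be diffed. This is a rough sketch that assumes the single-line BIRD 1.6 output format shown here (it does not handle the multi-line BIRD 2 format); `parse_bird_routes` and `diff_routes` are hypothetical helper names.

```python
def parse_bird_routes(text):
    """Map prefix -> next hop ('via' address, or 'dev <iface>' for on-link routes)."""
    routes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or "/" not in fields[0]:
            continue  # banner or blank line
        if fields[1] == "via":
            routes[fields[0]] = fields[2]
        elif fields[1] == "dev":
            routes[fields[0]] = "dev " + fields[2]
    return routes

def diff_routes(before, after):
    """Return (removed prefixes, {prefix: (old next hop, new next hop)})."""
    old, new = parse_bird_routes(before), parse_bird_routes(after)
    removed = sorted(set(old) - set(new))
    changed = {p: (old[p], new[p]) for p in old if p in new and old[p] != new[p]}
    return removed, changed

AFTER_ADD = """\
BIRD 1.6.8 ready.
::/0               via fd00:1::1 on eth0 [kernel1 08:58:31] * (10)
fd00:1::/64        dev eth0 [direct1 08:58:31] * (240)
fd00:10::/64       dev eth0 [static1 08:58:31] * (200)
fd00:11::/64       via fe80::42:c0ff:fea8:2002 on eth0 [bgp1 08:58:39 from fd00:1::3] * (100/0) [AS65002i]
"""
AFTER_DEL = """\
BIRD 1.6.8 ready.
::/0               via fd00:1::1 on eth0 [kernel1 08:58:30] * (10)
fd00:10::/64       dev eth0 [static1 08:58:30] * (200)
fd00:11::/64       via fd00:1::1 on eth0 [bgp1 08:58:38 from fd00:1::3] * (100/?) [AS65002i]
"""

removed, changed = diff_routes(AFTER_ADD, AFTER_DEL)
print("removed:", removed)
print("changed:", changed)
```

For the v1.6.8 listings, the diff reports the `direct1` route for fd00:1::/64 as removed and the BGP route for fd00:11::/64 as re-resolved through the default gateway fd00:1::1.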

Below is the result from BIRD 2 (v2.14):

bash status.sh 

Initial BIRD Routes
────────────────────────────────────────
BIRD 2.14 ready.
Table master6:
::/0                 unicast [kernel1 09:02:48.030] * (10)
        via fd00:1::1 on eth0
fd00:11::/64         unicast [static1 09:02:48.028] * (200)
        dev eth0
fd00:12::/64         unicast [p1 09:02:49.736] * (100) [AS65002i]
        via fd00:1::3 on eth0

Routes After IP Addition
────────────────────────────────────────
BIRD 2.14 ready.
Table master6:
::/0                 unicast [kernel1 09:02:48.030] * (10)
        via fd00:1::1 on eth0
fd00:11::/64         unicast [static1 09:02:48.028] * (200)
        dev eth0
fd00:12::/64         unicast [p1 09:02:49.736] * (100) [AS65002i]
        via fd00:1::3 on eth0

Routes After IP Deletion
────────────────────────────────────────
BIRD 2.14 ready.
Table master6:
::/0                 unicast [kernel1 09:02:48.030] * (10)
        via fd00:1::1 on eth0
fd00:11::/64         unicast [static1 09:02:48.028] * (200)
        dev eth0
fd00:12::/64         unicast [p1 09:02:49.736] * (100) [AS65002i]
        via fd00:1::3 on eth0

The setup can be recreated here

fasaxc commented 1 month ago

Looks like v1.6.8 was the last BIRD 1.6 release there will be.

abasitt commented 1 month ago

Are there any plans to move to BIRD 2?

nelljerram commented 1 month ago

@abasitt I would say it's an aspiration for us to move to a newer BIRD, but I'm afraid we don't yet have a concrete plan to do so.

Did you manage to identify the fix commit (or the relevant changed code) in BIRD 2?