Node has wrong IPv6 route table after IPv6 VIP failover

ltgentoo commented 10 months ago

Kubernetes with dual stack enabled, using HAProxy and Keepalived for HA. After a failover, the node that previously held the VIP gets a wrong IPv6 route table.

Expected Behavior

VIP address: 2001::201 on 2001::21.
We have a test cluster with 3 nodes: 2001::21, 2001::22, 2001::23.
Calico mode: IPIP CrossSubnet.
Before failover, the IPv6 route table is:

2001::23:
2000:100:100:100:19ca:52ab:2617:eac0/122 2001::22 UG 1024 1 0 ens33
2000:100:100:100:891c:ddc:b181:4840/122 2001::21 UG 1024 1 0 ens33

2001::22:
2000:100:100:100:891c:ddc:b181:4840/122 2001::21 UG 1024 1 0 ens33
2000:100:100:100:97a2:de77:c193:200/122 2001::23 UG 1024 2 0 ens33

2001::21:
2000:100:100:100:19ca:52ab:2617:eac0/122 2001::22 UG 1024 1 0 ens33
2000:100:100:100:97a2:de77:c193:200/122 2001::23 UG 1024 1 0 ens33

Before the VIP failover, everything works fine. After the VIP fails over, the IPv6 routes should not change.

Current Behavior

After failover, the VIP address 2001::201 is on 2001::22, and the IPv6 route tables are:

2001::22:
2000:100:100:100:891c:ddc:b181:4840/122 2001::21 UG 1024 2 0 ens33
2000:100:100:100:97a2:de77:c193:200/122 2001::23 UG 1024 3 0 ens33

2001::23:
2000:100:100:100:19ca:52ab:2617:eac0/122 2001::22 UG 1024 2 0 ens33
2000:100:100:100:891c:ddc:b181:4840/122 2001::21 UG 1024 2 0 ens33

2001::21:
2000:100:100:100:19ca:52ab:2617:eac0/122 2001::2839:3654:bcd8:88c3 UG 1024 2 0 ens33
2000:100:100:100:97a2:de77:c193:200/122 2001::2839:3654:bcd8:88c3 UG 1024 1 0 ens33

The IPv6 routes on node 2001::21 changed. 2001::2839:3654:bcd8:88c3 is our default IPv6 gateway, and I don't know why it is used as the next hop. With the wrong IPv6 routes, pods on other nodes cannot be reached from 2001::21.
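A rough way to spot this condition on a node is to scan the output of `ip -6 route show` for pod-CIDR routes whose next hop is not one of the cluster nodes. The sketch below is only an illustration: the function name `check_pod_routes` is made up, the node list and pod prefix are copied from this report, and it assumes the usual iproute2 line layout (`DEST via GATEWAY dev IFACE ...`).

```shell
#!/bin/sh
# Values taken from this report; adjust them for another cluster.
NODES="2001::21 2001::22 2001::23"
POD_PREFIX="2000:100:100:100:"

# Hypothetical helper: reads `ip -6 route show` output on stdin and prints
# every pod-CIDR route whose gateway is not a known node address.
# Exits non-zero if at least one such route is found.
check_pod_routes() {
    awk -v nodes="$NODES" -v prefix="$POD_PREFIX" '
        BEGIN {
            n = split(nodes, a, " ")
            for (i = 1; i <= n; i++) ok[a[i]] = 1
        }
        # Only look at routes inside the pod prefix that go through a gateway.
        index($1, prefix) == 1 && $2 == "via" {
            if (!($3 in ok)) {
                print "SUSPECT: " $1 " via " $3
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}

# Usage on a node:
#   ip -6 route show | check_pod_routes
```

Run on 2001::21 after the failover, this would be expected to flag both /122 routes, since their gateway is the default gateway rather than 2001::22 or 2001::23.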

Possible Solution

We set the calico-node env IP6_AUTODETECTION_METHOD to kubernetes-internal-ip and to interface=ens33; the result is the same with both settings.
We checked the output of calicoctl get nodes -oyaml, and bgp.ipv6Address is correct.
If we delete the calico-node pod on 2001::21, the IPv6 route table is restored to the correct state.

Steps to Reproduce (for bugs)

Deploy a Kubernetes cluster with dual stack enabled.
Set up Keepalived + HAProxy for HA.
Stop HAProxy manually to trigger a VIP failover.
Check the IPv6 routes.

Context

Your Environment

Calico version: 3.23.1, 3.26, 3.27
Orchestrator version (e.g. kubernetes, mesos, rkt): k8s 1.23.6
Operating System and version: CentOS 7
Link to your project (optional):

mazdakn commented 10 months ago

@ltgentoo have you followed our docs for this setup? We don't use HAProxy/Keepalived for high availability. In the case of a failover, Calico is not aware of the change. This is controlled by Keepalived, and the routes are added by it.

ltgentoo commented 10 months ago

Thanks for your reply. We use HAProxy/Keepalived for API server HA. I know Calico doesn't need HAProxy/Keepalived; maybe there is some conflict between them. We are trying to solve the problem.

I would like to add some information. With the config IP6_AUTODETECTION_METHOD: kubernetes-internal-ip, the IPv6 routes are OK after an IPv6 VIP failover in VXLAN mode (calico_backend: vxlan). But with calico_backend: bird (BGP mode), the routes are wrong after a VIP failover.

My confusion is this: it looks like when the VIP is deleted from the interface, the Calico routes are lost and routes via the default IPv6 gateway are added instead. I don't know whether the problem is in Keepalived, Felix, or BIRD. Do you have any suggestions?

ltgentoo commented 9 months ago

I would like to add more information. We found that this is not related to Keepalived.

With Keepalived/HAProxy stopped, the Calico config is:

IP6_AUTODETECTION_METHOD : kubernetes-internal-ip
calico_backend: bird
CALICO_IPV6POOL_VXLAN: Never
- name: CALICO_IPV4POOL_IPIP
  value: Never
- name: CALICO_IPV6POOL_IPIP
  value: Never
- name: CALICO_IPV4POOL_VXLAN
  value: Never
- name: CALICO_IPV6POOL_VXLAN
  value: Never

We manually add an IPv6 address to ens33, then delete this address, checking the routes for the pod CIDR before the address is deleted:

ip addr add 2001::201/64 dev ens33
ip addr del 2001::201/64 dev ens33

After we delete this address, the route table for the pod CIDR is:

It seems that deleting an IP address from the interface leads to incorrect IPv6 routes.
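The add/delete experiment above could be scripted so that the route tables before and after are captured and compared. This is only a sketch under assumptions: the address and interface are the ones used in this comment, the function names `route_diff` and `repro` are made up, the 10-second wait is an arbitrary guess at how long routes take to settle, and `repro` must be run as root on a node.

```shell
#!/bin/sh
# Values from the comment above; adjust for another environment.
ADDR="2001::201/64"
DEV="ens33"

# Hypothetical helper: print only the lines that differ between two route
# snapshots ("<" = only in the first file, ">" = only in the second).
route_diff() {
    diff "$1" "$2" | grep '^[<>]'
}

# Hypothetical helper: snapshot routes, add and delete the address,
# snapshot again, and show what changed. Requires root.
repro() {
    before=$(mktemp)
    after=$(mktemp)
    ip -6 route show | sort > "$before"
    ip addr add "$ADDR" dev "$DEV"
    ip addr del "$ADDR" dev "$DEV"
    sleep 10   # arbitrary wait so that routes can settle
    ip -6 route show | sort > "$after"
    route_diff "$before" "$after"
    rm -f "$before" "$after"
}

# Run manually on a node:
#   repro
```

If the behavior described here reproduces, the diff should show the pod-CIDR routes switching their gateway from a node address to the default gateway.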

im-jinxinwang commented 9 months ago

Hi @mazdakn I have encountered the same problem, and can conduct a failure test on a Kubernetes cluster with dual stack enabled.

im-jinxinwang commented 9 months ago

Hi @mazdakn I found from the log that Felix updated the routing gateway address multiple times. This is not correct, because 2001::2839:3654:bcd8:88c3 is the default gateway address for the host.

node1.txt

mazdakn commented 9 months ago

@fasaxc can you please comment on this? It seems we ignore non local routes here: https://github.com/projectcalico/calico/blob/126ddced8d2f070f34482bb2076f65a0f8d4d596/felix/ifacemonitor/update_filter.go#L116 but routes for virtual addresses are not local. WDYT?

im-jinxinwang commented 9 months ago

@mazdakn The newly added IPv6 address theoretically does not belong to the local routes, so why does it cause changes in Calico routing?

im-jinxinwang commented 9 months ago

@mazdakn The normal logic is that Calico uses the node's IPv6 address as the gateway address for the pod IPv6 network segment, but the behavior here is abnormal.

fasaxc commented 9 months ago

Please can you add the output from these commands:

ip addr show
ip -6 route show

I'm not sure that route shows all the information that we need. If you don't have ip route installed, you can exec it in the calico-node pod.
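For the exec variant, something like the following sketch should work. The namespace is an assumption (manifest-based installs commonly use kube-system, operator-based installs calico-system), the function name `node_diag` is made up, and the pod name in the usage line is a placeholder.

```shell
#!/bin/sh
# Assumption: calico-node runs in kube-system; override NAMESPACE if it
# runs elsewhere (for example calico-system with the operator).
NAMESPACE="${NAMESPACE:-kube-system}"

# Hypothetical helper: run the two requested commands inside the given
# calico-node pod. The pod uses the host network, so the output reflects
# the node's own addresses and routes.
node_diag() {
    pod="$1"
    kubectl exec -n "$NAMESPACE" "$pod" -c calico-node -- ip addr show
    kubectl exec -n "$NAMESPACE" "$pod" -c calico-node -- ip -6 route show
}

# Usage (pod name is a placeholder):
#   node_diag calico-node-xxxxx
```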

Note that IPIP is not an option for IPv6. The options are to

Form a mesh over a permissive L2 fabric (i.e. all nodes in one L2 broadcast domain)
Peer with your routers
Use VXLANv6 (added in Calico v3.23).

The first two options use BIRD to distribute routes. At a guess, BIRD is picking up the extra IP address and concluding that it is not in the same subnet as the other nodes so it routes via the default gateway instead. I'm not sure why BIRD would be preferring that IP, hopefully the above output will shed some light.

With VXLAN, I think we explicitly use the autodetected IP so that might work here.

ltgentoo commented 9 months ago

@fasaxc OK, let's just talk about BGP, with IPIP not included. Before VIP failover: