projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

node with wrong ipv6 route table after ipv6 vip failover #8381

Open ltgentoo opened 10 months ago

ltgentoo commented 10 months ago

Kubernetes with dual stack enabled, using HAProxy and Keepalived for HA. After a failover, the node that previously held the VIP ends up with a wrong IPv6 route table.

Expected Behavior

VIP address: 2001::201 on 2001::21. We have a test cluster with 3 nodes: 2001::21, 2001::22, 2001::23. Calico mode: IPIP CrossSubnet. Before failover, the IPv6 route table is:

Current Behavior

After failover, the VIP address 2001::201 is on 2001::22, and the IPv6 route table is:

The IPv6 routes on node 2001::21 changed. 2001::2839:3654:bcd8:88c3 is our default IPv6 gateway, and I don't know why it shows up here. Of course, with the wrong IPv6 routes, pods on other nodes cannot be reached from 2001::21.
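To spot the affected routes quickly, the route table can be filtered for non-default entries whose next hop is the default gateway. This is a minimal sketch, not part of the original report; `routes_via` is a hypothetical helper name, and it assumes the usual `ip -6 route show` line layout of `<prefix> via <gateway> dev <iface> ...`.

```shell
# Print every non-default route whose next hop is the given gateway.
# Reads `ip -6 route show` output on stdin.
routes_via() {
    gw="$1"
    awk -v gw="$gw" '
        $1 != "default" {
            for (i = 2; i < NF; i++)
                if ($i == "via" && $(i + 1) == gw) { print; break }
        }'
}
```

For example, `ip -6 route show | routes_via 2001::2839:3654:bcd8:88c3` lists only the routes that have been redirected to the default gateway; on a healthy node the output should be empty.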

Possible Solution

  1. We changed the calico-node env IP6_AUTODETECTION_METHOD to kubernetes-internal-ip and to interface=ens33; the result is the same with both settings.
  2. We checked the output of calicoctl get nodes -oyaml, and bgp.ipv6Address is correct.
  3. If we delete the calico-node pod on 2001::21, the IPv6 route table returns to the correct state.

Steps to Reproduce (for bugs)

  1. Deploy a Kubernetes cluster with dual stack enabled.
  2. Set up Keepalived + HAProxy for HA.
  3. Stop HAProxy manually to trigger a VIP failover.
  4. Check the IPv6 routes.

Context

Your Environment

mazdakn commented 10 months ago

@ltgentoo have you followed our docs for this setup? We don't use HAProxy/Keepalived for high availability. In this failover case, Calico is not aware of the change; it is controlled by Keepalived, which also adds the routes.

ltgentoo commented 10 months ago

Thanks for your reply. We use HAProxy/Keepalived for API server HA. I know Calico doesn't need HAProxy/Keepalived; maybe there is some conflict between them. We are trying to solve the problem.

I would like to add some information. With IP6_AUTODETECTION_METHOD: kubernetes-internal-ip, the IPv6 routes are fine after an IPv6 VIP failover in VXLAN mode (calico_backend: vxlan). But with calico_backend: bird (BGP mode), the routes are wrong after the VIP failover.

My confusion is this: it looks like when the VIP is deleted from the interface, the Calico route is lost and a route via the default IPv6 gateway is added instead. I don't know whether the cause is Keepalived, Felix, or BIRD. Do you have any suggestions?

ltgentoo commented 9 months ago

I would like to add more information. We found that this is not related to Keepalived.

With Keepalived/HAProxy stopped and the Calico config unchanged, we ran:

ip addr add 2001::201/64 dev ens33
ip addr del 2001::201/64 dev ens33

After we delete this address, the route table for the pod CIDR is:

image

It seems that deleting the IP address leads to incorrect IPv6 routes.
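The manual reproduction above can be scripted so that the route table is captured before and after the address is removed. This is a minimal sketch, assuming the interface `ens33` and the address `2001::201/64` from this thread; the function names, the `DRY_RUN` switch, and the 5-second wait are hypothetical choices, and the real commands need root.

```shell
#!/bin/sh
# Sketch: add and remove the extra address by hand, then diff the IPv6 routes.
# IFACE and VIP come from this thread; everything else is illustrative.
IFACE="${IFACE:-ens33}"
VIP="${VIP:-2001::201/64}"

# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show only the lines that differ between two saved route tables.
# grep exits non-zero when nothing differs, so mask its status.
route_changes() {
    diff "$1" "$2" | grep '^[<>]' || true
}

# Add the address, snapshot the routes, delete the address, snapshot again.
reproduce() {
    before=$(mktemp) && after=$(mktemp) || return 1
    run ip addr add "$VIP" dev "$IFACE"
    ip -6 route show > "$before"
    run ip addr del "$VIP" dev "$IFACE"
    sleep 5   # give Felix/BIRD time to react; the delay is a guess
    ip -6 route show > "$after"
    route_changes "$before" "$after"
    rm -f "$before" "$after"
}
```

Calling `reproduce` as root on the affected node prints the routes that were removed (`<`) and added (`>`) by the address deletion, which is easier to attach to an issue than a screenshot.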

im-jinxinwang commented 9 months ago

Hi @mazdakn I have encountered the same problem, and can conduct a failure test on a Kubernetes cluster with dual stack enabled.

im-jinxinwang commented 9 months ago

Hi @mazdakn I found from the log that Felix updated the routing gateway address multiple times. This is not correct, because 2001::2839:3654:bcd8:88c3 is the default gateway address for the host.

image

node1.txt

mazdakn commented 9 months ago

@fasaxc can you please comment on this? It seems we ignore non-local routes here: https://github.com/projectcalico/calico/blob/126ddced8d2f070f34482bb2076f65a0f8d4d596/felix/ifacemonitor/update_filter.go#L116 but routes for virtual addresses are not local. WDYT?

im-jinxinwang commented 9 months ago

@mazdakn In theory, the newly added IPv6 address does not belong to a local route, so why does it cause changes in Calico's routing?

im-jinxinwang commented 9 months ago

@mazdakn The normal logic is that Calico uses the node's IPv6 address as the gateway for the pod IPv6 subnets, but the behavior here is abnormal.

fasaxc commented 9 months ago

Please can you add the output from these commands:

ip addr show
ip -6 route show

I'm not sure that route shows all the information that we need. If you don't have ip route installed, you can exec it in the calico-node pod.
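Running the two commands inside the calico-node pod could look like the sketch below. The namespace is an assumption (`kube-system` for manifest-based installs; operator-based installs typically use `calico-system`), and `collect_from_pod` and the `KUBECTL`/`NAMESPACE` variables are hypothetical names. The pod name has to be the calico-node pod running on the affected node.

```shell
# Assumed defaults; override NAMESPACE to match your install.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-kube-system}"

# Run the requested diagnostics inside the given calico-node pod.
collect_from_pod() {
    pod="$1"
    for cmd in "ip addr show" "ip -6 route show"; do
        echo "### $cmd"
        # $cmd is left unquoted on purpose so it splits into separate arguments.
        $KUBECTL exec -n "$NAMESPACE" "$pod" -c calico-node -- $cmd
    done
}
```

The pod on a given node can be found with `kubectl get pods -n kube-system -l k8s-app=calico-node -o wide`, after which `collect_from_pod <pod-name> > node-output.txt` saves both outputs in one file.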

Note that IPIP is not an option for IPv6. The options are to

The first two options use BIRD to distribute routes. At a guess, BIRD is picking up the extra IP address and concluding that it is not in the same subnet as the other nodes so it routes via the default gateway instead. I'm not sure why BIRD would be preferring that IP, hopefully the above output will shed some light.

With VXLAN, I think we explicitly use the autodetected IP so that might work here.

ltgentoo commented 9 months ago

@fasaxc OK, let's just talk about BGP, leaving IPIP out. Before VIP failover:

image

after vip failover:

image

abasitt commented 5 months ago

possibly related to https://github.com/projectcalico/calico/issues/8739

lwr20 commented 4 months ago

With VXLAN, I think we explicitly use the autodetected IP so that might work here.

@ltgentoo did you get a chance to try VXLANv6?

coutinhop commented 1 month ago

@ltgentoo any updates on trying VXLANv6?