projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Why does Calico (acting as a route reflector) not propagate external routes (from a router) to its RR clients? #4539

Closed Shawn71 closed 3 years ago

Shawn71 commented 3 years ago

Expected Behavior

Calico is expected to propagate the external BGP routes (learned from the router via a BGP peering) to its RR clients. The RR clients are the other nodes within the cluster.

Current Behavior

Calico does not propagate the external BGP routes (learned from the router via a BGP peering) to its RR clients.

Possible Solution

This issue has blocked me for a long time, and I still have not been able to find a workaround.

Steps to Reproduce (for bugs)

  1. Deploy Calico version 3.17.1.
  2. Disable the full node-to-node BGP mesh between the Calico nodes.
  3. Select the masters as route reflectors and label them "route-reflector=true". In my case there are two masters (master1 192.168.1.4 and master2 192.168.1.5).
  4. Apply the BGPPeer below:
[root@master1 calico]# cat bgp_rr.yaml 
kind: BGPPeer
apiVersion: projectcalico.org/v3
metadata:
  name: peer-with-route-reflectors
spec:
  nodeSelector: all()
  peerSelector: route-reflector == 'true'
  #peerSelector: 'has(routeReflector)'
  #peerSelector: all()
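
Step 2 does not show how the mesh was disabled. In Calico this is commonly done through the default BGPConfiguration resource. A minimal sketch, assuming the cluster AS number is 65001 (the AS the Cisco CSR reports for the masters later in this report), might look like this:

```yaml
# Sketch only: disables the automatic node-to-node BGP mesh so that
# nodes peer solely through explicit BGPPeer resources.
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default            # the cluster-wide BGP configuration is named "default"
spec:
  nodeToNodeMeshEnabled: false
  asNumber: 65001          # assumption: taken from the CSR's neighbor table, not from the reporter's config
```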

Up to this point everything works fine; see the output below (192.168.1.xx are the IP addresses of the k8s nodes, 10.6.0.4 is the IP address of the Cisco CSR):

[root@master1 calico]# calicoctl node status
The calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+--------------------------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |              INFO              |
+--------------+---------------+-------+----------+--------------------------------+
| 10.6.0.4     | node specific | up    | 01:37:31 | Established                    |
| 192.168.1.11 | node specific | up    | 01:36:12 | Established                    |
| 192.168.1.5  | node specific | up    | 01:36:11 | Established                    |
| 192.168.1.6  | node specific | up    | 01:36:12 | Established                    |
| 192.168.1.7  | node specific | up    | 01:36:28 | Established                    |
| 192.168.1.8  | node specific | up    | 01:36:12 | Established                    |
| 192.168.1.84 | node specific | start | 01:36:10 | Active Socket: Connection      |
|              |               |       |          | refused                        |
+--------------+---------------+-------+----------+--------------------------------+

IPv6 BGP status
No IPv6 peers found.

We can see that routes are being propagated between the k8s nodes and everything works fine:

[root@master1 calico]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
10.244.1.0      10.244.1.0      255.255.255.192 UG    0      0        0 vxlan.calico
10.244.2.0      10.244.2.0      255.255.255.192 UG    0      0        0 vxlan.calico
10.244.3.0      10.244.3.0      255.255.255.192 UG    0      0        0 vxlan.calico
10.244.4.0      10.244.4.0      255.255.255.192 UG    0      0        0 vxlan.calico
10.244.12.0     10.244.12.0     255.255.255.192 UG    0      0        0 vxlan.calico
10.244.13.0     192.168.1.1     255.255.255.192 UG    0      0        0 eth0
88.1.1.1        192.168.1.1     255.255.255.255 UGH   0      0        0 eth0
88.1.1.2        192.168.1.1     255.255.255.255 UGH   0      0        0 eth0
88.1.1.3        192.168.1.1     255.255.255.255 UGH   0      0        0 eth0
168.63.129.16   192.168.1.1     255.255.255.255 UGH   0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
169.254.169.254 192.168.1.1     255.255.255.255 UGH   0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.0.2.0       0.0.0.0         255.255.255.0   U     0      0        0 br-8d06e53a8d62
192.168.1.0     0.0.0.0         255.255.255.192 U     0      0        0 eth0
[root@master1 calico]# 
  5. Then apply the YAML file below to establish a BGP peering between the masters and the Cisco CSR:
[root@master1 calico]# cat bgppeerpernodetociscocsr2.yaml 
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: bgppeertociscocsr2
spec:
  peerIP: 
      '10.6.0.4'
  asNumber: 65002
  nodeSelector: peertociscocsr == 'true'

We can see that the BGP peering works and that the Cisco CSR advertises three routes to the k8s cluster: 88.1.1.1/32, 88.1.1.2/32, and 88.1.1.3/32. See below:

cicsocsr2#sh ip bg summary 
BGP router identifier 88.1.1.3, local AS number 65002
BGP table version is 14, main routing table version 14
13 network entries using 3224 bytes of memory
31 path entries using 4216 bytes of memory
2/2 BGP path/bestpath attribute entries using 560 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 8024 total bytes of memory
BGP activity 13/0 prefixes, 31/0 paths, scan interval 60 secs
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.1.4     4        65001     191     186       14    0    0 02:45:27        9
192.168.1.5     4        65001     192     187       14    0    0 02:45:27        9
cicsocsr2#sh ip bg nei 192.168.1.4 advertised-routes 
BGP table version is 14, local router ID is 88.1.1.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>   10.96.0.0/16     192.168.1.4                            0 65001 i
 *>   10.244.1.0/26    192.168.1.6                            1 65001 i
 *>   10.244.2.0/26    192.168.1.4                            0 65001 i
 *>   10.244.3.0/26    192.168.1.4                            0 65001 i
 *>   10.244.4.0/26    192.168.1.4                            0 65001 i
 *>   10.244.11.0/26   192.168.1.4                            0 65001 i
 *>   10.244.12.0/26   192.168.1.4                            0 65001 i
 *>   10.244.13.0/26   0.0.0.0                  0         32768 i
 *>   88.1.1.1/32      0.0.0.0                  0         32768 i
 *>   88.1.1.2/32      0.0.0.0                  0         32768 i
 *>   88.1.1.3/32      0.0.0.0                  0         32768 i
 *>   192.168.20.99/32 192.168.1.4                            0 65001 i
 *>   192.168.100.0    192.168.1.4                            0 65001 i

Total number of prefixes 13 

And the k8s master has already installed the three routes in its routing table:

[root@master1 calico]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
10.244.1.0      10.244.1.0      255.255.255.192 UG    0      0        0 vxlan.calico
10.244.2.0      10.244.2.0      255.255.255.192 UG    0      0        0 vxlan.calico
10.244.3.0      10.244.3.0      255.255.255.192 UG    0      0        0 vxlan.calico
10.244.4.0      10.244.4.0      255.255.255.192 UG    0      0        0 vxlan.calico
10.244.12.0     10.244.12.0     255.255.255.192 UG    0      0        0 vxlan.calico
10.244.13.0     192.168.1.1     255.255.255.192 UG    0      0        0 eth0
88.1.1.1        192.168.1.1     255.255.255.255 UGH   0      0        0 eth0
88.1.1.2        192.168.1.1     255.255.255.255 UGH   0      0        0 eth0
88.1.1.3        192.168.1.1     255.255.255.255 UGH   0      0        0 eth0
168.63.129.16   192.168.1.1     255.255.255.255 UGH   0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
169.254.169.254 192.168.1.1     255.255.255.255 UGH   0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.0.2.0       0.0.0.0         255.255.255.0   U     0      0        0 br-8d06e53a8d62
192.168.1.0     0.0.0.0         255.255.255.192 U     0      0        0 eth0
[root@master1 calico]# 

But it does not propagate the routes to its RR clients, for example node2:

[root@node2 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
10.244.1.0      10.244.1.0      255.255.255.192 UG    0      0        0 vxlan.calico
10.244.2.30     0.0.0.0         255.255.255.255 UH    0      0        0 calidfa4d3a824d
10.244.3.0      10.244.3.0      255.255.255.192 UG    0      0        0 vxlan.calico
10.244.4.0      10.244.4.0      255.255.255.192 UG    0      0        0 vxlan.calico
10.244.11.0     10.244.11.0     255.255.255.192 UG    0      0        0 vxlan.calico
10.244.12.0     10.244.12.0     255.255.255.192 UG    0      0        0 vxlan.calico
168.63.129.16   192.168.1.1     255.255.255.255 UGH   0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
169.254.169.254 192.168.1.1     255.255.255.255 UGH   0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.1.0     0.0.0.0         255.255.255.192 U     0      0        0 eth0

We cannot see any routes for 88.1.1.x/32. I captured traffic on master1 and master2 and analyzed it: the masters did not send any BGP UPDATE messages to their clients for the 88.1.1.x/32 routes. If someone can tell me the reason or how to fix this, I would really appreciate it.

Your Environment

caseydavenport commented 3 years ago

@ShawnBian I believe the issue here is that Calico isn't expecting to re-advertise routes learned from your infrastructure. Namely, Calico has an export filter that says "only advertise routes within Calico IP pools". We do this because we don't want to enable arbitrary injection of routes out of the box.

You might be able to make this work by simply adding a new IP pool to your cluster that includes the CIDR of the addresses you would like Calico to re-advertise, and setting spec.disabled: true on it to prevent Calico from allocating pod IPs from that range.
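
The workaround described above could look roughly like the following manifest. This is a minimal sketch, not a tested configuration: the pool name `external-routes-readvertise` is hypothetical, and `88.1.1.0/24` is an assumed CIDR chosen only because it covers the three /32 routes from the original report.

```yaml
# Sketch only: a disabled IP pool whose sole purpose is to make the
# external prefixes fall "within a Calico IP pool" for the export filter.
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: external-routes-readvertise   # hypothetical name
spec:
  cidr: 88.1.1.0/24                   # assumed range covering 88.1.1.1-3/32
  disabled: true                      # prevents Calico from allocating pod IPs from this range
```

It would be applied with `calicoctl apply -f <file>`. Running `route -n` on an RR client such as node2 afterwards should show whether the 88.1.1.x routes now arrive.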

Shawn71 commented 3 years ago

Thank you for the update, I will close this issue.

megakid commented 1 year ago

Is this still the case today? I’m having similar issues whereby any pods I force to run on my route reflector nodes can access my (non cluster) wider network but pods on my peered (to rr) nodes (regular workers) cannot.

I haven’t tried the workaround mentioned above yet.