projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Routing issue when requesting a specific IP address #9492

Closed heliloop closed 2 days ago

heliloop commented 4 days ago

Hi, I believe I am facing a similar issue to https://github.com/projectcalico/calico/issues/4287

I have two containers running in two pods, A and B; each pod is deployed to a different node (N1 and N2).

Pod A has annotation:

cni.projectcalico.org/ipAddrs: '["10.40.10.10","fd40::10:10"]'

Pod B has annotation:

cni.projectcalico.org/ipAddrs: '["10.40.10.11","fd40::10:11"]'

There is no routing between the pods (i.e. they cannot ping each other) when this annotation is present.
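For context, the annotation goes in the pod's metadata. A minimal sketch (the pod name, image, and command are illustrative, not from the original setup):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-a                # illustrative name
  annotations:
    # Calico CNI honors this annotation when assigning pod IPs
    cni.projectcalico.org/ipAddrs: '["10.40.10.10","fd40::10:10"]'
spec:
  containers:
    - name: app
      image: busybox         # illustrative image
      command: ["sleep", "infinity"]
```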

Expected Behavior

From pod A I should be able to ping pod B. From pod B I should be able to ping pod A.

Current Behavior

Pods A and B cannot ping each other.

Possible solution

Choose IPs far apart from each other, e.g. in this example 10.40.4.4 and 10.40.8.8, so they fall into different /26 blocks and get registered with the BGP server under a /26 netmask, like this:

B>* 10.40.4.0/26 [20/0] via 192.168.4.13, igc0.4, weight 1, 00:26:07
B>* 10.40.8.0/26 [20/0] via 192.168.4.12, igc0.4, weight 1, 00:26:07
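The block arithmetic behind this workaround can be checked with Python's `ipaddress` module (a sketch; the /26 prefix length is taken from the route output above, which matches Calico's default IPAM block size):

```python
import ipaddress

def block(ip: str, prefixlen: int = 26) -> ipaddress.IPv4Network:
    """Return the /26 block containing the given address."""
    return ipaddress.ip_network(f"{ip}/{prefixlen}", strict=False)

# 10.40.10.10 and 10.40.10.11 share one /26 block...
print(block("10.40.10.10"))  # 10.40.10.0/26
print(block("10.40.10.11"))  # 10.40.10.0/26

# ...while 10.40.4.4 and 10.40.8.8 land in different blocks,
# so each node ends up advertising its own /26 over BGP.
print(block("10.40.4.4"))    # 10.40.4.0/26
print(block("10.40.8.8"))    # 10.40.8.0/26
```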

The same issue occurs with IPv6, and a similar workaround applies (kept out of the scope of this issue).

Configuration

This is a 2-node bare-metal installation; N1 is 192.168.4.12 and N2 is 192.168.4.13.

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
      - name: default-ipv4-pool
        cidr: 10.40.0.0/16
        encapsulation: None
        natOutgoing: Disabled
        nodeSelector: all()
      - name: default-ipv6-pool
        cidr: fd40::/64
        encapsulation: None
        natOutgoing: Disabled
        nodeSelector: all()
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: default-ipv4
spec:
  peerIP: 192.168.4.20
  asNumber: 64512
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: default-ipv6
spec:
  peerIP: fdaa:4::20
  asNumber: 64512
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Debug
  nodeToNodeMeshEnabled: false
  asNumber: 64514
  serviceClusterIPs:
    - cidr: fd41::/112
    - cidr: 10.41.0.0/16
  serviceLoadBalancerIPs:
    - cidr: 192.168.4.64/26 # 192.168.4.64 - 192.168.4.127
    - cidr: fdaa:4::80/121  # fdaa:4::80 - fdaa:4::ff

BGP routes as visible from the router. Notice the /26:

B>* 10.40.10.0/26 [20/0] via 192.168.4.13, igc0.4, weight 1, 00:03:39
B>* 10.40.10.10/32 [20/0] via 192.168.4.12, igc0.4, weight 1, 00:03:39
...

Routes on Node1

0.0.0.0         192.168.4.20    0.0.0.0         UG    0      0        0 ens3
10.40.10.10     0.0.0.0         255.255.255.255 UH    0      0        0 cali5b0bf06291f
....
192.168.4.0     0.0.0.0         255.255.255.0   U     0      0        0 ens3

Routes on Node2

0.0.0.0         192.168.4.20    0.0.0.0         UG    0      0        0 ens3
10.40.10.0      0.0.0.0         255.255.255.192 U     0      0        0 *
10.40.10.11     0.0.0.0         255.255.255.255 UH    0      0        0 calibf61fb7ebcc
...
192.168.4.0     0.0.0.0         255.255.255.0   U     0      0        0 ens3

mazdakn commented 2 days ago

@heliloop the issue is with in-cluster routing. In your cluster, encapsulation is disabled, per this line:

encapsulation: None

The internal BGP mesh is also disabled:

nodeToNodeMeshEnabled: false

You need to enable one of these options so that routes get programmed. Please refer to the Calico docs for more information: https://docs.tigera.io/calico/latest/networking/configuring/
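For reference, the encapsulation alternative would be a one-line change in the Installation resource posted above (a sketch; `VXLANCrossSubnet` is one of the available modes, `IPIP` being another, and only the IPv4 pool is shown):

```yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
      - name: default-ipv4-pool
        cidr: 10.40.0.0/16
        encapsulation: VXLANCrossSubnet   # instead of None
        natOutgoing: Disabled
        nodeSelector: all()
```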

heliloop commented 2 days ago

Just to wrap up: these settings solved it for me. Thanks a lot.

# BGPConfiguration
nodeToNodeMeshEnabled: true

# BGPPeer
keepOriginalNextHop: true
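
Folded back into the original manifests, the resolved resources would look roughly like this (a sketch based on the configuration posted above; only the IPv4 peer is shown):

```yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: true    # was false
  asNumber: 64514
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: default-ipv4
spec:
  peerIP: 192.168.4.20
  asNumber: 64512
  keepOriginalNextHop: true      # added
```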