projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

In Kubernetes, Calico removes existing ipPool routes when the pool is disabled with calicoctl #3672

Open prezhdarov opened 4 years ago

prezhdarov commented 4 years ago

When an existing pool is disabled, all inter-node routes are removed. Only the routes to pod addresses on the local node and the routes for other enabled ipPools remain.

Expected Behavior

When an ipPool is disabled using calicoctl, Calico stops assigning new pod addresses from that pool, but all addresses already assigned from it remain routed and reachable.
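
For reference, disabling a pool is done by setting `disabled: true` on the IPPool resource. The manifest below is only a sketch: the pool name and CIDR are hypothetical placeholders (this issue does not state them), and `vxlanMode: Always` is assumed because the cluster was migrated from flannel onto a VXLAN pool.

```yaml
# Sketch only: name and cidr are hypothetical, not taken from this issue.
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: existing-vxlan-pool   # hypothetical pool name
spec:
  cidr: 10.244.0.0/16         # hypothetical; keep the CIDR of the existing pool
  vxlanMode: Always           # assumed: pool created by the flannel migration
  ipipMode: Never
  natOutgoing: true           # assumption; keep whatever the pool already uses
  disabled: true              # stop new allocations from this pool
```

A manifest like this would typically be applied with `calicoctl apply -f <file>`.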

Current Behavior

The moment the ipPool is disabled, all inter-node routes are removed, which immediately breaks all inter-node communication.

Possible Solution

Behave as designed: leave the routes in place until the ipPool is deleted.

Steps to Reproduce (for bugs)

Not sure this is reproducible, but what I did was:

  1. Installed Calico with the migration manifest, adding the "can-reach" method for IP autodetection.
  2. Migrated all pods and nodes from flannel 0.12 using the migration job manifest.
  3. Updated the Calico manifests to the standard Calico install. According to a diff, the only differences are that IPIP is enabled and that the liveness and readiness probes now check the BIRD daemon.
  4. Added an IPIP pool to move off VXLAN.
  5. All hell broke loose.
  6. Removed said IPIP pool, and everything went back to normal.

Context

I tried to bring an existing flannel-enabled cluster to the same configuration as a freshly installed Calico IPIP cluster.

Your Environment

Small on-premises Kubernetes cluster of 12 nodes.

This is what a node in the cluster looks like with a single pool configured:

Single pool

This is what happens when another pool is added:

newpool added

Now to disable the first pool (used for all the pods):

current pool disabled

Even if the currently used pool is re-enabled, the routes are still missing:

current pool re-enabled

And if new pool is disabled:

new pool disabled

The routes reappear when all other pools are removed:

new pool deleted

caseydavenport commented 4 years ago

I think this is probably a bug specifically with VXLAN IP pools, which are implemented in Felix rather than in confd / BIRD (that's why disabling the IPIP pool doesn't remove the routes).

caseydavenport commented 4 years ago

@prezhdarov FWIW, you should be able to modify your existing IP pool to use IPIP instead of VXLAN without needing to disable it. Just set ipipMode: Always and vxlanMode: Never.
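
As a rough illustration of that suggestion, the existing pool could be edited in place along these lines. The pool name and CIDR are hypothetical placeholders, since the thread does not state them; only the `ipipMode` and `vxlanMode` values come from the comment above.

```yaml
# Sketch only: name and cidr are hypothetical, not taken from this issue.
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: existing-vxlan-pool   # hypothetical pool name
spec:
  cidr: 10.244.0.0/16         # hypothetical; keep the CIDR of the existing pool
  ipipMode: Always            # switch the pool to IPIP encapsulation
  vxlanMode: Never            # turn VXLAN off for this pool
  natOutgoing: true           # assumption; keep whatever the pool already uses
```

Applying this with `calicoctl apply -f <file>` would leave the pool enabled throughout, so no second pool needs to be added or disabled.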