projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

How to mark Node NotReady if Calico / CNI network is Down. #9499

Closed mailtoamitsingh closed 1 day ago

mailtoamitsingh commented 3 days ago

We are running a K8s cluster with separate control (K8s) and Calico networks; the Calico network carries inter-pod communication.

We have route reflectors (RR) in the cluster. BGP peerings are established with the other nodes in the cluster, and there is another BGP uplink towards the ToR.

The behaviour we see: if the Calico network goes down on a particular node for any reason (e.g. NIC down), that node does not go into the NotReady state, and the Endpoints of Services are not updated accordingly.

Any help is appreciated.

caseydavenport commented 1 day ago

There is an open enhancement request for this here: https://github.com/projectcalico/calico/issues/5233

Right now, Calico won't mark the node network as unavailable in potentially transient states like this.
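For context, the Kubernetes Node API defines a standard `NetworkUnavailable` condition type alongside `Ready`. The following is a minimal sketch, not Calico's implementation, of the patch body an external agent could send to a node's `status` subresource to set that condition. The function name and the reason/message strings are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone


def network_unavailable_patch(unavailable, reason, message, now=None):
    """Build a patch body for a Node's status that sets NetworkUnavailable.

    Hypothetical helper: it only builds the payload, it does not call the API.
    """
    now = now or datetime.now(timezone.utc)
    ts = now.strftime("%Y-%m-%dT%H:%M:%SZ")  # RFC 3339 timestamp in UTC
    return {
        "status": {
            "conditions": [
                {
                    "type": "NetworkUnavailable",
                    # Condition status is a string: "True", "False" or "Unknown"
                    "status": "True" if unavailable else "False",
                    "reason": reason,    # illustrative value, not a Calico constant
                    "message": message,  # illustrative value
                    "lastHeartbeatTime": ts,
                    "lastTransitionTime": ts,
                }
            ]
        }
    }


if __name__ == "__main__":
    import json

    body = network_unavailable_patch(True, "ExampleLinkDown", "pod network NIC is down")
    print(json.dumps(body, indent=2))
```

Note that `NetworkUnavailable` is a separate condition from `Ready`, so setting it would not by itself necessarily produce the NotReady state asked about above.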

I think it's fairly reasonable to have Calico code that can handle this in certain very well understood situations - e.g., BGP network down. However, we have to be very certain this is a real issue before we do, else we risk introducing instability into the cluster.
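To illustrate the "be very certain" point, here is a minimal sketch of debouncing a link-state signal so that a brief flap is not reported as an outage. It is an assumption about how an external watchdog could be written, not Calico code: the `LinkDebouncer` class, its `threshold` default, and the `read_operstate` helper are hypothetical. The helper reads the Linux `/sys/class/net/<iface>/operstate` file.

```python
from dataclasses import dataclass, field


@dataclass
class LinkDebouncer:
    """Report 'down' only after `threshold` consecutive down observations."""

    threshold: int = 3  # hypothetical default; tune to the probe interval
    _down_streak: int = field(default=0, init=False)

    def observe(self, link_up: bool) -> bool:
        """Record one probe result; return True if the link counts as down."""
        if link_up:
            self._down_streak = 0  # any successful probe resets the streak
        else:
            self._down_streak += 1
        return self._down_streak >= self.threshold


def read_operstate(iface: str, sys_root: str = "/sys/class/net") -> str:
    """Return the kernel-reported operational state, e.g. 'up' or 'down'."""
    with open(f"{sys_root}/{iface}/operstate") as f:
        return f.read().strip()


if __name__ == "__main__":
    deb = LinkDebouncer(threshold=3)
    # Simulated probe results: a one-probe flap, then a sustained outage.
    for up in [True, False, True, False, False, False]:
        print(up, deb.observe(up))
```

With a threshold of 3, the single failed probe in the middle of the sequence is ignored, and only the third consecutive failure at the end is treated as a real outage.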

There's some more detail / discussion in that linked issue.

Going to close this in favor of the one I linked above, if you'd like to continue the discussion there.