projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Support BFD in the confd generated bird configuration #4607

Status: Open. akosiaris opened this issue 3 years ago.

akosiaris commented 3 years ago

Hi!

At the Wikimedia Foundation, the non-profit that operates the Wikimedia projects, we've been happy users of Calico since circa 2017. It powers our Kubernetes clusters so nicely that, until now, we've never had to ask for anything. Well, here's a first request (which we aim to contribute code for).

Our setup isn't particularly complex. In fact, its network part is described in detail at https://wikitech.wikimedia.org/wiki/Network_design (yes, we are transparent about how everything is set up). The TL;DR is that we have 2 core routers, and our Kubernetes nodes BGP-peer directly with them and only with them. We don't do ToR BGP peering, as our access switches (asw for short) don't have L3 functionality, nor do we run a full or even partial mesh with the other nodes (somewhat suboptimal, we know; it's a project for the future).

We've recently started experimenting with announcing Kubernetes service IPs, and it works pretty nicely (including ECMP), so many thanks for that. However, we did hit the following issue while simulating some failure modes.

When simulating the sudden total failure of a node (e.g. due to PSU issues or a total motherboard failure), we noticed that it takes up to 4 minutes for BGP to converge and for the core routers to remove the failed route from their FIB, effectively blackholing traffic for the advertised addresses (all of it, or part of it if ECMP is configured). As we understand it, this is expected: both sides have graceful restart configured (which we very much want), and the core routers use the GR timer to decide when to withdraw the route from the FIB.

After discussing this internally, we concluded that BFD would be the better solution to our problem (mind you, we already have other servers that BGP-peer using BIRD with BFD configured). I'm spelling out our plan in the sections below. We see that BFD should be supported since projectcalico/bird#68.

Expected Behavior

When a Kubernetes node that uses Calico and BGP-peers with an upstream router suddenly fails completely, BGP converges promptly and traffic is blackholed only briefly.

Current Behavior

When a Kubernetes node that uses Calico and BGP-peers with an upstream router suddenly fails completely, traffic is blackholed for up to 4 minutes in our case, although the exact duration depends on the BGP configuration.

Possible Solution

Add something like the following (untested) to bird.cfg.template and bird6.cfg.template:

{{- $bfd := getenv "ENABLE_BFD" "true" }}
{{- if eq $bfd "true" }}
# Run BFD on every interface except Calico workload (cali*) and kube-ipvs interfaces.
protocol bfd {
    interface -"cali*", -"kube-ipvs*", "*" {
        interval 300 ms;
        multiplier 3;
    };
    multihop {
        interval 300 ms;
        multiplier 3;
    };
}
{{- end }}

and similarly guard a "bfd yes;" statement in the bgp_template with the same if.

I'd be more than happy to submit a PR to the confd repo if this is a good way to go about it, but I'm also happy to discuss alternatives.

Steps to Reproduce (for bugs)

  1. Simulate a node failure (pull the power, shut down the switch port, run ip link set down on the node, etc.).
  2. Monitor the prefix on the router side (e.g. show route | refresh on a Juniper router) and wait for it to stop being used. The exact time depends on the BGP configuration, but it requires the BGP graceful restart (GR) timer to expire.

Context

This issue gave us pause about adopting the Kubernetes service IP and external IP functionality that Calico offers. We would like to use it to further automate our Kubernetes service creation process, which has some manual steps today.

Your Environment

caseydavenport commented 3 years ago

@akosiaris sorry for the delay on this! I think your request is 100% reasonable.

Like I said on the PR, we're moving away from env var config for driving this stuff. It's not really BGP configuration, but the BGPConfiguration or BGPPeer API seems like the closest fit for this sort of config. Alternatively, maybe this goes on the Node API?

Also CC @nelljerram who I believe has looked at this before.

I'm happy to work with you to get that API change sorted out. It's a bit more work than the env var PR that you originally submitted. Once we agree the right spec, it would be a change to the API structs, and then confd to plumb the toggle through into the BIRD configuration.

nelljerram commented 3 years ago

Thanks @akosiaris and @caseydavenport for CCing me.

FWIW, in Calico Enterprise we unconditionally added this at the top level:

protocol bfd {
}

and then a BGPPeer field that controls whether "bfd on;" is added to the config for each peering:

    // Specifies whether and how to detect loss of connectivity on the peerings generated by
    // this BGPPeer resource.  Default value "None" means nothing beyond BGP's own (slow) hold
    // timer.  "BFDIfDirectlyConnected" means to use BFD when the peer is directly connected.
    FailureDetectionMode FailureDetectionMode

But that misses two things that this PR proposes:

It would be nice if any support added by this PR is a compatible extension of what we already have in Enterprise.
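As a rough illustration of the data model described above, the sketch below maps the FailureDetectionMode values "None" and "BFDIfDirectlyConnected" to the per-peering BIRD statement, on top of an unconditional top-level protocol bfd. Only the type name and the two string values come from the quoted field; the constant names, the bfdStatement helper, and the directlyConnected flag are hypothetical, not Calico Enterprise's actual code, and how confd would decide that a peer is directly connected is left open.

```go
package main

import "fmt"

// FailureDetectionMode mirrors the BGPPeer field quoted above.
type FailureDetectionMode string

// Hypothetical constant names for the two documented values.
const (
	FailureDetectionNone                   FailureDetectionMode = "None"
	FailureDetectionBFDIfDirectlyConnected FailureDetectionMode = "BFDIfDirectlyConnected"
)

// bfdStatement returns the BIRD statement to add to one peering, or "" for none.
// Any value other than BFDIfDirectlyConnected (including "" and "None") means
// relying on BGP's own hold timer only.
func bfdStatement(mode FailureDetectionMode, directlyConnected bool) string {
	if mode == FailureDetectionBFDIfDirectlyConnected && directlyConnected {
		return "bfd on;"
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", bfdStatement(FailureDetectionBFDIfDirectlyConnected, true))  // "bfd on;"
	fmt.Printf("%q\n", bfdStatement(FailureDetectionBFDIfDirectlyConnected, false)) // ""
	fmt.Printf("%q\n", bfdStatement(FailureDetectionNone, true))                    // ""
}
```

A compatible extension of this model could add further enum values (for example, one that also allows multihop BFD) without changing the meaning of the existing two.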

nelljerram commented 3 years ago

"...is a compatible extension"

To be clear, I mean in terms of the data model. (I'm sure we will be able to do any fix-ups that are needed to the related coding.)

RefluxMeds commented 4 months ago

@caseydavenport Any plans to have this in the open source version as well?