Open rkojedzinszky opened 2 years ago
ping
@fujita - Any chance you could take a look at this one?
@rkojedzinszky has done quite a bit of work tracking down this seeming inconsistency with GoBGP's handling of GracefulRestart when GoBGP is configured for GracefulRestart, but the peer is not (see https://github.com/cloudnativelabs/kube-router/issues/1389 for the full history and context). This is causing them quite a bit of a headache as anytime kube-router is restarted they lose all routes until they also restart their peer.
This is a regression(old version gobgp worked)?
This is a regression(old version gobgp worked)?
I dont know, as I was using kube-router only. I suspect a change in kube-router triggered this bug. Earlier, a bug in kube-router masked or changed behavior in a way that it seemed that everything is ok, but it turns out that the applied change in kube-router is correct (https://github.com/cloudnativelabs/kube-router/pull/1327).
Elaborating just a bit in this thread (obviously all work can be found in the PR that @rkojedzinszky linked)...
Previously kube-router used to announce graceful restart for the gobgpapi.Family_AFI_IP
and gobgpapi.Family_AFI_IP6
AFI families whether we used them or not. As a matter of fact, at the point in time that @rkojedzinszky linked to, the BGP implementation in kube-router was only capable of working with IPv4 or IPv6 and could not work with both at the same time.
Some BGP peer implementations were fine ignoring an AFI setup for a family that wasn't used, but other ones it caused breakages unless users configured an AFI group that they weren't using and usually weren't capable of using.
However, this also seems to have worked around a bug (?) in GoBGP that @rkojedzinszky documented in the description of this issue up above.
Hmm, seems that I can't reproduce this.
btw, in the sample code, why LocalRestarting
is set to true? It should not.
@fujita I have created my sample code based on kube-router behavior. I will need some time to pick this up again, I'll report as soon as I can.
Hmm, seems that I can't reproduce this.
btw, in the sample code, why
LocalRestarting
is set to true? It should not.
@fujita
I could reproduce it again. So use the attached code, and make sure to use the following frr configartion:
# sh running-config
Building configuration...
Current configuration:
!
frr version 8.4.2
frr defaults traditional
hostname frr
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 10
bgp graceful-restart-disable
neighbor 192.168.111.1 remote-as 20
!
address-family ipv4 unicast
neighbor 192.168.111.1 route-map bgp-default in
neighbor 192.168.111.1 route-map bgp-default out
exit-address-family
exit
!
route-map bgp-default permit 10
exit
!
end
Pay attention to the bgp graceful-restart-disable
statement. Ensure that the sample code is started after frr. Then I observed that after 3 mins there are no advertisements:
frr# sh ip bgp summary
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 8
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
192.168.111.1 4 20 42 41 0 0 0 00:03:31 0 0 N/A
Total number of neighbors 1
Howewer, if I make a clear ip bgp on frr, then the sessions gets established again, and then prefixes gets advertised immediately:
frr# clear ip bgp *
frr# sh ip bgp summary
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 10
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
192.168.111.1 4 20 46 49 0 0 0 00:00:02 Idle 0 N/A
Total number of neighbors 1
frr# sh ip bgp summary
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 10
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
192.168.111.1 4 20 46 50 0 0 0 00:00:05 Active 0 N/A
Total number of neighbors 1
frr# sh ip bgp summary
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 10
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
192.168.111.1 4 20 46 50 0 0 0 00:00:12 Active 0 N/A
Total number of neighbors 1
frr# sh ip bgp summary
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 11
RIB entries 1, using 192 bytes of memory
Peers 1, using 724 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
192.168.111.1 4 20 49 52 0 0 0 00:00:01 1 0 N/A
Total number of neighbors 1
This behavior is consistent with cisco ios implemetation too, we had never had issues with frr-cisco setups with gr enabled on one end and disabled on the other.
I can reproduce the issue. The following change fixed the issue.
diff --git a/main.go b/main.go
index 87bb682..eea63c7 100644
--- a/main.go
+++ b/main.go
@@ -91,7 +91,7 @@ func main() {
Enabled: true,
RestartTime: 120,
DeferralTime: 10,
- LocalRestarting: true,
+ // LocalRestarting: true,
},
AfiSafis: []*gobgpapi.AfiSafi{
@aauren
I created my own example code based on kube-router, where LocalRestarting is also set to true. I think there is a good reason why it is done that way, howewer, I am not really familiar with such depths of bgp/gobgp fsm. Can you please comment on it?
@rkojedzinszky This was unfortunately way before my time with kube-router. It was introduced with the graceful-restart feature here: https://github.com/cloudnativelabs/kube-router/pull/220/files#diff-d0dae64b5424393b01d606bfcabfa3e8fd82d2c466eaeb28187a4da833105273R504 and no description was given as to why it was added.
@fujita Thanks for tracking this down! We really appreciate your time and effort! I found a bit of code documentation about this feature here: https://github.com/osrg/gobgp/blob/master/internal/pkg/config/bgp_configs.go#L4246-L4252
However, I'm still not certain that I understand the implications of setting or not setting it and without understanding it better I feel hesitant to change a setting that has been in the graceful restart implementation since day 1 with kube-router.
Is there any chance that you would be willing to clarify what the option does and what the impact to it being set to true or false is?
@fujita Also please note that in the sample code if you enable IPv6 address family too, and follow the same sequence of starting the speakers, then as commented in the code, a timer will fire soon, and advertisements will begin to be sent. This is with LocalRestarting: true
. So, it seems to be an inconsistent behavior.
LocalRestarting
is configuration for a restarting speaker:
https://github.com/osrg/gobgp/blob/master/docs/sources/graceful-restart.md#restarting-speaker
should not be set in your case.
I can't explain the behavior with IPv6 enabled. Needs to investigate.
@fujita Thanks again for continuing to pursue this with us.
You mentioned that you didn't think that LocalRestarting
should be set in our case. And in the case of @rkojedzinszky's example code, I think I understand what you mean because it is starting for the first time, and yet it is still setting the LocalRestarting
value. However, I'm not sure if it would actually be good to disable it for the kube-router case.
First off, to ensure that I understand what this flag does correctly from your documentation, is it correct to say that it lets it's peers know that it is recovering from a restart by setting the restart bit. And that this is the same procedure that is outlined by RFC4724 here:
To re-establish the session with its peer, the Restarting Speaker MUST set the "Restart State" bit in the Graceful Restart Capability of the OPEN message. Unless allowed via configuration, the "Forwarding State" bit for an address family in the capability can be set only if the forwarding state has indeed been preserved for that address family during the restart.
If so, then I'm not sure how to proceed with this setting when it comes to kube-router as a whole. You see, kube-router is most frequently run as a pod that can start and stop at any time. Forwarding state is preserved by the fact that kube-router writes out the routing table to the host's network. However, other than that, kube-router does not preserve state across reboots, so it isn't possible for it to reliably know whether it is starting for the first time, or if it is recovering from a previous run where it had previously established graceful-restart enabled BGP sessions. However, in a typical system, the latter is more common than the former.
According to the documentation that you linked it says (-r
being the same as setting LocalRestarting
via the Go API):
Without -r option, the peers which are under helper procedure will immediately withdraw all routes which were advertised from gobgpd.
So, my understanding of all of this put together then, is:
true
or false
false
, then after restarting, and upon re-establishing our BGP session with our peers, they will immediately withdraw all of our previous routes that they were previously holding in their FIBI would assume that sometime later after route selection is complete, these routes would then come back, but there would be a period of service outage from the routes being withdrawn correct?
@aauren Thanks, great explanation. We are using this feature of kube-router for inter-node disruption-free upgrades, howewer, for our external connections, we dont want to use gr. This way we expect that if a node goes down for any reason (even just for an upgrade), traffic instantly gets rerouted to another node.
@fujita can we move forward somehow?
@fujita ping
I'm interested in this one too. Currently we lose our BGP routes every time kubespray restarts the kube-router containers
@rkojedzinszky I understand that you want graceful restarts for BGP sessions between nodes and want it disabled for BGP sessions to your upstream router. In theory, IF you did not need graceful restarts for inter-node BGP sessions, would removing the --bgp-graceful-restart resolve this issue that you're seeing on sessions to the upstream routers?
@rkojedzinszky I understand that you want graceful restarts for BGP sessions between nodes and want it disabled for BGP sessions to your upstream router. In theory, IF you did not need graceful restarts for inter-node BGP sessions, would removing the --bgp-graceful-restart resolve this issue that you're seeing on sessions to the upstream routers?
Yes, indeed. If GR is disabled in gobgp, everything works fine, prefixes get advertised as expected to upstream routers, in any scenario.
I think this PR fixed the issue which is released in v3.26.0.
In Cilium, we could reproduce with the version using GoBGP v3.23.0, but couldn't with the latest main using v3.26.0. So, once someone else could confirm, I think we can close this issue.
Seems like the way I tried was bad. Even with v3.26.0, the issue was still there.
@YutaroHayakawa I will take a look in the next days, thanks!
Scratching my comment above. Seems like the way I tried was bad. Even with v3.26.0, the issue was still there.
History: https://github.com/cloudnativelabs/kube-router/issues/1389
I have a gobgp instance configured for graceful restart, and an external peer without graceful restart capability. The symptom is that when gobgp is started, the bgp session gets established, howewer, no prefixes get advertised to the neighbor. If then I restart the peer, prefixes get advertised.
I wrote a sample source which demonstrates this.
gobgp.zip
Note that graceful restart is configured for ipv4 only. Then, when bgp is established, I get the following log entries:
Then, all address families receive EOR (https://github.com/osrg/gobgp/blob/master/pkg/server/server.go#L1699). But when proceeding, it turns out that p.isGracefulRestartEnabled() returns false, thus not sending out updates.
Kube-router until v1.5.1 configured graceful restart for both ipv4 and ipv6. That made the deferral-timeout trigger, and then a different code path had sent out advertisements, howewer only after the timeout.
Once the peer is restarted, then the whole FSM knows about the peer is not configured for graceful restart, and it gets advertisements as soon the session is up.
For the peer a simple frr configuration is enough: