projectcalico / bird

Calico's fork of the BIRD protocol stack

Unix: Fix signals synchronization #101

Closed pluzun closed 2 years ago

pluzun commented 2 years ago

Description

The event loop (io_loop) must ensure that no new signal can arrive between the moment a signal is processed and the moment its flag is reset to 0. Otherwise the reset erases the new signal and it is never handled. This PR fixes the signal handling so that no signal is missed.

We experienced routing issues on our Kubernetes cluster since we frequently deploy many new pods (cronjobs). After checking bird's logs, we noticed that two SIGHUPs were sometimes sent a few ms apart and only the first one was processed. This led to inconsistent routes on nodes.

Logs before this fix (two SIGHUPs; the second one is never processed, so routes are never removed):

2022-03-28 17:31:52.785 [INFO][74] confd/resource.go 278: Target config /etc/calico/confd/config/bird_aggr.cfg has been updated due to change in key: /calico/ipam/v2/host/worker-028/ipv4/block
bird: Reconfiguration requested by SIGHUP
bird: Reconfiguring
bird: device1: Reconfigured
bird: direct1: Reconfigured
bird: Mesh_10_1_100_31: Reconfigured
[...] # a lot of "Mesh_XXXX: Reconfigured" here
2022-03-28 17:31:52.791 [INFO][74] confd/resource.go 278: Target config /etc/calico/confd/config/bird_aggr.cfg has been updated due to change in key: /calico/ipam/v2/host/worker-028/ipv4/block
[...] # a lot of "Mesh_XXXX: Reconfigured" here
bird: Mesh_10_1_101_214: Reconfigured
bird: Reconfigured

Logs after this fix (two SIGHUPs, both are processed):

2022-03-29 14:57:18.427 [INFO][74] confd/resource.go 278: Target config /etc/calico/confd/config/bird_aggr.cfg has been updated due to change in key: /calico/ipam/v2/host/worker-028/ipv4/block
bird: Reconfiguration requested by SIGHUP
bird: Reconfiguring
bird: device1: Reconfigured
bird: direct1: Reconfigured
bird: Mesh_10_1_100_31: Reconfigured
[...] # a lot of "Mesh_XXXX: Reconfigured" here
2022-03-29 14:57:18.434 [INFO][74] confd/resource.go 278: Target config /etc/calico/confd/config/bird_aggr.cfg has been updated due to change in key: /calico/ipam/v2/host/worker-028/ipv4/block
[...] # a lot of "Mesh_XXXX: Reconfigured" here
bird: Mesh_10_1_101_214: Reconfigured
bird: Reconfigured
bird: Reconfiguration requested by SIGHUP
bird: Reconfiguring
bird: device1: Reconfigured
bird: direct1: Reconfigured
bird: Mesh_10_1_100_31: Reconfigured
[...] # a lot of "Mesh_XXXX: Reconfigured" here
bird: Mesh_10_1_101_214: Reconfigured
bird: Reconfigured

Fix race condition in BIRD that could potentially cause missed config updates

CLAassistant commented 2 years ago

CLA assistant check
All committers have signed the CLA.

caseydavenport commented 2 years ago

/sem-approve