oxidecomputer / maghemite

A routing stack written in Rust.
Mozilla Public License 2.0

Tunnel routes flapping on BGP session shutdown #308

Closed rcgoodfellow closed 3 months ago

rcgoodfellow commented 3 months ago

When a BGP peer shuts down, and that peer is providing the only path to a given prefix, that prefix should be withdrawn from tunnel routes advertised on the rack underlay network.
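One way to state the expected behavior is as a set difference. Compare the tunnel origins currently advertised with the ones that should be advertised: anything only in the first set is withdrawn, and anything only in the second is advertised. This is a hypothetical illustration, not mg-lower's actual synchronization code. `TunnelOrigin` here is a simplified stand-in with no VNI or metric, and `tunnel_diff` is an invented helper. When the only path to a prefix goes away, the desired set no longer contains it, so the diff should produce exactly one withdraw and no advertise.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for a tunnel origin. The real
/// TunnelOrigin in the log also carries a VNI and a metric.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct TunnelOrigin {
    overlay_prefix: String,
    boundary_addr: String,
}

/// Compare what is currently advertised with what should be advertised.
/// Returns (to_advertise, to_withdraw).
fn tunnel_diff(
    current: &BTreeSet<TunnelOrigin>,
    desired: &BTreeSet<TunnelOrigin>,
) -> (Vec<TunnelOrigin>, Vec<TunnelOrigin>) {
    // Desired but not yet advertised: announce these.
    let advertise = desired.difference(current).cloned().collect();
    // Advertised but no longer desired: withdraw these.
    let withdraw = current.difference(desired).cloned().collect();
    (advertise, withdraw)
}

fn main() {
    let origin = TunnelOrigin {
        overlay_prefix: "0.0.0.0/0".to_string(),
        boundary_addr: "fd21:ab92:6d64:93df::1".to_string(),
    };
    let current = BTreeSet::from([origin.clone()]);
    // The only BGP path to 0.0.0.0/0 went away, so nothing is desired.
    let desired = BTreeSet::new();

    let (advertise, withdraw) = tunnel_diff(&current, &desired);
    assert!(advertise.is_empty());
    assert_eq!(withdraw, vec![origin]);
    println!("advertise: {advertise:?}, withdraw: {withdraw:?}");
}
```

Once the withdraw has been applied, the current set is empty as well. Any later pass with the same empty desired set then produces no changes, which is the stable behavior the issue expects instead of a re-advertisement a second later.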

Testing I'm currently doing shows that this is not working as expected. Instead of a single withdraw, we see a withdraw followed by an advertise about a second later.

20:20:29.238Z INFO slog-rs: withdraw tunnel: {
        TunnelOrigin {
            overlay_prefix: V4(
                Ipv4Net {
                    addr: 0.0.0.0,
                    width: 0,
                },
            ),
            boundary_addr: fd21:ab92:6d64:93df::1,
            vni: 99,
            metric: 18446744073709551615,
        },
    }
20:20:30.436Z INFO slog-rs: advertise tunnel: {
        TunnelOrigin {
            overlay_prefix: V4(
                Ipv4Net {
                    addr: 0.0.0.0,
                    width: 0,
                },
            ),
            boundary_addr: fd21:ab92:6d64:93df::1,
            vni: 99,
            metric: 18446744073709551615,
        },
    }

Running mgadm bgp status imported shows that there are no imported routes, so we should not be re-announcing here.

Interestingly, we see the following in the static RIB:

$ mgadm static get-v4-routes
{
    "0.0.0.0/0": [],
}

which is problematic. My guess is that the BGP daemon removes the nexthop from the RIB when the session shuts down, but leaves the prefix in place with no nexthops. That confuses mg-lower when it synchronizes tunnel routes to DDM: the prefix is still in the RIB, but it has no nexthops.
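If this diagnosis is right, one fix is to drop a prefix from the RIB when its last nexthop is removed, instead of leaving an empty entry like `"0.0.0.0/0": []`. This is a minimal sketch that models the RIB as a `BTreeMap` from prefix to a set of IPv4 nexthops. `Rib`, `remove_nexthop`, and `advertisable_prefixes` are hypothetical names, not maghemite's API. As a defensive measure, the consumer side also skips prefixes with empty nexthop sets, so a stale empty entry cannot trigger a re-advertisement.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::net::Ipv4Addr;

/// Hypothetical, simplified RIB: prefix -> set of nexthops.
/// This is not maghemite's actual RIB type.
type Rib = BTreeMap<String, BTreeSet<Ipv4Addr>>;

/// Remove one nexthop for a prefix. If it was the last nexthop, remove the
/// prefix entirely rather than leaving an entry with no nexthops.
fn remove_nexthop(rib: &mut Rib, prefix: &str, nexthop: Ipv4Addr) {
    let now_empty = match rib.get_mut(prefix) {
        Some(nexthops) => {
            nexthops.remove(&nexthop);
            nexthops.is_empty()
        }
        None => false,
    };
    if now_empty {
        rib.remove(prefix);
    }
}

/// Prefixes that should produce tunnel routes. Entries with an empty
/// nexthop set are skipped, even if some other code path leaves one behind.
fn advertisable_prefixes(rib: &Rib) -> Vec<String> {
    rib.iter()
        .filter(|(_, nexthops)| !nexthops.is_empty())
        .map(|(prefix, _)| prefix.clone())
        .collect()
}

fn main() {
    let mut rib = Rib::new();
    // Hypothetical nexthop learned from the BGP peer.
    let peer = Ipv4Addr::new(169, 254, 10, 1);
    rib.entry("0.0.0.0/0".to_string()).or_default().insert(peer);
    assert_eq!(advertisable_prefixes(&rib), vec!["0.0.0.0/0".to_string()]);

    // The BGP session shuts down and its nexthop is removed.
    remove_nexthop(&mut rib, "0.0.0.0/0", peer);
    assert!(rib.is_empty());
    assert!(advertisable_prefixes(&rib).is_empty());
    println!("rib after shutdown: {rib:?}");
}
```

Pruning keeps the RIB free of empty entries. The filter in `advertisable_prefixes` separately makes the tunnel sync robust if an empty entry appears anyway. Either change alone should stop the spurious re-advertisement described in the issue, assuming the empty entry is the cause.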