octavia: LB losing VRRP ports and subsequently failing

garloff commented 2 years ago

Creating a load balancer with a VIP on a given subnet creates three ports in that subnet, which is expected (see https://docs.openstack.org/octavia/latest/contributor/specs/version0.8/active_passive_loadbalancer.html). However, when working with cluster-API deployments, I have seen the two VRRP ports disappear after a few minutes.

Subsequently, communication breaks down: the health monitor puts the member into operating_status ERROR (which is the correct consequence), and new members get provisioning_status ERROR (which looks like a correct consequence as well). The LB is then no longer usable, no matter what I do.

garloff commented 2 years ago

I have observed this on gx-scs and pco (both wallaby, aka R2, ovn) and on wavestack (yoga, R3-pre, ovs). I did not observe these disappearing VRRP ports half a year ago on gx-scs (which could be due to gx-scs being different back then, or to capo treating the infra slightly differently and so not triggering this issue). I don't see anything like this on Cleura.

garloff commented 2 years ago

Please discard this bug report! api_monitor.sh is running in the same project. It cleans up leftovers from octavia, and this can hit the wrong ports. Will fix it there. Sorry for the noise!
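
For anyone running a similar cleanup job next to real load balancers, here is a minimal sketch of a more conservative port filter. It is not the actual api_monitor.sh logic (that script is shell and its selection rule is not shown in this thread). The function name `stale_octavia_ports`, the dict layout, and the rule itself are hypothetical. It also assumes that Octavia names its VIP port `octavia-lb-<loadbalancer id>` and its VRRP ports `octavia-lb-vrrp-<amphora id>`, and that a VRRP port attached to a running amphora has a non-empty `device_id`. Please verify both assumptions against your deployment before deleting anything.

```python
from typing import Dict, Iterable, List, Set

# Assumed Octavia naming conventions; verify them on your cloud.
VRRP_PREFIX = "octavia-lb-vrrp-"
VIP_PREFIX = "octavia-lb-"


def stale_octavia_ports(ports: Iterable[Dict[str, str]],
                        live_lb_ids: Set[str]) -> List[Dict[str, str]]:
    """Return Octavia-named ports that look like leftovers.

    ports:       port records with at least "name" and "device_id" keys
                 (hypothetical layout, e.g. parsed from a port listing).
    live_lb_ids: IDs of load balancers that still exist in the project.
    """
    stale = []
    for port in ports:
        name = port.get("name") or ""
        # Check the VRRP prefix first: the VIP prefix is a prefix of it.
        if name.startswith(VRRP_PREFIX):
            # A bound VRRP port is presumably still used by an amphora.
            if not port.get("device_id"):
                stale.append(port)
        elif name.startswith(VIP_PREFIX):
            # Keep the VIP port of every load balancer that still exists.
            lb_id = name[len(VIP_PREFIX):]
            if lb_id not in live_lb_ids:
                stale.append(port)
        # Ports without an Octavia-style name are never touched.
    return stale
```

The point of the sketch is only that a cleanup should decide per port whether its owner is gone, rather than matching on the name alone; matching on the name alone would also select the VRRP ports of a load balancer that is still in use.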

osism/issues #304