telekom / das-schiff-network-operator

Configure netlink interfaces, simple eBPF filters and FRR using Kubernetes resources.
Apache License 2.0

[Bug] On first reconcile FRR (zebra) does not always load the VRF VNI map correctly #143

Closed Cellebyte closed 1 week ago

Cellebyte commented 1 week ago

Description

When a node boots, it starts with a baseline configuration that brings up the interfaces needed to form a cluster. After the initial configuration is loaded, nwop starts reconciling additional configuration from the kube-apiserver.

When it then reconciles this new configuration by running systemctl reload frr, the daemon does not always load the configuration as desired: instead of L3 VNIs, it loads only L2 VNIs.

It loads only parts of the configuration, for example:

# frr-reload runtime.txt snippet
vrf vr.blue
 vni 922
exit
 vrf vr.red
 vni 980
exit
 router bgp 4200065169 vrf Vrf_mgmt
  address-family ipv4 unicast
   neighbor mgmt_def activate
  exit
 exit
 frr version 8.0.1
 vrf vr.san
exit
 vrf vr.san
 vni 990
exit
 vrf vr.green
exit
 vrf vr.green
 vni 920
exit
 vrf vr.teal
exit
 vrf vr.teal
 vni 924
exit
 vrf vr.purple
exit
 vrf vr.purple
 vni 925
exit
 vrf vr.yellow
exit
 vrf vr.yellow
 vni 926
exit
 vrf vr.black
exit
 vrf vr.black
 vni 929
exit
 vrf vr.white
exit
 vrf vr.white
 vni 981
exit
 vrf vr.rose
exit
 vrf vr.rose
 vni 9801
exit
# vtysh -c "show evpn vni"
VNI        Type VxLAN IF              # MACs   # ARPs   # Remote VTEPs  Tenant VRF
920        L2   vx.green              0        0        0               vr.green
929        L2   vx.black              0        0        0               vr.black
980        L2   vx.red                0        0        0               vr.red
925        L2   vx.purple             0        0        0               vr.purple
922        L2   vx.blue               0        0        0               vr.blue
9801       L2   vx.rose               0        0        0               vr.rose
981        L2   vx.white              0        0        0               vr.white
924        L2   vx.teal               0        0        0               vr.teal
990        L2   vx.san                0        0        0               vr.san
926        L2   vx.yellow             0        0        0               vr.yellow
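
A quick way to spot this state is to compare the VRF-to-VNI map in the FRR config with the Type column of `show evpn vni`. The following is only a minimal sketch, assuming the plain-text layouts shown above; the function names are hypothetical and not part of the operator.

```python
from typing import Dict


def parse_vrf_vni_map(frr_config: str) -> Dict[str, int]:
    """Collect the `vni` line found under each `vrf <name>` stanza."""
    mapping: Dict[str, int] = {}
    current_vrf = None
    for raw in frr_config.splitlines():
        line = raw.strip()
        if line.startswith("vrf "):
            current_vrf = line.split(maxsplit=1)[1]
        elif line.startswith("vni ") and current_vrf is not None:
            mapping[current_vrf] = int(line.split()[1])
        elif line in ("exit", "exit-vrf"):
            current_vrf = None
    return mapping


def parse_evpn_vni_types(show_output: str) -> Dict[int, str]:
    """Map each VNI to its Type column; the header row is skipped."""
    types: Dict[int, str] = {}
    for line in show_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit():
            types[int(fields[0])] = fields[1]
    return types


def vnis_not_l3(frr_config: str, show_output: str) -> Dict[str, int]:
    """Return the VRFs whose configured VNI is not reported as L3."""
    types = parse_evpn_vni_types(show_output)
    return {
        vrf: vni
        for vrf, vni in parse_vrf_vni_map(frr_config).items()
        if types.get(vni) != "L3"
    }


# Two stanzas and two rows taken from the snippets above.
config = "vrf vr.blue\n vni 922\nexit\n vrf vr.red\n vni 980\nexit\n"
show = (
    "VNI        Type VxLAN IF              # MACs   # ARPs   # Remote VTEPs  Tenant VRF\n"
    "980        L2   vx.red                0        0        0               vr.red\n"
    "922        L2   vx.blue               0        0        0               vr.blue\n"
)
print(vnis_not_l3(config, show))
```

For the data in this issue, every configured VRF would be reported, since all VNIs show up as L2.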

Hints

It seems we already tried to mitigate this behaviour in the following lines of code:

https://github.com/telekom/das-schiff-network-operator/blob/28a37d84370870e9f30044b37b2ad932cc28a490/pkg/reconciler/layer3.go#L68-L96

Hotfix

Run this command again and the VNIs are configured correctly in FRR:

systemctl reload frr
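
If the hotfix had to be automated, one option would be to reload again until every expected VNI is reported as L3. This is only a sketch: `reload_until_l3` and its parameters are hypothetical, and the two callables are assumed to wrap `systemctl reload frr` and a parser for `vtysh -c "show evpn vni"`. It is not what the operator currently does.

```python
from typing import Callable, Dict, Iterable


def reload_until_l3(
    expected_vnis: Iterable[int],
    get_vni_types: Callable[[], Dict[int, str]],
    reload_frr: Callable[[], None],
    max_reloads: int = 3,
) -> bool:
    """Reload FRR until all expected VNIs are L3, at most max_reloads times."""
    expected = list(expected_vnis)
    for attempt in range(max_reloads + 1):
        types = get_vni_types()
        if all(types.get(vni) == "L3" for vni in expected):
            return True
        # Only reload while there is budget left for another check.
        if attempt < max_reloads:
            reload_frr()
    return False
```

Passing the two operations in as callables keeps the retry logic testable without a running FRR instance.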

p-strusiewiczsurmacki-mobica commented 1 week ago

I did some testing and created a draft PR for this: #144

It seems that, for FRR, the only thing that differentiates an L3 VNI from an L2 VNI is whether or not it is defined in the config file. For example, if we define it in the config file:

vrf vr.test10030
 vni 10030
exit-vrf

but the interface has not been created yet, it will be present in FRR as an L3 VNI with no VxLAN interface:

VNI        Type VxLAN IF              # MACs   # ARPs   # Remote VTEPs  Tenant VRF
10032      L3   None                  0        0        n/a             Unknown

After the VxLAN interface is created, FRR automatically discovers it and adds it as L3, without transitioning from L2:

VNI        Type VxLAN IF              # MACs   # ARPs   # Remote VTEPs  Tenant VRF
10032      L3   vx.test10030          0        0        n/a             vx.test10030

The current implementation does the following:

  1. Create L3 interfaces.
  2. Update FRR config and reload.
  3. Set interfaces up.

So in the PR I changed the order of operations:

  1. Update FRR config and reload (which creates L3 interfaces with no VxLAN as a placeholder).
  2. Create L3 interfaces.
  3. Set interfaces up.

With this order, I did not see the transition-to-L3-VNI messages. However, I am not sure whether this approach to FRR configuration is correct, so it would be best if someone with more FRR knowledge could take a look at this.
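
To make the difference between the two orderings concrete, here is a toy model in Python. It is not FRR code: `ToyZebra` is a hypothetical stand-in that encodes only the assumption described above (a VNI is typed when it is first seen) and the failing case from this issue (a VNI already learned as L2 stays L2 after a reload). Real zebra does not always behave this way.

```python
from typing import Dict, Iterable, Set


class ToyZebra:
    """Hypothetical model: a VNI keeps the type it got when first seen."""

    def __init__(self) -> None:
        self.configured_l3: Set[int] = set()
        self.vni_types: Dict[int, str] = {}

    def reload(self, l3_vnis: Iterable[int]) -> None:
        # VNIs under a `vrf` stanza become L3 placeholders if still unknown.
        self.configured_l3 = set(l3_vnis)
        for vni in self.configured_l3:
            self.vni_types.setdefault(vni, "L3")

    def interface_created(self, vni: int) -> None:
        # A newly discovered VxLAN interface is L3 only if already configured.
        kind = "L3" if vni in self.configured_l3 else "L2"
        self.vni_types.setdefault(vni, kind)


def old_order(vni: int) -> Dict[int, str]:
    """Create the interface first, then update the config and reload."""
    zebra = ToyZebra()
    zebra.interface_created(vni)
    zebra.reload([vni])
    return zebra.vni_types


def new_order(vni: int) -> Dict[int, str]:
    """Update the config and reload first, then create the interface."""
    zebra = ToyZebra()
    zebra.reload([vni])
    zebra.interface_created(vni)
    return zebra.vni_types
```

In this model, `old_order(10030)` leaves the VNI as L2, while `new_order(10030)` yields L3 because the placeholder exists before the interface is discovered.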

Additionally, I fixed the second issue you reported: while configuring the interfaces, the operator now tries to bring up any interface that already existed (that is, one created in a previous iteration of the reconciliation loop). For example, if an L3 interface named test was created but never came up due to some other error, the operator will try to bring it up in the next iteration.
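
The "bring up what already exists" step can be pictured as a small planning function. This is an illustrative Python sketch with hypothetical names; the operator itself is written in Go and talks to netlink, and none of that is reproduced here.

```python
from typing import Dict, Iterable, List, Tuple


def plan_interfaces(
    existing: Dict[str, bool], desired: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split desired interfaces into (to_create, to_bring_up).

    `existing` maps an interface name to whether it is currently up.
    """
    to_create: List[str] = []
    to_bring_up: List[str] = []
    for name in desired:
        if name not in existing:
            to_create.append(name)
        elif not existing[name]:
            # Created in an earlier iteration but never came up: retry now.
            to_bring_up.append(name)
    return to_create, to_bring_up
```

With `existing = {"test": False}` and `desired = ["test"]`, nothing is created and `test` is scheduled to be brought up again, which matches the example above.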