Closed: fflo closed this issue 1 year ago.
Hi,
Thanks for the PR. A few comments...
Hi Nathan,
So, updating on the bonding: I just did a brief test with a bond and two underlying ethernet interfaces. Using link-monitoring=mii, when all of the underlying interfaces are disabled, the bond is no longer marked R (running). This seems like it should work without explicitly disabling the bonds, so what am I missing? Are you using an alternate link-monitoring (none?)? With the underlying interfaces disabled, LACP shouldn't function at all toward the switch, so I can't figure out the scenario where this goes bad.
I'm not against your patch, I'm just trying to keep ha-mikrotik as minimal as possible to reduce the bug/maintenance surface.
Thanks
Hi Nathan,
If you do not disable the bond immediately on startup, the bonding interface becomes active for a short period of time during the reboot of the backup CCR.
I guess that's because the bonding interface stays active and available until ALL underlying member interfaces have been disabled by the script.
I have configured dynamic routing using OSPF on a VLAN, and before patching the bond and the HA_VRRP bridge it broke the routing exchange: the active CCR receives packets originating from itself, and the peer receives duplicate packets originating from both CCRs for about 1-2 seconds on each backup CCR reboot. Disabling the bond and the HA_VRRP bridge on startup resolved the issue for me.
I think it makes sense to enable the bond only when becoming the active master, after the underlying ethernet interfaces have already been enabled.
To further reduce the failover time, we should consider keeping the underlying interfaces enabled (because it takes multiple seconds to enable multiple 10 Gbps SFP+ interfaces) and disabling/enabling only the virtual bonding interface on a backup->master, master->... event.
Maybe you have an idea of how to achieve this in an elegant way.
-FF
This does sound disruptive to your environment. Out of curiosity, did you try keeping the bonding disable/enable entirely within the startup? It seems like the /interface ethernet disable [find]
is taking too long relative to how long your interfaces take to come up, which creates your problem.
Can you test something like the following:
/interface bonding disable [find]
/interface ethernet disable [find]
/interface bonding enable [find]
And remove all of the other bonding enable/disable? It basically gives us a small window for messing with the bond during bootstrapping, and by the time we re-enable them, the underlying interfaces SHOULD be disabled.
I'm really trying to avoid tampering with other components if we can. One problem with this is that we are disabling and enabling all bonds, even ones that the user may have intended to disable. I know we do it with the ethernet interfaces but I'd like to reduce the places that we do this, otherwise we run into "gotchas" that the user wasn't expecting to happen.
I don't really see a nice way to keep the interfaces enabled for a faster failover; we need to cleanly isolate the backup, and leaving them enabled allows for too many overlay configurations that we would then need to deal with (EoIP, etc.).
Appreciate all of the feedback, hopefully we can narrow in on a nice solution that works for everyone.
One other comment on "keeping interfaces up". I considered this years ago and played around with the idea but never came up with something that felt solid. I THINK one way we might be able to pull it off is with global firewall rules that filter out all ingress/egress on the non-$haInterface interfaces. In theory, this would allow layer 1 to come up and drop all other communication. It would also solve the enable/disable problem where we end up enabling interfaces that the user intended to disable. Any thoughts on this design?
Also, because of the cloned MACs, that may introduce some problems with this design depending on the upstream switches.
It would be nice if Mikrotik had some sort of soft disable, that keeps layer 1 up but prevents all communication on a per-interface basis.
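For what it's worth, a rough sketch of that firewall-based isolation, assuming v6.36+ interface lists (the list name HA_ISOLATED and the rule placement are made up and untested):

```
# Put every interface except the HA interface into an "isolated" list,
# then drop everything entering or leaving through those interfaces.
/interface list add name=HA_ISOLATED
:foreach k in=[/interface find where name!="$haInterface"] do={
    /interface list member add list=HA_ISOLATED interface=$k
}
/ip firewall raw add chain=prerouting in-interface-list=HA_ISOLATED action=drop
/ip firewall raw add chain=output out-interface-list=HA_ISOLATED action=drop
```

Note this only covers IP; bridged layer-2 traffic would still need bridge filter rules, which is part of why this never felt solid to me.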
I briefly tested the firewall idea again just now, and while it does work, there are MAC issues with port security (%ETH-4-HOST_FLAPPING)
on my Arista MLAG rigs. So I'm not sure this is going to be a good approach.
I don't think it's a good idea to block outbound packets at layer 3 (firewall) instead of working at layer 2.
As a replacement for enabling/disabling the physical interfaces, the cleanest and most performant solution would be to work only with logical interfaces that are bound to one or more physical interfaces.
For example, a bond or bridge with a special name syntax: HA_...
This would force everyone to configure their devices in a special way. Many people use multiple ethernet interfaces with ha-mikrotik and the idea is to keep the configuration completely "normal" so nobody has to think about HA. I wouldn't want to require that everyone uses a logical interface to overlay the physical for it to work correctly.
I agree, how about moving this idea into a separate branch?
It's a shame that Mikrotik has no official support for this: RouterOS should at least natively support syncing the connection tracking table and IPsec state between two devices.
We could branch it. How do we deal with the duplicate-MAC problem, though? We could reset to the original MAC, but then we need to deal with gratuitous ARP.
I know the convergence time has room for improvement but it generally works well for me and others, it takes a few seconds but it is a rare event here. I’m interested in figuring out a clean fix for your bonding problem without having bonding enable/disable everywhere. Any chance you can test the idea from the earlier comment? I can’t easily test it right now in my environment with bonding setup this way.
(1) Why do you expect issues with duplicate MACs? There is no need to modify the original MAC address of the hardware interfaces when working only with bond and/or bridge logical interfaces. Only the logical interfaces need to share the same MAC address per pair, as those MAC addresses are used for communication. Or did I miss something?
(2) If you like, I can test a modified version of the code that does not disable the bond interface(s). The CCR cluster is not ready for production yet. Right now I am working to further reduce the failover downtime of the dynamic routing convergence, using optimized BGP with BFD instead of modified OSPF (and OSPFv3) settings.
Are you not using auto-mac on your bridges/bonds? I was pretty sure RouterOS selects MAC based on one of the underlying interfaces. I know it can be explicitly overridden but that is extra admin work/gotcha.
Yes, if you could test the simplified bond enable/disable and let me know, that would be great.
What is your convergence time right now with the pair?
Retried the simplified bond enable/disable configuration, but it does not work on the CCR1072-1G-8S+. Every standby reboot causes a traffic interruption of about 5 seconds.
The convergence time depends on the interface configuration, because each enable/disable operation on an SFP+ interface takes ~2-3 seconds.
Using a bonding of the first four SFP+ interfaces with min-links=1, and using dynamic routing (BGP4 with BFD), I was able to reduce the convergence time from ~30s to about 10s with some more optimizations: https://github.com/svlsResearch/ha-mikrotik/compare/master...fflo:master
Does every standby reboot currently cause a 5s interruption, or do your changes resolve this?
With my changes, standby reboots and HARoleSwitch run smoothly.
But you asked me to re-test the simplified bond enable/disable on ha_startup only, and using that simplified setup causes a ~5s traffic interruption on each standby reboot; in detail: once every time a configuration change has been loaded, or at least once a day at 5 am.
If you look at my latest changes I have furthermore added the following optimizations and features:
Furthermore, we should look for a more elegant solution to disable dynamic routing configurations in "on-backup" mode, to avoid the standby's logs being flooded with pointless error messages and to reduce convergence time on the "on-master" event.
Gotcha. Do you understand what went wrong with the disable/enable pattern? If the bond is disabled as early as it is in your working patch, then the ethernet interfaces are disabled and the bond re-enabled, I don't see how the interruption happens?
For the dynamic routing, do you think this can be sufficiently handled by the currently supported on_master and on_standby callbacks? We can also add additional callbacks for alternate places that may make your setup functional and easier to keep in sync with my master.
Setting up bonding forces a change of the original MAC address of the physical SFP+ interfaces; it seems to me that this change internally re-occurs on each reboot, causing the physical links to flap (down/up) for 1-2 seconds.
Re-enabling the bonding after "ha_startup step 0.3: /interface ethernet disable [find disabled=no]" still causes trouble on CCR equipment: the master receives packets originating from itself for some seconds, and dynamic routing protocols like OSPF or BGP with BFD see flapping events.
Of course, this should not happen with ethernet devices that are soft-deactivated; probably a design bug.
It seems cleaner to me to keep bonding interfaces disabled until they are needed in the "on-master" event.
Reading the latest changelog for v7, Mikrotik seems to have added connection tracking synchronization support to VRRP setups.
Maybe it's worth putting this project topic into the v7 beta forum to get more official support.
With regard to dynamic routing: Yes, on_master and on_standby callbacks are fine.
Do you have a suggestion on how to re-enable in "on-master" event only peers and interfaces which have not been soft-deactivated in the master configuration?
It feels like a lot of the issues you are running into on that 8S+ are due to how slow it is to disable the SFP+ interfaces (as you described). I wonder if we can get them disabled quicker by using :execute to run in the background and try to get all interfaces disabled in parallel? I'm not exactly sure how the RouterOS scripting engine is implemented, whether it is actually multi-threaded with respect to state or whether there is a global lock for state mutation.
For tracking what was disabled/enabled before we mess with it, I have previously injected comments that give me some state information. We could append a comment to ones that we disable (but found enabled by the boot) and then remove that comment when we become master. It is slightly awkward to do substring replacement in RouterOS but we might be able to make a helper function that does this arbitrarily for any configuration that has a disabled and a comment property.
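A rough sketch of that comment-tagging approach (the ;HA_AUTO marker is an arbitrary name I'm making up here, and this is untested):

```
# On backup boot: disable interfaces that were enabled, tagging them so we
# remember the user's intent.
:foreach k in=[/interface ethernet find where disabled=no] do={
    /interface ethernet set $k \
        comment=([/interface ethernet get $k comment] . ";HA_AUTO") disabled=yes
}

# On master: re-enable only the ones we disabled, stripping the marker again.
:foreach k in=[/interface ethernet find where comment~";HA_AUTO"] do={
    :local c [/interface ethernet get $k comment]
    /interface ethernet set $k \
        comment=[:pick $c 0 [:find $c ";HA_AUTO"]] disabled=no
}
```

Interfaces the user had disabled themselves never get the marker, so they stay disabled across role switches.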
That does look interesting. To be honest, Mikrotik has offered minimal assistance when we have uncovered bugs that broke ha-mikrotik. At this point, I've just worked around whatever they deliver/break. I'm guessing they will eventually make this project entirely obsolete with some eventual v7 (v8?) enhancement, which is fine by me.
I don't have an 8S+ to test with but I have a 1S+ and 2S+, I'm not seeing disabling taking a ton of time but I am seeing enabling taking something like what you describe. Is this also what you see or are you seeing disabling taking a few seconds as well?
With 8 interfaces, I can definitely see how you would see extra time during a role switch. Do you want to try to integrate the below parallel enable/disable? It will be far more obvious how it behaves on your 8S+ vs. mine.
[admin@X_Inet_HA_A_STANDBY] > [:put [/system clock get time]]; /interface set 0 disabled=yes; [:put [/system clock get time]];
13:39:46
13:39:46
[admin@X_Inet_HA_A_STANDBY] > [:put [/system clock get time]]; /interface set 0 disabled=no; [:put [/system clock get time]];
13:39:49
13:39:51
[admin@X_Inet_HA_A_STANDBY] > [:put [/system clock get time]]; /interface set 0 disabled=yes; [:put [/system clock get time]];
13:39:53
13:39:53
[admin@X_Inet_HA_A_STANDBY] >
Basic test code for parallel disable (or enable):
:foreach k in=[/interface ethernet find] do={:local name [/interface ethernet get $k name]; :execute "/interface ethernet set [find name=\"$name\"] disabled=no"}
:foreach k in=[/interface ethernet find] do={:local name [/interface ethernet get $k name]; :execute "/interface ethernet set [find name=\"$name\"] disabled=yes"}
Thanks for your hint.
It does not seem to work running the interface commands in parallel, or at least there is no difference in execute timing:
[fflo@...CCR01_HA_A_STANDBY] > [:put [/system clock get time]]; :foreach k in=[/interface ethernet find where default-name!="$haInterface" and comment!="HA_RESCUE"] do={:local name [/interface ethernet get $k name]; :execute "/interface ethernet set [find name=\"$name\"] disabled=no"}; [:put [/system clock get time]];
11:30:26
11:30:34
[fflo@...CCR01_HA_A_STANDBY] > [:put [/system clock get time]]; :foreach k in=[/interface ethernet find where default-name!="$haInterface" and comment!="HA_RESCUE"] do={:local name [/interface ethernet get $k name]; :execute "/interface ethernet set [find name=\"$name\"] disabled=yes"}; [:put [/system clock get time]];
11:30:34
11:30:36
[fflo@...CCR01_HA_A_STANDBY] >
[fflo@...CCR01_HA_A_STANDBY] > [:put [/system clock get time]]; /interface ethernet enable [find where default-name!="$haInterface" and comment!="HA_RESCUE"]; [:put [/system clock get time]];
11:30:56
11:31:05
[fflo@...CCR01_HA_A_STANDBY] > [:put [/system clock get time]]; /interface ethernet disable [find where default-name!="$haInterface" and comment!="HA_RESCUE"]; [:put [/system clock get time]];
11:31:05
11:31:06
[fflo@...CCR01_HA_A_STANDBY] >
I can confirm what you are seeing with this code. It seems like the first execute is going to the "background" and then the second one blocks.
Can you try this? I changed it to the brace syntax and added a delay, the delay seems to make a difference to have it going into the "background" (I don't get it). It seems to push these all to the background for me but I haven't confirmed if the interfaces are actually coming up faster. It also prints each name and then the background jobs at the end.
Stand by... don't run it. I don't think the $name is propagating correctly in this code.
Follow up to above comment with code that propagates $name correctly (local variables don't appear to propagate inside another :execute block): In my testing...this is backgrounding it and the foreground returns faster but based on what I am seeing in the logs, I don't think the interfaces come up any faster, I think there is a global lock. It will be more obvious with your 8x though.
[:put [/system clock get time]]; /interface ethernet disable [find where default-name!="$haInterface" and comment!="HA_RESCUE"]; [:put [/system clock get time]];
[:put [/system clock get time]]; :foreach k in=[/interface ethernet find where default-name!="$haInterface" and comment!="HA_RESCUE"] do={:local name [/interface ethernet get $k name]; :put $name; :execute "/delay 0.1; /log warning \"start: $name\"; /interface ethernet enable [find name=\"$name\"]; /log warning \"end: $name\""}; [:put [/system clock get time]]; /system script job print
Yes, there is a global lock
01:40:53 system,info,account user fflo logged in from 00:00:5E:00:01:01 via mac-telnet
01:43:31 system,info device changed by fflo
01:43:31 system,info device changed by fflo
01:43:31 system,info device changed by fflo
01:43:31 system,info device changed by fflo
01:43:31 system,info device changed by fflo
01:43:31 system,info device changed by fflo
01:43:31 system,info device changed by fflo
01:43:31 script,warning start: sfp-sfpplus1
01:43:31 script,warning start: sfp-sfpplus2
01:43:31 script,warning start: sfp-sfpplus3
01:43:31 script,warning start: sfp-sfpplus5
01:43:31 script,warning start: sfp-sfpplus4
01:43:31 script,warning start: sfp-sfpplus7
01:43:31 script,warning start: sfp-sfpplus6
01:43:32 system,info device changed by fflo
01:43:32 script,warning end: sfp-sfpplus1
01:43:34 interface,info sfp-sfpplus2 link down
01:43:34 system,info device changed by fflo
01:43:36 interface,info sfp-sfpplus3 link down
01:43:36 script,warning end: sfp-sfpplus2
01:43:36 system,info device changed by fflo
01:43:39 interface,info sfp-sfpplus4 link down
01:43:39 system,info device changed by fflo
01:43:39 script,warning end: sfp-sfpplus3
01:43:39 script,warning end: sfp-sfpplus5
01:43:39 system,info device changed by fflo
01:43:39 script,warning end: sfp-sfpplus4
01:43:39 system,info device changed by fflo
01:43:39 script,warning end: sfp-sfpplus7
01:43:40 system,info device changed by fflo
01:43:40 script,warning end: sfp-sfpplus6
01:43:40 interface,info sfp-sfpplus2 link up (speed 10G, full duplex)
01:43:40 interface,info sfp-sfpplus3 link up (speed 10G, full duplex)
01:43:40 interface,info sfp-sfpplus4 link up (speed 10G, full duplex)
[fflo@...CCR01_HA_B_STANDBY] /interface ethernet>
Yea, this is a bummer. The convergence could be sped up by only bringing up interfaces that are needed. I guess this would be about a 2x speed up for you on the enabling part (only enabling the right 4). I can see why you are interested in finding a solution to keep the interface link up.
Also, just speaking out loud. You have 4 links because it is 2 to an A switch and 2 to a B switch, is that right?
In theory, if we had a nice way, you could bring up 1 link on A and 1 link on B and then bring up the other 2 after everything else has been enabled. LACP should be happy to transparently bring up the other set of links a bit later. This would be 4x speed up.
Trying to keep the interfaces enabled is not lost on me though, just brainstorming.
The configured bonding setup should in theory already start working with one SFP+ interface connected, because only "1 of 4" (min-links=1) is set as the requirement for the bonding interface to start operating.
But due to the nature of IRF link aggregation, you are right that both switches should get a dedicated interface up and running as soon as possible, because inner IRF routing is suboptimal.
The overall convergence time takes some more seconds because dynamic routing and the route announcements of the BGP4 sessions need time to reset and re-establish on a $HASwitchRole event or device outage.
To optimize this I have adapted the BGP4 timers to a 15s hold-time and 5s keep-alive, and configured BGP BFD.
Not having to wait for the physical member interfaces of a bonding to be enabled promises a huge gain in convergence time.
Do you have a hint on how to find only the physical member interfaces of a bonding and enable those interfaces already in the ha_onbackup state?
We may actually be on to something with using :execute to get the background enabling of interfaces. This will allow the foreground to continue to proceed with your current methods of enabling the bond and BGP much sooner than it would be if we block for all 8 interfaces.
If you were to interleave your links (sfp1 -> A1, sfp2 -> B1, sfp3 -> A2, sfp4 -> B2) then this would bring up the two minimum ideal interfaces to the switch pairs as fast as we currently can (while the rest of the on_master script is executing in parallel). It requires that you physically lay it out such that this becomes optimal but it doesn't seem too bad of a trade off.
If you do want to pull the slaves, try something like:
[:foreach k in=[/interface bonding get [find name="bond1"] slaves] do={:put $k}]
Furthermore, since we now know about the apparent global lock, there is no point running it in parallel. We can simply do a single :execute to enable the interfaces with a [find] rather than individual executes, and allow the callbacks to run sooner.
ie: ha_onmaster :execute "/interface ethernet enable [find]"
and then using on_master to enable your bonds and BGP.
It is all a bit dirty right now but I feel we might not be far from figuring out a way to do it generically so it works for you and everyone else as it does now.
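As a rough sketch of how those pieces could fit together in the on_master path (the /routing bgp peer line is an assumption based on the v6 command tree, untested):

```
# ha_onmaster: push the slow ethernet enabling into the background so the
# rest of the failover work is not blocked behind it
:execute "/interface ethernet enable [find]"

# meanwhile the foreground continues with the fast, logical bits
/interface bonding enable [find]
/routing bgp peer enable [find]
```

The background job still contends with the same global lock, but the bond and BGP come up as soon as their member interfaces do, instead of waiting for all eight.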
Hey Nathan,
thanks for your AWESOME project! I wish this topic would get more official support from Mikrotik.
Attached my patches to support bonding interfaces and to fix some issues with HA_VRRP using the new bridge mode.
Tested on 2x CCR1072-1G-8S+ using ether1 as the HA interface, on firmware v6.45.8.
-FF