Closed NoifP closed 4 months ago
Adding this delay may help make config imports more reliable as well. Typically, when loading a full config on Mikrotik hardware, we add a `:delay 15s` at the start of the config to avoid these types of issues.
Hi @NoifP
Have you tried a 5s delay? I feel like it might be enough, and we could add it to the base launcher sequence for CHR. 15s feels like a bit too much.
Hi @hellt Do you think the approach above is acceptable? If so I can submit a PR after testing 5s.
I haven't tested 5s yet, but I will test and report back. I started with 15s because that is what we've found works reliably with hardware.
Hi @hellt
My testing is not very conclusive, unfortunately. It does appear that 6.45.9 behaves much better with a delay, while newer RouterOS versions don't require it. However, if people are assigning config to be loaded, the delay could make that more reliable as well.
Maybe a better approach is to query the device and check if it has ether1 available yet (and loop until it does) although this does introduce more dependency on the RouterOS behavior being consistent.
Maybe the following example (tested on 6.38.5, 6.49.10 and 7.12.2) could be helpful.
[admin@LAB-Switch] > /interface ethernet print count where name=ether1
1
[admin@LAB-Switch] > /interface ethernet print count where name=ether99
0
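The polling idea above could be driven from the launcher side roughly as follows. This is only a sketch: `run_command` is a hypothetical callable that sends one command to the router console and returns its text output, and the timeout and interval values are assumptions, not anything taken from `launch.py`.

```python
import time
from typing import Callable

# Command from the example above; it prints a bare integer (1 or 0).
COUNT_CMD = "/interface ethernet print count where name=ether1"


def wait_for_interface(run_command: Callable[[str], str],
                       timeout: float = 60.0,
                       interval: float = 1.0) -> bool:
    """Poll until the router reports ether1, or give up after `timeout` seconds.

    `run_command` is a hypothetical helper (not part of any real library)
    that sends a command to the console and returns the reply as a string.
    """
    deadline = time.monotonic() + timeout
    while True:
        reply = run_command(COUNT_CMD).strip()
        # Anything that is not a positive integer counts as "not ready yet".
        if reply.isdigit() and int(reply) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Bounding the loop with a timeout means a router that never brings up ether1 causes a reported failure rather than a hang.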
Here are the test results when testing with 31 nodes (as per the attached yaml file) on a HP DL360p Gen8 (2 x E5-2630 CPUs @ 2.30GHz, 512GB RAM, SSD storage) running Debian 12.
Delay Time | Happy Nodes (got mgmt IP) | Sad Nodes (no mgmt IP) | RouterOS Version | Test # | Max Workers |
---|---|---|---|---|---|
0s | 12 | 19 | 6.45.8 | 2 | default |
5s | 8 | 23 | 6.45.9 | 1 | default |
10s | 29 | 2 | 6.45.9 | 3 | default |
15s | 30 (but 2 didn't import config) | 1 | 6.45.9 | 4 | default |
15s | 31 (but 5 didn't import config) | 0 | 6.45.9 | 5 | 4 |
0s | 31 (but 1 had a kernel failure and didn't import config) | 0 | 6.47.10 | 6 | default |
0s | 31 (but 1 had a kernel failure and didn't import config) | 0 | 6.47.10 | 7 | 4 |
0s | 31 | 0 | 6.49.10 | 8 | 4 |
0s | 31 | 0 | 6.49.10 | 9 | default |
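To make the rows easier to compare, the share of nodes that got a management IP can be computed directly from the table. This is a small sketch that only restates the numbers above; the "happy" counts still include nodes that failed to import config.

```python
# (delay, happy nodes, sad nodes, RouterOS version, test #), copied from the table
results = [
    ("0s", 12, 19, "6.45.8", 2),
    ("5s", 8, 23, "6.45.9", 1),
    ("10s", 29, 2, "6.45.9", 3),
    ("15s", 30, 1, "6.45.9", 4),
    ("15s", 31, 0, "6.45.9", 5),
    ("0s", 31, 0, "6.47.10", 6),
    ("0s", 31, 0, "6.47.10", 7),
    ("0s", 31, 0, "6.49.10", 8),
    ("0s", 31, 0, "6.49.10", 9),
]


def happy_rate(happy: int, sad: int) -> float:
    """Percentage of nodes that obtained a management IP."""
    return 100.0 * happy / (happy + sad)


for delay, happy, sad, version, test_no in results:
    print(f"test {test_no}: {version} delay={delay} -> {happy_rate(happy, sad):.1f}% happy")
```

On 6.45.9 the rate goes from roughly 26% at 5s to roughly 94% at 10s, which is consistent with the observation that the delay matters mainly on that version.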
Hah, OK, it seems there is more to it than just a delay. Thanks for the thorough testing.
I am happy to accept PRs fixing this, but given that newer versions do not exhibit the issue, I doubt we will find eager contributors jumping on it.
I've submitted a PR. This shouldn't slow down anything for happy versions but in my testing it definitely helped on the unhappy versions.
Before adding the management IP, it sends a one-liner version of the following, which just loops until the router detects ether1.
{
    :local ether1count [/interface ethernet find where name=ether1];
    :while ([:len $ether1count] < 1) do={
        :set ether1count [/interface ethernet find where name=ether1];
    }
}
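One way the multi-line script could be collapsed into a single line before being sent is sketched below. The thread does not show how the PR actually builds the one-liner, so `to_one_liner` and `WAIT_SCRIPT` are hypothetical names, and the naive joining assumes the script has no `#` comment lines or multi-line strings.

```python
# The wait loop from the comment above, kept as a readable multi-line string.
WAIT_SCRIPT = """\
{
:local ether1count [/interface ethernet find where name=ether1];
:while ([:len $ether1count] < 1) do={
:set ether1count [/interface ethernet find where name=ether1];
}
}
"""


def to_one_liner(script: str) -> str:
    """Join the non-empty lines of a script with single spaces.

    Hypothetical helper for illustration: statements are already
    terminated by ';', so joining lines keeps them separated.
    """
    parts = [line.strip() for line in script.splitlines()]
    return " ".join(part for part in parts if part)


print(to_one_liner(WAIT_SCRIPT))
```

Keeping the readable form in the source and collapsing it at send time avoids maintaining a long, hard-to-read single-line string by hand.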
This won't fix the random RouterOS kernel failures or the occasional failure to attempt to import config.auto.rsc
Mikrotik RouterOS 6.45.9 fails to add management IP as ether1 is not detected soon enough. This doesn't happen every time but does happen to around 50% of the v6.45.9 containers when running a lab of 20 nodes. Other versions don't seem to be impacted as frequently. Mikrotik 6.47.10, 6.49.10, 7.11, 7.14.3 generally work fine without modification. This may happen with other firmware versions but I haven't tested.
Docker logs (skipped a bit at the top for brevity)
Failing:
critical lines from above are:
I've modified `/routeros/docker/launch.py` to add `:delay 15s;` just before setting the IP address. This is not an elegant fix and may delay startup on Mikrotik containers that don't require this hack. Unfortunately I don't have the skills to provide a nicer fix.
launch.py line 163:
launch.py line 163 with dirty hack:
Working (with 15s delay):
Note: Mikrotik no longer lists CHR 6.45.9 in their download archive, but it is still available via direct link: https://download.mikrotik.com/routeros/6.45.9/chr-6.45.9.vmdk