hellt / vrnetlab

Make VM-based Network OSes run in Containerlab
https://containerlab.dev
MIT License
126 stars 86 forks source link

Mikrotik RouterOS 6.45.9 fails to add management IP as ether1 is not detected soon enough #202

Closed NoifP closed 4 months ago

NoifP commented 4 months ago

Mikrotik RouterOS 6.45.9 fails to add management IP as ether1 is not detected soon enough. This doesn't happen every time but does happen to around 50% of the v6.45.9 containers when running a lab of 20 nodes. Other versions don't seem to be impacted as frequently. Mikrotik 6.47.10, 6.49.10, 7.11, 7.14.3 generally work fine without modification. This may happen with other firmware versions but I haven't tested.

Docker logs (skipped a bit at the top for brevity)

Failing:

2024-06-17 01:42:56,654: vrnetlab   INFO     Unable to connect to qemu monitor (port 5000), retrying in a second (attempt 17)
2024-06-17 01:43:34,706: launch     DEBUG    VM started
2024-06-17 01:43:34,706: vrnetlab   DEBUG    writing to serial console: ''
2024-06-17 01:43:34,707: vrnetlab   TRACE    waiting for 'MikroTik Login: ' on serial console
2024-06-17 01:43:34,712: vrnetlab   TRACE    read from serial console: ': 
MikroTik 6.45.9 (long-term)
MikroTik Login: '
2024-06-17 01:43:34,713: vrnetlab   DEBUG    writing to serial console: 'admin+ct'
2024-06-17 01:43:34,713: vrnetlab   TRACE    waiting for 'Password: ' on serial console
2024-06-17 01:43:34,758: vrnetlab   TRACE    read from serial console: '
MikroTik 6.45.9 (long-term)
MikroTik Login: admin+ct
Password: '
2024-06-17 01:43:34,758: vrnetlab   DEBUG    writing to serial console: ''
2024-06-17 01:43:34,758: vrnetlab   TRACE    waiting for 'Do you want to see the software license? [Y/n]: ' on serial console
2024-06-17 01:43:35,088: vrnetlab   TRACE    read from serial console: '
  MMM      MMM       KKK                          TTTTTTTTTTT      KKK
  MMMM    MMMM       KKK                          TTTTTTTTTTT      KKK
  MMM MMMM MMM  III  KKK  KKK  RRRRRR     OOOOOO      TTT     III  KKK  KKK
  MMM  MM  MMM  III  KKKKK     RRR  RRR  OOO  OOO     TTT     III  KKKKK
  MMM      MMM  III  KKK KKK   RRRRRR    OOO  OOO     TTT     III  KKK KKK
  MMM      MMM  III  KKK  KKK  RRR  RRR   OOOOOO      TTT     III  KKK  KKK
  MikroTik RouterOS 6.45.9 (c) 1999-2020       http://www.mikrotik.com/
Do you want to see the software license? [Y/n]: '
2024-06-17 01:43:35,088: vrnetlab   DEBUG    writing to serial console: 'n'
2024-06-17 01:43:36,089: launch     DEBUG    Login completed
2024-06-17 01:43:36,089: launch     INFO     applying bootstrap configuration
2024-06-17 01:43:36,089: vrnetlab   DEBUG    writing to serial console: '/system identity set name=ros-01'
2024-06-17 01:43:36,090: vrnetlab   TRACE    waiting for '[admin@ros-01] > ' on serial console
2024-06-17 01:43:36,272: vrnetlab   TRACE    read from serial console: '/system identity set name=ros-01
[admin@ros-01] > '
2024-06-17 01:43:36,272: vrnetlab   DEBUG    writing to serial console: '/ip address add interface=ether1 address=172.31.255.30/30'
2024-06-17 01:43:36,272: vrnetlab   TRACE    waiting for '[admin@ros-01] > ' on serial console
2024-06-17 01:43:36,782: vrnetlab   TRACE    read from serial console: '/ip address add interface=ether1 address=172.31.255.30/30
input does not match any value of interface
[admin@ros-01] > '
2024-06-17 01:43:36,782: vrnetlab   DEBUG    writing to serial console: '/user set admin password=admin'
2024-06-17 01:43:36,782: vrnetlab   TRACE    waiting for '[admin@ros-01] > ' on serial console
2024-06-17 01:43:36,850: vrnetlab   TRACE    read from serial console: '/user set admin password=admin
[admin@ros-01] > '
2024-06-17 01:43:36,850: vrnetlab   DEBUG    writing to serial console: ''
2024-06-17 01:43:36,850: launch     INFO     completed bootstrap configuration
2024-06-17 01:43:36,850: launch     INFO     Pushing config via FTP
2024-06-17 01:43:39,926: launch     INFO     FTP attempt #1 failed due to [Errno 113] No route to host
2024-06-17 01:43:39,926: launch     INFO     Trying again
2024-06-17 01:43:42,998: launch     INFO     FTP attempt #2 failed due to [Errno 113] No route to host
2024-06-17 01:43:42,998: launch     INFO     Trying again
2024-06-17 01:43:46,070: launch     INFO     FTP attempt #3 failed due to [Errno 113] No route to host
2024-06-17 01:43:46,070: launch     INFO     Trying again
2024-06-17 01:43:49,142: launch     INFO     FTP attempt #4 failed due to [Errno 113] No route to host
2024-06-17 01:43:49,142: launch     INFO     Trying again
2024-06-17 01:43:52,214: launch     INFO     FTP attempt #5 failed due to [Errno 113] No route to host
2024-06-17 01:43:52,214: launch     INFO     Giving up after 5
2024-06-17 01:43:52,215: launch     INFO     config pushed via FTP
2024-06-17 01:43:52,215: launch     INFO     Startup complete in: 0:01:23.610585

critical lines from above are:

2024-06-17 01:43:36,272: vrnetlab   DEBUG    writing to serial console: '/ip address add interface=ether1 address=172.31.255.30/30'
2024-06-17 01:43:36,272: vrnetlab   TRACE    waiting for '[admin@ros-01] > ' on serial console
2024-06-17 01:43:36,782: vrnetlab   TRACE    read from serial console: '/ip address add interface=ether1 address=172.31.255.30/30
input does not match any value of interface

I've modified /routeros/docker/launch.py to add :delay 15s; just before setting the IP address. This is not an elegant fix and may delay startup on mikrotik containers that don't require this hack. Unfortunately I don't have the skills to provide a nicer fix.

launch.py line 163:

        self.wait_write(
            f"/ip address add interface=ether1 address={ROS_MGMT_ADDR}/{PREFIX_LENGTH}",
            f"[admin@{self.hostname}] > ",

launch.py line 163 with dirty hack:

        self.wait_write(
            f":delay 15s;/ip address add interface=ether1 address={ROS_MGMT_ADDR}/{PREFIX_LENGTH}",
            f"[admin@{self.hostname}] > ",

Working (with 15s delay):

2024-06-17 00:39:42,323: vrnetlab   INFO     Unable to connect to qemu monitor (port 5000), retrying in a second (attempt 20)
2024-06-17 00:40:20,373: launch     DEBUG    VM started
2024-06-17 00:40:20,373: vrnetlab   DEBUG    writing to serial console: ''
2024-06-17 00:40:20,373: vrnetlab   TRACE    waiting for 'MikroTik Login: ' on serial console
2024-06-17 00:40:20,603: vrnetlab   TRACE    read from serial console: ': 
MikroTik 6.45.9 (long-term)
MikroTik Login: '
2024-06-17 00:40:20,603: vrnetlab   DEBUG    writing to serial console: 'admin+ct'
2024-06-17 00:40:20,603: vrnetlab   TRACE    waiting for 'Password: ' on serial console
2024-06-17 00:40:20,650: vrnetlab   TRACE    read from serial console: '
MikroTik 6.45.9 (long-term)
MikroTik Login: admin+ct
Password: '
2024-06-17 00:40:20,650: vrnetlab   DEBUG    writing to serial console: ''
2024-06-17 00:40:20,650: vrnetlab   TRACE    waiting for 'Do you want to see the software license? [Y/n]: ' on serial console
2024-06-17 00:40:20,995: vrnetlab   TRACE    read from serial console: '
  MMM      MMM       KKK                          TTTTTTTTTTT      KKK
  MMMM    MMMM       KKK                          TTTTTTTTTTT      KKK
  MMM MMMM MMM  III  KKK  KKK  RRRRRR     OOOOOO      TTT     III  KKK  KKK
  MMM  MM  MMM  III  KKKKK     RRR  RRR  OOO  OOO     TTT     III  KKKKK
  MMM      MMM  III  KKK KKK   RRRRRR    OOO  OOO     TTT     III  KKK KKK
  MMM      MMM  III  KKK  KKK  RRR  RRR   OOOOOO      TTT     III  KKK  KKK
  MikroTik RouterOS 6.45.9 (c) 1999-2020       http://www.mikrotik.com/
Do you want to see the software license? [Y/n]: '
2024-06-17 00:40:20,995: vrnetlab   DEBUG    writing to serial console: 'n'
2024-06-17 00:40:21,996: launch     DEBUG    Login completed
2024-06-17 00:40:21,996: launch     INFO     applying bootstrap configuration
2024-06-17 00:40:21,996: vrnetlab   DEBUG    writing to serial console: '/system identity set name=ros-01'
2024-06-17 00:40:21,996: vrnetlab   TRACE    waiting for '[admin@ros-01] > ' on serial console
2024-06-17 00:40:22,177: vrnetlab   TRACE    read from serial console: '/system identity set name=ros-01
[admin@ros-01] > '
2024-06-17 00:40:22,177: vrnetlab   DEBUG    writing to serial console: ':delay 15s;/ip address add interface=ether1 address=172.31.255.30/30'
2024-06-17 00:40:22,177: vrnetlab   TRACE    waiting for '[admin@ros-01] > ' on serial console
2024-06-17 00:40:37,195: vrnetlab   TRACE    read from serial console: ':delay 15s;/ip address add interface=ether1 address=172.31.
255.30/30
[admin@ros-01] > '
2024-06-17 00:40:37,195: vrnetlab   DEBUG    writing to serial console: '/user set admin password=admin'
2024-06-17 00:40:37,195: vrnetlab   TRACE    waiting for '[admin@ros-01] > ' on serial console
2024-06-17 00:40:37,238: vrnetlab   TRACE    read from serial console: '/user set admin password=admin
[admin@ros-01] > '
2024-06-17 00:40:37,238: vrnetlab   DEBUG    writing to serial console: ''
2024-06-17 00:40:37,238: launch     INFO     completed bootstrap configuration
2024-06-17 00:40:37,238: launch     INFO     Pushing config via FTP
2024-06-17 00:40:57,198: launch     INFO     config pushed via FTP
2024-06-17 00:40:57,198: launch     INFO     Startup complete in: 0:01:45.923642

Note: Mikrotk no longer lists CHR 6.45.9 on their download archive but it is still available via direct link: https://download.mikrotik.com/routeros/6.45.9/chr-6.45.9.vmdk

NoifP commented 4 months ago

Adding this delay may help with config imports being more reliable as well. Typically when loading a full config on Mikrotik hardware we add a :delay 15s at the start of the config to avoid these types of issues.

hellt commented 4 months ago

Hi @NoifP

Have you tried with 5s delay? I feel like it might be enough as well to add in the base launcher sequence for chr. 15s feels a bit too much

NoifP commented 4 months ago

Hi @hellt Do you think the approach above is acceptable? If so I can submit a PR after testing 5s.

I haven't tested 5s yet, but I will test and report back. I started with 15s because that is what we've found works reliably with hardware.

NoifP commented 4 months ago

Hi @hellt

My testing is not very conclusive unfortunately... It does appear that 6.45.9 is much better with a delay but newer RouterOS doesn't require it. However, if people are assigning config to be loaded the delay could help that be more reliable as well.

Maybe a better approach is to query the device and check if it has ether1 available yet (and loop until it does) although this does introduce more dependency on the RouterOS behavior being consistent.

Maybe the following example (tested on 6.38.5, 6.49.10 and 7.12.2) could be helpful.

[admin@LAB-Switch] > /interface ethernet print count where name=ether1
1
[admin@LAB-Switch] > /interface ethernet print count where name=ether99
0

Here are the test results when testing with 31 nodes (as per the attached yaml file) on a HP DL360p Gen8 (2 x E5-2630 CPUs @ 2.30GHz, 512GB RAM, SSD storage) running Debian 12.

Delay Time Happy Nodes (got mgmt IP) Sad Nodes (no mgmt IP) RouterOS Version Test # Max Workers
0s 12 19 6.45.8 2 default
5s 8 23 6.45.9 1 default
10s 29 2 6.45.9 3 default
15s 30 (but 2 didn't import config) 1 6.45.9 4 default
15s 31 (but 5 didn't import config) 0 6.45.9 5 4
0s 31 (but 1 had a kernel failure and didn't import config) 0 6.47.10 6 default
0s 31 (but 1 had a kernel failure and didn't import config) 0 6.47.10 7 4
0s 31 0 6.49.10 8 4
0s 31 0 6.49.10 9 default

boot-test.clab.yaml.gz

hellt commented 4 months ago

hah, ok it seems there is more to it than just a delay thanks for a thorough testin.

I am happy to accept PRs fixing this, but given that newer versions do not exhibit the issue than I doubt we will find eager contributors jumping on it

NoifP commented 4 months ago

I've submitted a PR. This shouldn't slow down anything for happy versions but in my testing it definitely helped on the unhappy versions.

It sends a one liner version of following before adding the management IP which just loops until the router detects ether1.

{
  :local ether1count [/interface ethernet find where name=ether1];
  :while ([:len $ether1count] < 1)  do={ 
    :set ether1count [/interface ethernet find where name=ether1]; 
  }
}

This won't fix the random RouterOS kernel failures or the occasional failure to attempt to import config.auto.rsc