ktbyers / netmiko

Multi-vendor library to simplify Paramiko SSH connections to network devices
MIT License
3.62k stars 1.31k forks

Mikrotik RouterOS - file_transfer to device fails verification (not waiting long enough) #3292

Open jnikolich opened 1 year ago

jnikolich commented 1 year ago

Environment Details

Host: Fedora 38
Netmiko: 4.1.1 (locally modified, see below)
Networking Device: Mikrotik RB5009 Router
Networking OS: RouterOS 7.12beta3

Problem Description: file_transfer to Mikrotik RouterOS device fails verification (not waiting long enough)

(Please note: due to issue #3291, this problem was tested using a locally modified version of Netmiko 4.1.1 that removes flash/ from the destination filename.)

I am testing netmiko's file_transfer() against a Mikrotik RB5009 router running RouterOS 7.12beta3. When transferring a file that already exists on the device, the transfer completes successfully. However, when transferring a new file to the RB5009, the transfer fails with the following traceback:

Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/netmiko/mikrotik/mikrotik_ssh.py", line 250, in remote_file_size
    size = remote_out.split("size=")[1].split(" ")[0]
           ~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/data/src/mikrotik/rb5009/./testscript.py", line 146, in <module>
    sys.exit( main( ) )
              ^^^^^^^
  File "/var/data/src/mikrotik/rb5009/./testscript.py", line 129, in main
    ROS.devicePushScript( devHdl, 'testfile.rsc', '/var/data/src/mikrotik/rb5009/assets', [ '/interface print' ] )
  File "/var/data/src/mikrotik/mikrotiklib.py", line 113, in devicePushScript
    transfer_dict = file_transfer(
                    ^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/netmiko/scp_functions.py", line 152, in file_transfer
    if scp_transfer.verify_file():
       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/netmiko/mikrotik/mikrotik_ssh.py", line 285, in verify_file
    self.remote_file_size(remote_file=self.dest_file)
  File "/usr/lib/python3.11/site-packages/netmiko/mikrotik/mikrotik_ssh.py", line 253, in remote_file_size
    raise ValueError("Unable to find file on remote system")
ValueError: Unable to find file on remote system

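In mikrotik_ssh.py, remote_file_size() issues a command to the target device similar to /file print detail where name="testfile.rsc". For a newly transferred file this fails: no output seems to be returned, so splitting on "size=" yields a single-element list, indexing [1] raises the IndexError, and remote_file_size() then raises the ValueError shown in the traceback.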
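The failure mode can be reproduced without a router. The following is a minimal sketch, not netmiko's actual code: it reuses only the parsing expression visible in the traceback, and the function name extract_size_token and the sample output string are hypothetical (the real /file print detail output may be formatted differently).

```python
def extract_size_token(remote_out: str) -> str:
    """Pull the raw token that follows 'size=' out of command output.

    Hypothetical helper: only the split expression is taken from the
    traceback; everything else is illustrative.
    """
    try:
        # With no "size=" in the text, split() returns a one-element
        # list, so index [1] raises IndexError.
        return remote_out.split("size=")[1].split(" ")[0]
    except IndexError:
        raise ValueError("Unable to find file on remote system")


# Assumed shape of a successful reply (illustrative only).
sample = 'name="testfile.rsc" type="script" size=123 creation-time=x'
print(extract_size_token(sample))

# An empty reply, as seen right after the transfer, fails the lookup.
try:
    extract_size_token("")
except ValueError as exc:
    print(f"ValueError: {exc}")
```

An empty reply and a reply for a file that does not exist are indistinguishable to this parsing, which is why the error message reports a missing file even though the transfer itself succeeded.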

The notes/warnings in Mikrotik's Wiki include the following note as of 2023-09-14:

Note: For multicore devices with a NAND flash memory (e.g. CCR series routers, RB4011iGS), RouterOS uses a write-back which will cache file changes into RAM memory instead of writing them straight away into flash media. The file changes will be stored on the flash when it is absolutely necessary, the writing can be delayed by up to 40 seconds. This helps to reduce CPU cycles which results in better performance. However, this can cause empty or zero-length files when a device experience a sudden power loss, because files were not fully saved on a flash.

The RB5009 is a multicore device with 1GB of NAND storage.

On a hunch, I tried inserting a 5-second delay after the file is transferred, but before remote_file_size() retrieves the file details. This worked. Further testing is probably required, but it appears that when pushing a file to (at least) an RB5009 router, the file details may take a few seconds to become available for query.

Change Request

I'm not sure what the best approach would be from netmiko's perspective, but a simple fix would be to support a configurable post-transfer delay (possibly defaulting to zero) for Mikrotik RouterOS devices, applied immediately after the file transfer completes and before any post-transfer verification or reporting.
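As one possible shape for such a change, here is a minimal sketch of a caller-side workaround that retries the size lookup until it succeeds or a timeout expires, rather than sleeping for a fixed time. This is not an existing netmiko feature: wait_for_remote_file and its timeout, interval, and sleep parameters are hypothetical names, and the 5-second and 0.5-second defaults are assumptions, not measured values.

```python
import time
from typing import Callable


def wait_for_remote_file(
    get_size: Callable[[], int],
    timeout: float = 5.0,    # assumed default, not a netmiko setting
    interval: float = 0.5,   # assumed pause between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call get_size() until it stops raising ValueError or time runs out.

    get_size is any zero-argument callable that raises ValueError while
    the file is not yet visible on the device.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return get_size()
        except ValueError:
            # Give up and surface the original error once the deadline passes.
            if time.monotonic() >= deadline:
                raise
            sleep(interval)


# Stand-in for the device: the lookup fails twice, then succeeds.
attempts = {"count": 0}

def fake_remote_file_size() -> int:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ValueError("Unable to find file on remote system")
    return 2048

size = wait_for_remote_file(fake_remote_file_size, sleep=lambda _: None)
print(size, attempts["count"])
```

Against a real device, get_size could wrap the call seen in the traceback, for example lambda: scp_transfer.remote_file_size(remote_file=scp_transfer.dest_file). Polling avoids picking a single delay value that may be too short on a busier router and needlessly long on devices that do not need it.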

jnikolich commented 1 year ago

For what it's worth, dropping that 5-second delay down to 1 second also seems to work consistently for me. A 0.5-second delay failed consistently, and a 0.9-second delay failed intermittently. The router itself is very lightly loaded during these tests, averaging 4-5% CPU utilization throughout.