napalm-automation / napalm

Network Automation and Programmability Abstraction Layer with Multivendor support
Apache License 2.0

IOS method commit_config running is too slow #879

Closed Laurel-rao closed 5 years ago

Laurel-rao commented 5 years ago

The 'load_merge_candidate' method takes about 1 to 3 seconds, but 'commit_config' can take about a minute, which seems far too slow. When I change my password by hand with CLI commands, typing every word myself, it takes only about 20 seconds. Please look into this issue.

ktbyers commented 5 years ago

At a minimum you need to post your code, the config change that you are making, and the NAPALM platform that you are using.

Also feel free to dive into the code, and submit a pull-request.

Laurel-rao commented 5 years ago

platform: Cisco ASR1002 Router
purpose: change the password of the device
code:

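from napalm import get_network_driver

driver = get_network_driver("ios")
# `pass` is a reserved word in Python, so the password is held in `password`
dev = driver(host, user, password)
dev.open()
dev.load_merge_candidate(config="username xxx secret xxx")
dev.commit_config()
dev.close()

vladola commented 5 years ago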

So I took some time to dive into this. Here are some of my findings.

With the code as given by @Laurel-rao , it indeed takes on average around 1 minute and 45 seconds:

#> for i in {1..5}; do time /Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python /Users/vlad/work-git-repos/network-orig/network-scripts/telnyx_acl_manager/src/telnyx_acl_manager/testing.py; done;
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.68s user 0.24s system 0% cpu 1:45.60 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.81s user 0.38s system 1% cpu 1:46.69 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.72s user 0.31s system 0% cpu 1:46.47 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.70s user 0.24s system 0% cpu 1:45.35 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.88s user 0.41s system 1% cpu 1:45.92 total
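To see which NAPALM call dominates the wall-clock time, each step can also be timed separately instead of timing the whole script. A minimal sketch: the `timed` helper and the `time.sleep` stand-ins are illustrative assumptions, not part of NAPALM or of the test script used above.

```python
import time
from contextlib import contextmanager

# Maps a step label to its measured duration in seconds.
timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Stand-ins for dev.load_merge_candidate(...) and dev.commit_config();
# replace the sleeps with the real calls on an open NAPALM device.
with timed("load_merge_candidate"):
    time.sleep(0.01)
with timed("commit_config"):
    time.sleep(0.05)

for label, seconds in timings.items():
    print("{}: {:.2f}s".format(label, seconds))
```

Wrapping each call this way shows directly how the total splits between loading the candidate and committing it.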

Most of this time goes to enabling and disabling file prompt quiet, which happens twice per commit. The first time is when you call the commit_config() method at line 677, which saves the rollback config to the router through _gen_rollback_cfg():

    def _gen_full_path(self, filename, file_system=None):
        """Generate full file path on remote device."""
        if file_system is None:
            return "{}/{}".format(self.dest_file_system, filename)
        else:
            if ":" not in file_system:
                raise ValueError(
                    "Invalid file_system specified: {}".format(file_system)
                )
            return "{}/{}".format(file_system, filename)

    @_file_prompt_quiet
    def _gen_rollback_cfg(self):
        """Save a configuration that can be used for rollback."""
        cfg_file = self._gen_full_path(self.rollback_cfg)
        cmd = "copy running-config {}".format(cfg_file)
        self.device.send_command_expect(cmd)

The second time is during the commit itself, when commit_config() calls _commit_handler():

    @_file_prompt_quiet
    def _commit_handler(self, cmd):
        """
        Special handler for hostname change on commit operation. Also handles username removal
        which prompts for confirmation (username removal prompts for each user...)
        """
<SNIP>
        return output

    def commit_config(self, message=""):
        """
        If replacement operation, perform 'configure replace' for the entire config.
        If merge operation, perform copy <file> running-config.
        """
        if message:
            raise NotImplementedError(
                "Commit message not implemented for this platform"
            )
        # Always generate a rollback config on commit
        self._gen_rollback_cfg()

        if self.config_replace:
<SNIP>
        else:
            # Merge operation
            filename = self.merge_cfg
            cfg_file = self._gen_full_path(filename)
            if not self._check_file_exists(cfg_file):
                raise MergeConfigException("Merge source config file does not exist")
            cmd = "copy {} running-config".format(cfg_file)
            output = self._commit_handler(cmd)
            if "Invalid input detected" in output:
                self.rollback()
                err_header = "Configuration merge failed; automatic rollback attempted"
                merge_error = "{0}:\n{1}".format(err_header, output)
                raise MergeConfigException(merge_error)

        # Save config to startup (both replace and merge)
        output += self.device.save_config()

After some tests, I found that disabling the auto_file_prompt option makes it more than 3x faster (it takes ~30s):

dev = driver('172.21.129.3', 'myuser', 'mypass', optional_args={'auto_file_prompt':False})

Result:

for i in {1..5}; do time /Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python /Users/vlad/work-git-repos/network-orig/network-scripts/telnyx_acl_manager/src/telnyx_acl_manager/testing.py; done;
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.58s user 0.19s system 2% cpu 30.301 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.54s user 0.18s system 2% cpu 29.302 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.56s user 0.20s system 2% cpu 29.706 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.56s user 0.20s system 2% cpu 28.975 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.57s user 0.19s system 2% cpu 29.709 total

This only works if you configure file prompt quiet yourself, outside the script; otherwise NAPALM fails, because it checks that this command is present in the running config. Unfortunately, there doesn't seem to be a way to send configuration commands through Netmiko's send_config_set directly via NAPALM (NAPALM implements a cli() method that works only for show commands; if you try it with config commands, it times out because the expected prompt is not found).

You can shave off even more time by setting the dest_file_system variable yourself, so that NAPALM doesn't have to figure it out (roughly 6s saved in the runs below):

dev = driver('172.21.129.3', 'myuser', 'mypass', optional_args={'auto_file_prompt':False, 'dest_file_system':'flash:'})

Result:

for i in {1..5}; do time /Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python /Users/vlad/work-git-repos/network-orig/network-scripts/telnyx_acl_manager/src/telnyx_acl_manager/testing.py; done;
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.53s user 0.19s system 3% cpu 22.919 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.57s user 0.21s system 3% cpu 23.432 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.68s user 0.30s system 4% cpu 23.966 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.53s user 0.18s system 3% cpu 23.109 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.54s user 0.18s system 3% cpu 23.174 total

But, if the only thing you need to do is configure a bunch of users in the fastest possible time, you should probably use Netmiko directly:

netmiko_driver = helpers.NetmikoDevice('172.21.129.3', fast_cli=True)
netmiko_driver.run_cmd('username xxxdddd secret xxx', delay_factor=0.3) #subclasses send_config_set
netmiko_driver.close()

With the above code it takes ~6 seconds total:

for i in {1..5}; do time /Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python /Users/vlad/work-git-repos/network-orig/network-scripts/telnyx_acl_manager/src/telnyx_acl_manager/testing.py; done;
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.47s user 0.28s system 11% cpu 6.365 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.46s user 0.24s system 10% cpu 6.493 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.50s user 0.27s system 11% cpu 6.522 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.42s user 0.18s system 10% cpu 5.988 total
/Users/vlad/.local/share/virtualenvs/telnyx_acl_manager-09HdqZ6F/bin/python   0.51s user 0.23s system 12% cpu 6.134 total

The problem with the extremely well-tuned code above is that it might or might not work depending on your latency to the device etc (so you'll have to play with the delays). Netmiko might generate some timeout errors if you set the timers too aggressively.
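One way to live with aggressive timers is to start fast and fall back to slower settings only when a send fails. The sketch below is an assumption-laden illustration, not Netmiko code: `send_with_backoff` and `flaky_send` are hypothetical names, and `send` stands for any callable that accepts a `delay_factor` keyword, as the wrapper above does.

```python
def send_with_backoff(send, commands, delay_factors=(0.3, 1, 2), retry_on=(Exception,)):
    """Call send(commands, delay_factor=...) with progressively slower timers.

    Returns the first successful result; re-raises the last error if every
    delay factor fails.
    """
    if not delay_factors:
        raise ValueError("delay_factors must not be empty")
    last_error = None
    for factor in delay_factors:
        try:
            return send(commands, delay_factor=factor)
        except retry_on as exc:
            last_error = exc
    raise last_error


def flaky_send(commands, delay_factor=1):
    """Hypothetical stand-in for a device that is too slow for small delays."""
    if delay_factor < 1:
        raise TimeoutError("prompt not seen in time")
    return "sent {} command(s) with delay_factor={}".format(len(commands), delay_factor)


result = send_with_backoff(flaky_send, ["username xxx secret xxx"], retry_on=(TimeoutError,))
print(result)
```

In real use, `retry_on` should be narrowed to the timeout exception your Netmiko version raises. Retrying re-sends the commands, so this only suits configuration lines that are safe to apply twice.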

So TL;DR your options are:
1) Disable auto_file_prompt and set dest_file_system manually; this takes a total of ~23 seconds for your config (per device). This assumes you have file prompt quiet configured beforehand somehow.
2) Use Netmiko if all you care about is speed and you are OK with things breaking a bit more often (depending on how fast your device answers).

In general, without proper APIs screen-scraping is going to be slow, so maybe lower your expectations :).

NAPALM Improvements:
1) Make @_file_prompt_quiet persistent, so that it is applied only once when a connection is opened and removed once the connection is closed. Otherwise each commit operation enables and disables the same command twice, which takes a significant amount of time, as shown by the 3x speedup obtained by simply disabling that option.
2) Allow the user to call Netmiko methods (hence using the same NAPALM connection). Otherwise one is forced to use two different tools for the same job (in some instances).

I might give one of these a shot ^.

PseudotsugaMenziesii commented 5 years ago

Thanks for the leg-work on investigating this! We were running into issues with the time it took to send configs to IOS devices. In the end we ended up using the Netmiko method. The good news is that you can already make Netmiko calls with a NAPALM object.

Using the OP's code:

from napalm import get_network_driver

driver = get_network_driver("ios")
# `pass` is a reserved word in Python, so the password is held in `password`
dev = driver(host, user, password)
dev.open()
# Note '.device.' That's how you get access to the Netmiko methods.
dev.device.send_config_set(["username xxx secret xxx"])
dev.device.send_command_expect('wr')
dev.close()

All the cool config merge stuff is skipped, but it really does jam.

vladola commented 5 years ago

That's neat, didn't think of doing that :). There you go @Laurel-rao that should do the trick.

ktbyers commented 5 years ago

@vladola Thanks for all the work on this; this is great. Yes, as @PseudotsugaMenziesii mentioned, we can just use napalm_object.device to get at the underlying Netmiko driver.

On your point 1, I think we should just add an attribute to the napalm object where we track the file_prompt_quiet state. The first time we change it, we set that flag to True. Then in the napalm close() method, if that file_prompt_quiet flag has been set to True, we set the device back to no file prompt quiet.

Basically since we are making a configuration change to the device, we should only do it if we need to (i.e. we shouldn't do it in open() as then you would be making this config change even for getter operations). But if we do it for config() operations and then just track the state so we only do it once, that would be a big improvement.
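The state-tracking idea described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `FilePromptQuietTracker`, `ensure_quiet` and the injected `send_config` callable are hypothetical names, not the NAPALM implementation, and the real rollback and merge steps are left out.

```python
class FilePromptQuietTracker:
    """Enable 'file prompt quiet' lazily, once, and undo it on close()."""

    def __init__(self, send_config):
        # send_config is any callable that pushes a list of config lines.
        self._send_config = send_config
        self.prompt_quiet_changed = False

    def ensure_quiet(self):
        """Send 'file prompt quiet' only the first time a config operation needs it."""
        if not self.prompt_quiet_changed:
            self._send_config(["file prompt quiet"])
            self.prompt_quiet_changed = True

    def commit_config(self):
        self.ensure_quiet()
        # The rollback copy and the merge/replace itself would happen here.

    def close(self):
        """Restore the device only if this session changed the setting."""
        if self.prompt_quiet_changed:
            self._send_config(["no file prompt quiet"])
            self.prompt_quiet_changed = False


sent = []
tracker = FilePromptQuietTracker(sent.append)
tracker.commit_config()
tracker.commit_config()  # second commit does not touch the setting again
tracker.close()
print(sent)

getter_only = []
FilePromptQuietTracker(getter_only.append).close()  # no config ops, nothing sent
print(getter_only)
```

With this shape, a session that only runs getters never changes the device configuration, and a session with several commits pays for the enable/disable pair once.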

ktbyers commented 5 years ago

@vladola Item #2

Allow the user to call Netmiko methods (hence using the same NAPALM connection). 

This is already the case for both the IOS driver and the NXOS_SSH driver i.e. all interaction with the device is via Netmiko. NAPALM just wraps this to facilitate Netmiko accomplishing things, but Netmiko is doing all the interaction with the device.

ktbyers commented 5 years ago

Also, I think we should generally prioritize reliability over performance, at least up to a certain standard of reliability (so I probably wouldn't enable fast_cli in general). An end-user can always set fast_cli=True in the napalm optional_args if they want to choose performance over reliability.

Though for certain operations, like disabling the prompting and "write mem/save_config", we could possibly enable fast_cli selectively (and maybe for other pretty low-risk operations). By low-risk, I mean a low likelihood of going wrong from a screen-scraping perspective.
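Putting the options mentioned in this thread together, the optional_args dictionary could be assembled as in the sketch below. The helper `build_ios_optional_args` is a hypothetical convenience, not a NAPALM function; the keys are the ones named in this thread, and any key left as None is omitted so the driver's own default applies.

```python
def build_ios_optional_args(dest_file_system=None, auto_file_prompt=None, fast_cli=None):
    """Return an optional_args dict containing only the explicitly chosen options."""
    chosen = {
        "dest_file_system": dest_file_system,
        "auto_file_prompt": auto_file_prompt,
        "fast_cli": fast_cli,
    }
    return {key: value for key, value in chosen.items() if value is not None}


optional_args = build_ios_optional_args(
    dest_file_system="flash:",
    auto_file_prompt=False,  # assumes 'file prompt quiet' is already configured
    fast_cli=True,           # trades reliability for speed, as discussed above
)
print(optional_args)

# Usage with NAPALM (not executed here, needs a reachable device):
# dev = get_network_driver("ios")(host, user, password, optional_args=optional_args)
```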

vladola commented 5 years ago

@ktbyers are you gonna work on #1? Otherwise I might give it a shot.

ktbyers commented 5 years ago

@vladola Have at it...that would be great.

vladola commented 5 years ago

Neat. I'll open a PR to track it tomorrow. I think you can close this issue now.

Laurel-rao commented 5 years ago

I have been busy with company work these days, so I am replying to your suggestions late; sorry about that. I am grateful to you all for helping me fix the problem so quickly. It is very helpful. Thanks!

mhprabin commented 4 years ago

I tried to call a Netmiko command through the NAPALM module, but it throws an error. Any ideas, please?

Traceback (most recent call last):
  File "arista-mgmt_napalm.py", line 10, in <module>
    return_dictionary = dev.send_command("sh ip int br")
AttributeError: 'EOSDriver' object has no attribute 'send_command'

import json
import napalm
from napalm import get_network_driver
from netmiko import ConnectHandler

driver = napalm.get_network_driver('eos')
dev = driver(hostname='10.10.29.1', username='arista', password='arista567')
dev.open()

return_dictionary = dev.cli(['show ip interface brief', ])

return_dictionary = dev.send_command("sh ip int br")
print return_dictionary
dev.close()

ktbyers commented 4 years ago

@mhprabin The EOSDriver uses pyeapi and not Netmiko so you would have to use Netmiko directly here (if you want to use an SSH channel).
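Because the object behind `.device` differs per driver, a script that wants Netmiko methods can check for them before calling. A minimal sketch, assuming only what this thread states (Netmiko-backed drivers expose `send_config_set` on `.device`, while the EOS driver's `.device` exposes `run_commands`); `netmiko_connection` and the Fake* classes are hypothetical names used for illustration.

```python
def netmiko_connection(napalm_dev):
    """Return napalm_dev.device if it looks like a Netmiko connection, else raise."""
    conn = getattr(napalm_dev, "device", None)
    if conn is None or not hasattr(conn, "send_config_set"):
        raise TypeError(
            "{} does not expose a Netmiko connection on .device".format(
                type(napalm_dev).__name__
            )
        )
    return conn


class FakeNetmiko:
    """Stand-in for a Netmiko connection object."""

    def send_config_set(self, commands):
        return "\n".join(commands)


class FakeEapiNode:
    """Stand-in for the pyeapi-style object behind the EOS driver."""

    def run_commands(self, commands, encoding="json"):
        return [{} for _ in commands]


class FakeDriver:
    def __init__(self, device):
        self.device = device


ios_like = FakeDriver(FakeNetmiko())
eos_like = FakeDriver(FakeEapiNode())

print(netmiko_connection(ios_like).send_config_set(["username xxx secret xxx"]))
try:
    netmiko_connection(eos_like)
except TypeError as exc:
    print(exc)
```

This fails early with a clear message instead of the AttributeError shown in the traceback above.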

mhprabin commented 4 years ago

@ktbyers thank you for the response. So if I want to send multiple config commands, do I need to call the pyeapi module?

With Netmiko I am able to send multiple commands, but I couldn't find that feature in NAPALM. So I found this link about calling Netmiko within NAPALM, but it didn't work.

mhprabin commented 4 years ago

@ktbyers I'm able to send my desired config set by calling Netmiko within the NAPALM object. That's a really cool feature.

res = iosvl2.device.send_config_set(['interface g0/1','no sw','ip add 1.1.1.1 255.255.255.0'])

mhprabin commented 4 years ago

I tried the same on EOS, but it seems to use pyeapi calls.

dev.device.run_commands(["show ipv6 bgp summary vrf all", "show ip bgp summary vrf all"]) -->working

commands = ["show running-config | section router bgp"]
dev.device.run_commands(commands, encoding="text")[0].get("output", "\n\n") -->working

I am able to send the commands above, but I couldn't send a command with grep:

sh int status | grep Et29
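If the pipe is not accepted on this path, one workaround is to fetch the full text output (as in the working `encoding="text"` call above) and filter it on the client side. A minimal sketch: `grep_lines` is a hypothetical helper, and `sample_output` is made-up text used only to show the filtering, not real device output.

```python
import re


def grep_lines(text, pattern):
    """Return the lines of `text` that match the regular expression `pattern`."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


# Made-up sample standing in for the "output" field of a text-encoded command.
sample_output = (
    "Port       Name   Status       Vlan\n"
    "Et28              connected    10\n"
    "Et29              notconnect   20\n"
)

matches = grep_lines(sample_output, r"\bEt29\b")
print(matches)

# Against a device (not executed here), the text could come from:
# output = dev.device.run_commands(["show interfaces status"], encoding="text")[0].get("output", "")
```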

ktbyers commented 4 years ago

@mhprabin In napalm, you would use load_merge_candidate or load_replace_candidate and then commit_config

https://napalm.readthedocs.io/en/latest/tutorials/first_steps_config.html#replacing-the-configuration

So you would just send multiple configuration commands that way.

Let me know if you were asking something different though?
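Sending several configuration commands through the NAPALM workflow described above could be wrapped as in the sketch below. `merge_commands` and `FakeDevice` are hypothetical names for illustration; the methods called on `dev` (load_merge_candidate, compare_config, commit_config, discard_config) are the standard NAPALM configuration methods.

```python
def merge_commands(dev, commands, dry_run=False):
    """Load `commands` as a merge candidate, then commit or discard it.

    Returns the diff reported by the device. Nothing is committed when
    dry_run is True or when the diff is empty.
    """
    dev.load_merge_candidate(config="\n".join(commands))
    diff = dev.compare_config()
    if dry_run or not diff:
        dev.discard_config()
    else:
        dev.commit_config()
    return diff


class FakeDevice:
    """Stand-in that records calls instead of talking to a router."""

    def __init__(self):
        self.calls = []
        self._candidate = ""

    def load_merge_candidate(self, filename=None, config=None):
        self.calls.append("load_merge_candidate")
        self._candidate = config

    def compare_config(self):
        self.calls.append("compare_config")
        return self._candidate

    def commit_config(self):
        self.calls.append("commit_config")

    def discard_config(self):
        self.calls.append("discard_config")


fake = FakeDevice()
diff = merge_commands(fake, ["interface Ethernet1", "shutdown"])
print(fake.calls)

preview = FakeDevice()
merge_commands(preview, ["interface Ethernet1", "no shutdown"], dry_run=True)
print(preview.calls)
```

The dry-run path gives a preview of the change without touching the running config.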

mhprabin commented 4 years ago

Actually, I am working on a script where the primary server is connected to Router1 and the secondary server is connected to Router2. On request, I have to shut down one interface on Router1 and "no shut" one on Router2. I was able to complete this script, but with Netmiko it's a little slow. I also tried fast_cli=True, but that only works on Cisco IOS.

I'm running this script on EOS. Below is my sample script:

elif fail_over == 'status':
    id = find_venue_id (ven_name)
    switch1 = venues[id][2]
    ethernet1 = venues[id][3]
    print 'displaying int status ' + ethernet1
    print 'connecting to device ' + switch1
    R01 = {
        'device_type': 'arista_eos',
        'host':   switch1,
        'username': 'prabin',
        'password': 'prabin',
        'secret':'test',
    }
    net_connect = ConnectHandler(**R01)
    net_connect.enable()
    command1 = 'sh int status | grep ' + ethernet1
    output1 = net_connect.send_command(command1)
    print output1

    id = find_venue_id (ven_name)
    switch2 = venues[id][5]
    ethernet2 = venues[id][6]
    print 'connecting to device ' + switch2
    R02 = {
        'device_type': 'arista_eos',
        'host':   switch2,
        'username': 'prabin',
        'password': 'prabin',
        'secret':'test',
    }
    net_connect = ConnectHandler(**R02)
    net_connect.enable()
    command = 'sh int status | grep ' + ethernet2
    output = net_connect.send_command(command)
    print output
else:
    print "invalid choice e.g python failover.py NASDAQ S or P "

ktbyers commented 4 years ago

Okay, what is the question/issue?

Really not a NAPALM issue i.e. the question should be moved to Netmiko.

mhprabin commented 4 years ago

Sorry for the confusion. The script above uses Netmiko, and I'm trying to migrate it to NAPALM.

With EOS, I'm not able to execute the "sh ip int br" command, but show clock works.

>>> iosvl2.cli(["sh ip int br"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/napalm/eos/eos.py", line 636, in cli
    raise CommandErrorException(str(cli_output))
napalm.base.exceptions.CommandErrorException: {u'sh ip int br': u'Invalid command: "sh ip int br"'}
>>> iosvl2.cli(["sh clock"])
{u'sh clock': u'Fri Dec 27 00:40:26 2019\nTimezone: UTC\nClock source: local\n'}

mhprabin commented 4 years ago

I tried the full command instead of the abbreviated one, and it finally worked. Thank you so much @ktbyers for the response.

>>> iosvl2.cli(['sh ip int br'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/napalm/eos/eos.py", line 636, in cli
    raise CommandErrorException(str(cli_output))
napalm.base.exceptions.CommandErrorException: {u'sh ip int br': u'Invalid command: "sh ip int br"'}

>>> iosvl2.cli(['show ip interface brief '])
{u'show ip interface brief ': u' Address \nInterface IP Address Status Protocol MTU Owner \n--------------- -------------------- ---------- ------------- --------- ------- \nEthernet1 172.16.125.99/24 up up 1500 \nManagement1 10.0.2.15/24 up up 1500 \n\n'}

ktbyers commented 4 years ago

@mhprabin Yes, in general eAPI requires full commands (i.e. you cannot abbreviate commands).