adampav closed this issue 4 years ago.
That's unfortunate. I noticed the same thing some time ago, but I forgot to submit the report.
Apparently our is_alive method does more harm than good: besides checking the state of the SSH connection, it also sends the NULL byte, which destroys the connection: https://github.com/ktbyers/netmiko/blob/master/netmiko/base_connection.py#L248
Even though this is already tracked under https://github.com/ktbyers/netmiko/issues/568, I believe we should remove this for IOS-XR, as it seems very sensitive to it (other NAPALM platforms don't seem to be affected, or at least none that I'm aware of). CC @ktbyers
@adampav You can set always_alive: false for the NAPALM Proxy, or the global proxy_keep_alive: false option: https://docs.saltstack.com/en/develop/ref/configuration/proxy.html#std:conf_proxy-proxy_keep_alive. The difference is that always_alive: false instructs the proxy not to keep the session alive at all (a new connection is opened for each command), while proxy_keep_alive: false opens the connection and leaves it up until the network device drops it.
@mirceaulinic Can you expand on this? So the Null byte causes the IOS-XR to get in a messed-up state?
Is it because we are in an XML agent context?
Seems strange... but we can definitely do something different (or give is_alive an option to actively test the connection versus just querying paramiko).
I thought this was what the SecureCRT SSH session keepalive did (send a null byte). I will have to look into that again.
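One way to read the "option to is_alive" idea above, as a minimal sketch: a passive check that only asks the transport whether it is still up and never writes to the channel, plus an opt-in active probe. The function name `is_alive`, its `active`/`probe` parameters and the `TransportLike` protocol are hypothetical and not netmiko's actual API; the only assumption about the transport is an `is_active()` method like `paramiko.Transport.is_active()`.

```python
from typing import Callable, Optional, Protocol


class TransportLike(Protocol):
    """Structural type: anything with an is_active() method, e.g. paramiko.Transport."""

    def is_active(self) -> bool:
        ...


def is_alive(transport: Optional[TransportLike],
             active: bool = False,
             probe: Optional[Callable[[], object]] = None) -> bool:
    """Hypothetical is_alive() variant; not netmiko's actual API.

    Passive mode (the default) only queries the transport state and writes
    nothing to the channel, so nothing reaches the remote shell or the
    IOS-XR XML agent. Active mode additionally calls `probe`, a
    caller-supplied function that exercises the connection.
    """
    if transport is None:
        return False  # the connection was never opened
    if not transport.is_active():
        return False  # the SSH transport itself is down
    if active and probe is not None:
        try:
            probe()
        except Exception:
            # Any failure of the active probe means the session can't be trusted.
            return False
    return True
```

With paramiko, `transport` would come from `SSHClient.get_transport()`. For the active probe, one untested candidate is `transport.send_ignore`, which sends an SSH-layer SSH_MSG_IGNORE packet that the server discards instead of forwarding it to the shell, so it should not reach the XML agent; whether it also resets the device's idle timer is unknown.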
@mirceaulinic The proxy_keep_alive option seems better since it allows for a quick flurry of operations. I can always increase the SSH timeouts on the XRv. Many thanks again!
Sorry for the late reply, @ktbyers:
@mirceaulinic Can you expand on this? So the Null byte causes the IOS-XR to get in a messed-up state?
Yes, but I still don't know why.
Is it because we are in an XML agent context?
This is what I suspect.
I will need to investigate this more closely to understand what's actually going on there and why. Thanks!
@mirceaulinic No worries...just let me know what you find.
Hello, thanks @mirceaulinic and @ktbyers for providing these nice libs.
I am now using Salt with NAPALM to manage a Cisco router running IOS, and I connect to this device over telnet. I have the same problem: if I use the always_alive option and stay idle for 1 minute, I lose the connection. If I set it to false, communicating with my device becomes very slow.
So, what should I do? Could you please give me some advice?
Hi @ktbyers, I had a closer look at this, and the NULL byte definitely breaks the connection in the XML context:
>>> i.open()
>>> w = i.get_arp_table()
>>> i.is_alive()
{u'is_alive': True}
>>> i.is_alive()
{u'is_alive': True}
>>> i.is_alive()
{u'is_alive': True}
>>> i.is_alive()
{u'is_alive': True}
>>> i.is_alive()
{u'is_alive': True}
>>> w = i.get_arp_table()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/state/home/mircea/venvs/cf-napalm/local/lib/python2.7/site-packages/napalm/iosxr/iosxr.py", line 1114, in get_arp_table
    result_tree = ETREE.fromstring(self.device.make_rpc_call(rpc_command))
  File "/state/home/mircea/venvs/cf-napalm/local/lib/python2.7/site-packages/pyIOSXR/iosxr.py", line 151, in make_rpc_call
    result = self._execute_rpc(rpc_command)
  File "/state/home/mircea/venvs/cf-napalm/local/lib/python2.7/site-packages/pyIOSXR/iosxr.py", line 365, in _execute_rpc
    response = self._send_command(xml_rpc_command, delay_factor=delay_factor)
  File "/state/home/mircea/venvs/cf-napalm/local/lib/python2.7/site-packages/pyIOSXR/iosxr.py", line 342, in _send_command
    if not self._timeout_exceeded(start=start):
  File "/state/home/mircea/venvs/cf-napalm/local/lib/python2.7/site-packages/pyIOSXR/iosxr.py", line 190, in _timeout_exceeded
    raise TimeoutError(msg, self)
pyIOSXR.exceptions.TimeoutError: Timeout exceeded!
So is_alive always returns True (sending the NULL byte succeeds and the underlying netmiko layer raises no error), although the connection is no longer usable.
Logs:
ss><Status>StatusResolutionRequest</Status><ClientID>0</ClientID><EntryState>0</EntryState><ResolutionRequestCount>1227636</ResolutionRequestCount></Entry></a
DEBUG:netmiko:read_channel:
DEBUG:netmiko:read_channel: rpEntry></ResolutionHistoryDynamic><ResolutionHistoryClient><arpEntry/></ResolutionHistoryClient></Node></NodeTable></ARP></Operational></Get><ResultSummary ErrorCount="0"/></Response>
XML>
DEBUG:netmiko:Sending the NULL byte
DEBUG:netmiko:write_channel:
DEBUG:netmiko:Sending the NULL byte
DEBUG:netmiko:write_channel:
DEBUG:netmiko:Sending the NULL byte
DEBUG:netmiko:write_channel:
DEBUG:netmiko:Sending the NULL byte
DEBUG:netmiko:write_channel:
DEBUG:netmiko:Sending the NULL byte
DEBUG:netmiko:write_channel:
DEBUG:netmiko:read_channel:
DEBUG:netmiko:write_channel: <?xml version="1.0" encoding="UTF-8"?><Request MajorVersion="1" MinorVersion="0"><Get><Operational><ARP></ARP></Operational></Get></Request>
DEBUG:netmiko:read_channel:
DEBUG:netmiko:read_channel:
DEBUG:netmiko:read_channel:
DEBUG:netmiko:read_channel:
DEBUG:netmiko:read_channel:
DEBUG:netmiko:read_channel:
DEBUG:netmiko:read_channel:
...
~~~ many other read_channel ~~~
...
DEBUG:netmiko:read_channel:
DEBUG:netmiko:write_channel:
DEBUG:netmiko:read_channel:
DEBUG:netmiko:read_channel:
I am not sure what we should send instead, or whether there's anything we can send at all. What about '\n'?
If I use the always_alive option and stay idle for 1 minute, I lose the connection. If I set it to false, communicating with my device becomes very slow.
@noobcoderT If you turn off the always_alive option, Salt will no longer attempt to keep the connection alive, so it will open a new connection for each command you execute. From this perspective the slowness is expected, as establishing a connection is pretty heavy.
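To make the trade-off concrete, here is a small model of the two behaviours described above. `DeviceSession` and its `connect`/`run` names are hypothetical; this is not how the Salt NAPALM proxy is implemented internally, only an illustration of why always_alive: false is slower: the connection setup cost is paid on every call.

```python
from typing import Any, Callable, Optional, Protocol


class Connection(Protocol):
    def close(self) -> None:
        ...


class DeviceSession:
    """Hypothetical model of the two always_alive behaviours; not Salt's code.

    always_alive=True:  open the connection once and reuse it for every call.
    always_alive=False: open a fresh connection for each call and close it
                        afterwards, paying the setup cost every time.
    """

    def __init__(self, connect: Callable[[], Connection], always_alive: bool = True) -> None:
        self._connect = connect
        self._always_alive = always_alive
        self._conn: Optional[Connection] = None

    def run(self, command: Callable[[Connection], Any]) -> Any:
        if self._always_alive:
            if self._conn is None:
                self._conn = self._connect()  # opened once, then kept
            return command(self._conn)
        conn = self._connect()  # new connection for this call only
        try:
            return command(conn)
        finally:
            conn.close()
```

In the persistent case, a real implementation also has to notice when the device has silently dropped the idle session, which is what the keepalive check (and its NULL byte) was trying to do.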
Thanks for your reply, @mirceaulinic.
I'm trying to use the salt.proxy.napalm module through the Salt Python API. I started a proxy daemon for a network device, and I got True from the salt.proxy.napalm.alive(opts) function even before calling the init() function, while get_device() returned an empty dictionary. I have read the docs, but I don't know how to get the __proxy__ variable:
__proxy__['napalm.call']('cli',
    **{
        'commands': [
            'show version',
            'show chassis fan'
        ]
    })
Now I am confused about how I can use this proxy if I start it as a daemon rather than through a Python script.
Hi @noobcoderT, you don't need to use the __proxy__ object. It is documented only for developers who may extend these capabilities, not for users.
What are you trying to do, more specifically? To invoke arbitrary NAPALM methods, you can use the napalm.call execution function: https://docs.saltstack.com/en/latest/ref/modules/all/salt.modules.napalm.html#salt.modules.napalm.call.
In general, the public NAPALM methods are available in the existing execution modules, see https://docs.saltstack.com/en/develop/topics/network_automation/index.html#napalm, so you can execute, e.g., bgp.neighbors, bgp.config or net.arp, and so on. Is this what you meant?
@mirceaulinic Responding to your comment, have you tried to use '\n' or does that break the XML Agent also?
Hi @mirceaulinic, I want to use this NAPALM module in a Python script, not from the Salt CLI. I want to use the Salt LocalClient to handle all the proxy minions that are running as daemon services, but I don't know all their IDs. So I would like to know whether there is a way to get the proxy objects and then operate on them. Thanks!
@ktbyers
Responding to your comment, have you tried to use '\n' or does that break the XML Agent also?
From what I noticed, it doesn't seem to break anything. It doesn't actually do anything either (i.e., it doesn't move to the next line or display the prompt again).
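Since a newline produces no output in the XML agent, it may keep bytes flowing but cannot confirm that the agent still answers, which is exactly what the failing is_alive missed (writes succeeded, reads stayed empty). A check that would catch that needs a round trip. A minimal sketch, assuming non-blocking `send`/`read` callables; the helper name `round_trip_alive` and its parameters are hypothetical, and which request is safe and cheap to send inside the XML agent remains the open question.

```python
import time
from typing import Callable


def round_trip_alive(send: Callable[[str], object],
                     read: Callable[[], str],
                     request: str,
                     expect: str,
                     timeout: float = 5.0,
                     poll: float = 0.2) -> bool:
    """Hypothetical end-to-end liveness check.

    Reports True only if the far end answers: `request` is written with
    `send`, then the non-blocking `read` (returns '' when nothing is
    buffered) is polled until `expect` appears in the accumulated output.
    A successful write alone is not treated as proof of life.
    """
    deadline = time.monotonic() + timeout
    received = ""
    try:
        send(request)
        while time.monotonic() < deadline:
            received += read()
            if expect in received:
                return True
            time.sleep(poll)
    except (OSError, EOFError):
        return False  # the transport failed outright
    return False  # the write went through, but no answer before the deadline
```

With netmiko, `send` and `read` could plausibly be the connection's `write_channel` and `read_channel` methods; the `request`/`expect` pair would have to be something the XML agent actually answers, e.g. a small RPC whose reply is followed by the XML> prompt, judging from the logs above.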
@noobcoderT:
I want to use this NAPALM module in a Python script, not from the Salt CLI. I want to use the Salt LocalClient to handle all the proxy minions that are running as daemon services, but I don't know all their IDs.
If your Proxy processes are already running, it's pretty easy:
>>> import salt.client
>>> client = salt.client.get_local_client('/etc/salt/master')
>>> ret = client.cmd(tgt, fun, arg, timeout, tgt_type, ret, jid, kwarg, **kwargs)
The arguments you can send to the cmd function are documented at https://docs.saltstack.com/en/latest/ref/clients/#salt.client.LocalClient.cmd, e.g.,
>>> ret = client.cmd('device1', 'test.ping')
>>> ret
{'device1': True}
>>> ret = client.cmd('device* and G@os:junos and G@model:MX960', 'probes.results', tgt_type='compound')
>>> ret = client.cmd('juniper-routers', 'net.lldp', tgt_type='nodegroup')
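Since cmd returns a dictionary keyed by minion ID, the IDs you don't know yet can be discovered by targeting '*' first and reading the keys. A minimal sketch of that post-processing; `responsive_ids` is a hypothetical helper, and the sample dictionary is made up for illustration.

```python
from typing import Any, Dict, List


def responsive_ids(ping_ret: Dict[str, Any]) -> List[str]:
    """Return the sorted IDs of minions that answered test.ping with True.

    Hypothetical helper: `ping_ret` is assumed to be the dict returned by
    LocalClient.cmd('*', 'test.ping'), keyed by minion ID.
    """
    return sorted(minion_id for minion_id, ok in ping_ret.items() if ok is True)


# Made-up sample return, shaped like the {'device1': True} example above.
sample = {"device2": True, "device1": True, "device3": False}
print(responsive_ids(sample))  # ['device1', 'device2']
```

Against a live master this would be something like `ids = responsive_ids(client.cmd('*', 'test.ping'))`, followed by `client.cmd(ids, 'net.arp', tgt_type='list')` to run a function on exactly those proxies. Minions that don't reply within the timeout are simply missing from the returned dictionary.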
But this requires your Proxy processes to be already started. You can equally write a Python script without pre-starting them, but that's slightly more complicated: you'll basically need to replicate the Proxy startup, i.e., a lighter version of this section https://github.com/saltstack/salt/blob/v2017.7.2/salt/minion.py#L3105-L3174, without starting the Beacons or the Scheduler.
@mirceaulinic That's great, a really big help. Now I know what I should do. Thanks a lot!
@mirceaulinic I wonder whether it also breaks outside the XML Agent context (the NULL byte in particular). If the NULL byte works in a normal SSH session, I guess we could always exit the XML Agent before sending it and re-enter it afterwards, i.e., a little wrapper that checks for, exits, and re-enters the XML Agent.
It would make things slower, though...
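The wrapper idea above can be sketched as a context manager, assuming the caller supplies the three operations. `outside_xml_agent` and its `in_agent`/`leave`/`enter` parameters are hypothetical; how to detect the agent and which commands leave or re-enter it are left to the caller and not asserted here.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def outside_xml_agent(in_agent: Callable[[], bool],
                      leave: Callable[[], None],
                      enter: Callable[[], None]) -> Iterator[None]:
    """Hypothetical wrapper: temporarily leave the XML agent, then return to it.

    If the session was not in the XML agent to begin with, nothing is changed.
    The agent is re-entered even if the body of the `with` block raises.
    """
    was_in_agent = in_agent()
    if was_in_agent:
        leave()
    try:
        yield
    finally:
        if was_in_agent:
            enter()
```

In a real session, `in_agent` might look for the XML> prompt seen in the logs above, and the NULL byte would be sent inside the `with` block. Every check then costs two extra mode switches, which is the slowdown mentioned above, and it only helps if the NULL byte turns out to be harmless outside the XML agent, which is still unverified.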
Hello, I noticed this problem while experimenting with an XRv 9000. After approximately one minute of normal operation, the proxy minion is no longer able to interact with the device. I tested the same on an IOS-XE device and didn't face such problems. Here are some DEBUG log lines:
2017-12-02 13:28:44,563 [salt.utils.lazy ][DEBUG ][20954] LazyLoaded status.proxy_reconnect
2017-12-02 13:28:44,564 [netmiko ][DEBUG ][20954] Sending the NULL byte
2017-12-02 13:28:44,564 [netmiko ][DEBUG ][20954] write_channel:
2017-12-02 13:28:44,564 [/usr/lib/python2.7/dist-packages/salt/proxy/napalm.pyc ][DEBUG ][20954] Is xrv1 still alive? Yes.
As soon as the lines above appear, I am no longer able to interact with the device.
2017-12-02 13:28:53,163 [salt.minion ][INFO ][20954] Starting a new job with PID 20954
2017-12-02 13:28:53,364 [netmiko ][DEBUG ][20954] read_channel:
2017-12-02 13:28:53,365 [netmiko ][DEBUG ][20954] write_channel: <?xml version="1.0" encoding="UTF-8"?>show ver
2017-12-02 13:28:53,365 [netmiko ][DEBUG ][20954] read_channel:
2017-12-02 13:28:53,566 [netmiko ][DEBUG ][20954] read_channel:
2017-12-02 13:28:53,766 [netmiko ][DEBUG ][20954] read_channel:
Workaround, as suggested by @mirceaulinic: I have set the always_alive flag to false.