Closed horseinthesky closed 3 years ago
Hey @horseinthesky thanks for opening this!
Could you post the full log file as well please?
My immediate thought is this is a weird difference: ServiceType=snetconf vs. ServiceType=** -- I'm wondering if the Huawei box wants to connect on port 830 instead of 22? Or maybe just because auth fails it doesn't generate the "service type"... Just wild guessing as I have no idea :)
Full logs maybe will be helpful -- we also may need to enable the asyncssh logging, but before going that far would it be possible for you to try with the other transports -- system and ssh2 or paramiko -- just to see if the same issue happens there.
Thanks!
Carl
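For reference, a minimal sketch of what enabling the asyncssh logging could look like with the standard logging module. The log lines further down in this thread are emitted under the scrapli and asyncssh logger names, so raising both to DEBUG should surface them. The helper name enable_debug_logging is made up for illustration and is not part of scrapli.

```python
import logging


def enable_debug_logging(logger_names=("scrapli", "asyncssh")):
    """Raise the named loggers to DEBUG and send their records to stderr.

    The default logger names are taken from the log output in this thread;
    the helper itself is hypothetical.
    """
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    loggers = []
    for name in logger_names:
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)
        logger.addHandler(handler)
        loggers.append(logger)
    return loggers
```

Calling enable_debug_logging() once before opening the connection should be enough; child loggers inherit the DEBUG level from their parent.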
@carlmontanari Hey. It's using 22 (I would not see anything in case of 830 cuz of my FW).
Logs are not so informative =(
INFO:scrapli.driver-<mysupersecrethostname>:Non-core transport `asyncssh` selected
INFO:scrapli.helper:found ssh config file at `/home/horseinthesky/.ssh/config`
INFO:scrapli.driver-<mysupersecrethostname>:Opening connection to <mysupersecrethostname>
INFO:asyncssh:Opening SSH connection to <mysupersecretbastionhostname>, port 22
INFO:asyncssh:[conn=0] Connection to <mysupersecretbastionhostname>, port 22 succeeded
INFO:asyncssh:[conn=0] Local address: <mysupersecretip>, port 65467
INFO:asyncssh:[conn=0] Beginning auth for user horseinthesky
INFO:asyncssh:[conn=0] Auth for user horseinthesky succeeded
INFO:asyncssh:[conn=0] Opening SSH connection to <mysupersecretproxyhostname>, port 22 via bastion
INFO:asyncssh:[conn=0] Opening direct TCP connection to <mysupersecretproxyhostname>, port 22
INFO:asyncssh:[conn=0] Client address: dynamic port
INFO:asyncssh:[conn=1] Connection to <mysupersecretproxyhostname>, port 22 succeeded
INFO:asyncssh:[conn=1] Local address: <mysupersecretip>, port 65467
INFO:asyncssh:[conn=1] Beginning auth for user horseinthesky
INFO:asyncssh:[conn=1] Auth for user horseinthesky succeeded
INFO:asyncssh:[conn=1] Opening SSH connection to <mysupersecrethostname>, port 22 via csas
INFO:asyncssh:[conn=1] Opening direct TCP connection to <mysupersecrethostname>, port 22
INFO:asyncssh:[conn=1] Client address: dynamic port
INFO:asyncssh:[conn=2] Connection to <mysupersecrethostname>, port 22 succeeded
INFO:asyncssh:[conn=2] Local address: <mysupersecretip>, port 65467
ERROR:scrapli.transport-<mysupersecrethostname>:Private key authentication with host <mysupersecrethostname> failed. Authentication Timed Out.
Traceback (most recent call last):
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_asyncssh/transport/asyncssh_.py", line 253, in _authenticate_private_key
self.session = await asyncio.wait_for(
File "/usr/lib/python3.8/asyncio/tasks.py", line 490, in wait_for
raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError
I'm not sure why but it is using my SSH config file regardless of "ssh_config_file": True/False.
Gotcha, figured it was worth a shot to throw that out there :)
I'm not sure why but it is using my SSH config file regardless of "ssh_config_file": True/False.
^ this is because asyncssh does this natively and I haven't gotten around to making it not do that heh 🙃
If you could give system/ssh2 a shot and also if possible testing with a password instead just to help try to narrow things down that would be super cool.
Thanks for helping work through this!!
Carl
Ok.
So async version with password has pretty much the same exception but mentions that password auth was used:
ERROR:scrapli.transport-<mysupersecrethostname>:Password authentication with host <mysupersecrethostname> failed. Authentication Timed Out.
Traceback (most recent call last):
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_asyncssh/transport/asyncssh_.py", line 294, in _authenticate_password
self.session = await asyncio.wait_for(
File "/usr/lib/python3.8/asyncio/tasks.py", line 490, in wait_for
raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError
Sync version with system transport is able to connect but crashes on some vendor restrictions:
Traceback (most recent call last):
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/ptyprocess.py", line 395, in read
s = self.fileobj.read1(size)
OSError: [Errno 5] Input/output error
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "o_check.py", line 55, in <module>
main()
File "o_check.py", line 43, in main
response = conn.get(filter_=power_rpc, filter_type="subtree")
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_netconf/driver/driver.py", line 97, in get
raw_response = self.channel.send_input_netconf(response.channel_input)
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_netconf/channel/channel.py", line 216, in send_input_netconf
raw_result = self._read_until_prompt(output=raw_result)
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/channel/channel.py", line 124, in _read_until_prompt
output += self._read_chunk()
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/channel/channel.py", line 49, in _read_chunk
new_output = self.transport.read()
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/decorators.py", line 313, in requires_open_session_wrapper
return wrapped_func(*args, **kwargs)
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/decorators.py", line 113, in decorate
return self.multiprocessing_timeout(
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/decorators.py", line 286, in multiprocessing_timeout
result = future.get(timeout=self.timeout_duration)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/systemssh.py", line 517, in read
return self.session.read(read_bytes)
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/ptyprocess.py", line 400, in read
raise EOFError("End Of File (EOF). Exception style platform.")
EOFError: End Of File (EOF). Exception style platform.
Log ends with:
INFO:scrapli.driver-<mysupersecrethostname>:Connection to <mysupersecrethostname> opened successfully
INFO:scrapli.channel-<mysupersecrethostname>:Attempting to send input: #451
<?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
<powerSupplys>
<powerSupply>
<position/>
<powerEnvironments>
<powerEnvironment/>
</powerEnvironments>
</powerSupply>
</powerSupplys>
</devm></filter></get></rpc>
##; strip_prompt: False
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n#451\n<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">\n <powerSupplys>\n <powerSupply>\n <position/>\n <powerEnvironments>\n <powerEnvironment/>\n </powerEnvironments>\n </powerSupply>\n </powerSupplys>\n </devm></filter></get></rpc>\n##'
P.S. Sync version with ssh2 fails with:
Traceback (most recent call last):
File "o_check.py", line 55, in <module>
main()
File "o_check.py", line 37, in main
conn.open()
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_netconf/driver/driver.py", line 64, in open
login_bytes = self.transport.open_netconf()
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_netconf/transport/cssh2.py", line 26, in open_netconf
super().open()
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_ssh2/transport/cssh2.py", line 136, in open
self.socket.socket_open()
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/socket.py", line 101, in socket_open
self.sock.connect((self.host, self.port))
socket.gaierror: [Errno -5] No address associated with hostname
some vendor restrictions <- what does this mean?
EOFError: End Of File (EOF). Exception style platform.
seems like the device is just punting us out, perhaps inline with the above vendor restriction comment?
The ssh2 issue seems like it just can't resolve the name... guess we can just ignore that for now anyway though... one thing at a time!
Could you connect to this device manually in a terminal and snag all the output? That could be our best bet to figure out what's going on. I've got these notes to connect and run commands as scrapli does. In theory the get below should work as it's just an openconfig model, but obviously that may not be true for your platform -- but you could of course replace that w/ whatever you want!
ssh 172.18.0.13 -p 22 -o ConnectTimeout=5 -o ServerAliveInterval=10 -l vrnetlab -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -F /dev/null -s netconf
<?xml version="1.0" encoding="utf-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
<capability>urn:ietf:params:netconf:base:1.1</capability>
</capabilities>
</hello>]]>]]>
#396
<?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101">
<get>
<filter type="subtree">
<components xmlns="http://openconfig.net/yang/platform">
<component>
<state>
</state>
</component>
</components>
</filter>
</get>
</rpc>
##
EDIT PS: the chunk size is assuming spaces not tabs so if you copy/paste out of here just be aware that if your editor or terminal or whatever decides to use tabs not spaces the chunk size will be wonky!
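To make the chunk-size caveat concrete, here is a minimal sketch of the two NETCONF-over-SSH framings as defined in RFC 6242. The function names frame_1_0 and frame_1_1 are made up for illustration; this is not scrapli's implementation. The number after the # is the byte length of the payload, which is why swapping spaces for tabs changes it.

```python
def frame_1_0(payload: str) -> bytes:
    """NETCONF 1.0 framing: the message is terminated by the ]]>]]> delimiter."""
    return payload.encode("utf-8") + b"]]>]]>"


def frame_1_1(payload: str) -> bytes:
    """NETCONF 1.1 chunked framing (RFC 6242): one chunk, then the end marker.

    The chunk header carries the length of the payload in bytes, so any
    change in whitespace changes the header as well.
    """
    data = payload.encode("utf-8")
    return b"\n#" + str(len(data)).encode("ascii") + b"\n" + data + b"\n##\n"


# The same document indented with spaces vs. a tab has a different byte length.
with_spaces = "<a>\n    <b/>\n</a>"
with_tab = "<a>\n\t<b/>\n</a>"
```

If a message is pasted by hand, the safest approach is to recompute the length from the exact bytes that will be sent rather than reuse the number from someone else's example.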
some vendor restrictions <- what does this mean?
Sorry for this. I mean you have "Supported platforms" section in the README:
Cisco IOS-XE (tested on: 16.12.03) with Netconf 1.0 and 1.1
Cisco IOS-XR (tested on: 6.5.3) with Netconf 1.1
Juniper JunOS (tested on: 17.3R2.10) with Netconf 1.0
thus I thought this message
EOFError: End Of File (EOF). Exception style platform.
was due to some parsing exception from scrapli-netconf.
It (CE box) actually can't do openconfig (it says it can, but it is lying =))
Here is manual NETCONF communication: https://justpaste.it/40y8o
And here is ncclient communication: https://justpaste.it/6r15p
Ah makes sense!
Perfect, thank you so much! I will take a peek this weekend and hopefully figure something out! Thanks for sticking with this!
Carl
Just a bump to say I haven't forgotten about this... this weekend did not go as planned haha, hopefully after work this week or next weekend I will be able to dig into this in more detail!
Hey @horseinthesky sorry for the big delay here.
I'm not seeing anything super obvious as to why this wouldn't work with huawei and of course I have nothing to test with which is a bummer! I did just make a push to develop earlier today that fixed some issues that cropped up w/ system transport.... while I think this is unrelated, it may be worth giving develop a shot to see if it maybe gives us any new info to work with.
I've got a usg6k image I will try to get booted up (I feel like I started to try this before and it did not go well, but we'll see!) and maybe that supports netconf so I can try it out (I have no idea if it does).
Let us know if the develop branch changes anything, and I'll let ya know if I am able to get a test box up and running.
Carl
Thank you. If I can check/test something on CE88XX/68XX Huawei boxes which will help just ask since I don't quite understand how scrapli works :P
Btw I wrote a simple wrapper around asyncssh to be able to work with NETCONF messages and had no Auth issue in this case.
Going to try to make some clean up in scrapli_asyncssh transport today to address the issue where it always uses the private key/config file if it exists and stuff like that -- I think this is the main issue here. Is this the script you were having success w/ with the huawei boxes? If so I will compare to make sure I'm not doing something stupid somewhere :)
In the meantime if you could try w/ develop branch and system transport to see if we get any further than last time that would be really helpful. Thanks a bunch for all the help!!
Carl
Is this the script you were having success w/ with the huawei boxes?
Yes. This is something I suppose should work =)
In the meantime if you could try w/ develop branch and system transport to see if we get any further than last time that would be really helpful.
Could you pls say where should I look to make it work?
Ah sorry -- you can install the develop branch like so: pip install -e git+https://github.com/scrapli/scrapli_netconf.git@develop#egg=scrapli_netconf
I'm not too hopeful that will fix things but figure it is worth a shot :)
OH.... I am apparently blind until just now. In the manual communication link you shared (https://justpaste.it/40y8o) it looks like we are using netconf 1.0 not netconf 1.1.... there is a netconf 1.1 capability in the server's listed capabilities though... so scrapli is definitely using netconf 1.1 encoding which will definitely not work... seems like that may be our big problem. I'm sorry I am just now noticing this... that was a big miss!
You could test to see if this is the issue by doing something like this:
import logging
from scrapli_netconf.driver import NetconfScrape
from scrapli_netconf.constants import NetconfVersion
logging.basicConfig(filename="scrapli.log", level=logging.DEBUG)
logger = logging.getLogger("scrapli")
IOSXR_DEVICE = {
    "host": "localhost",
    "auth_username": "vrnetlab",
    "auth_password": "VR-netlab9",
    "auth_strict_key": False,
    "port": 23830,
    "transport": "system",
}
conn = NetconfScrape(**IOSXR_DEVICE)
conn.open()
conn.netconf_version = NetconfVersion.VERSION_1_0
result = conn.get_config()
The "hello" stuff should just work because it always uses the ]]>]]> delimiter -- but after the hello is exchanged we send either 1.0 or 1.1 encoded messages -- if the 1.1 capability shows up we always use 1.1 encoding... but based on the output you shared this may be a bad idea :) In the above snippet we import the enum I use to set the netconf version we use, and we set the version to netconf 1.0 prior to running any rpcs (but after open/hello is done).... that might get us working.
Carl
I always use 1.0 mostly because it is quite simple to catch and parse =) Will try to look deeper.
Hah yep, I understand that feeling! Yeah if you can test that develop branch and then setting the version like in the above snippet I think we may be able to get somewhere.
If that does work I can maybe add an attribute like prefer_1_0 or something like that so we can prefer to send 1.0 messages instead of 1.1!
Thanks again for all the help with this!
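A rough sketch of the selection logic being discussed, for anyone following along: read the capabilities out of the server hello and pick an encoding, with a prefer_1_0 switch that falls back to 1.0 when the server also advertises it. The flag was only an idea at this point in the thread, and the function names below are hypothetical, not scrapli_netconf's API.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
CAP_1_0 = "urn:ietf:params:netconf:base:1.0"
CAP_1_1 = "urn:ietf:params:netconf:base:1.1"


def server_capabilities(hello: str) -> set:
    """Collect the capability URIs from a server <hello> message."""
    # Drop the 1.0 end-of-message delimiter and surrounding whitespace first.
    root = ET.fromstring(hello.replace("]]>]]>", "").strip())
    return {
        (cap.text or "").strip()
        for cap in root.iter(f"{{{NETCONF_NS}}}capability")
    }


def choose_version(capabilities: set, prefer_1_0: bool = False) -> str:
    """Pick "1.0" or "1.1"; prefer_1_0 is a hypothetical opt-out of 1.1."""
    if prefer_1_0 and CAP_1_0 in capabilities:
        return "1.0"
    if CAP_1_1 in capabilities:
        return "1.1"
    if CAP_1_0 in capabilities:
        return "1.0"
    raise ValueError("server advertised no NETCONF base capability")
```

Whatever version is chosen has to be reflected in the client hello as well, otherwise the two sides end up disagreeing about the framing.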
Hm. It's very strange but async version (the one in my first message) works now :P No changes made. It could be my company's bastion host issue but I'm not sure since I'm not responsible for it.
Now about sync version with system transport. It seems like it ignores my NetconfVersion.VERSION_1_0 setting:
import os
import logging
from scrapli_netconf.driver import NetconfScrape
from scrapli_netconf.constants import NetconfVersion
logging.basicConfig(filename="scrapli.log", level=logging.INFO)
logger = logging.getLogger("scrapli")
lab = '<mysupersecrethostname>'
host = lab
DEVICE = {
    "host": lab,
    "auth_username": f"{os.getenv('USER')}",
    "auth_private_key": f"{os.getenv('HOME')}/.ssh/id_rsa",
    "auth_strict_key": False,
    "ssh_config_file": False,
    "transport": "system",
    "port": 22,
}
power_rpc = '''
<devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
<powerSupplys>
<powerSupply>
<position></position>
<powerEnvironments>
<powerEnvironment></powerEnvironment>
</powerEnvironments>
</powerSupply>
</powerSupplys>
</devm>
'''
def main():
    # create scrapli_netconf connection just like with scrapli, open the connection
    conn = NetconfScrape(**DEVICE)
    conn.open()
    conn.netconf_version = NetconfVersion.VERSION_1_0
    response = conn.get(filter_=power_rpc, filter_type="subtree")
    print(response.result)
    # close the session
    conn.close()

if __name__ == "__main__":
    main()
Still fails with:
Traceback (most recent call last):
File "o_check.py", line 54, in <module>
main()
File "o_check.py", line 46, in main
response = conn.get(filter_=power_rpc, filter_type="subtree")
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli-netconf/scrapli_netconf/driver/driver.py", line 97, in get
raw_response = self.channel.send_input_netconf(response.channel_input)
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli-netconf/scrapli_netconf/channel/channel.py", line 214, in send_input_netconf
raw_result = self._read_until_prompt(output=raw_result)
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/channel/channel.py", line 124, in _read_until_prompt
output += self._read_chunk()
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/channel/channel.py", line 49, in _read_chunk
new_output = self.transport.read()
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/decorators.py", line 313, in requires_open_session_wrapper
return wrapped_func(*args, **kwargs)
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/decorators.py", line 113, in decorate
return self.multiprocessing_timeout(
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/decorators.py", line 286, in multiprocessing_timeout
result = future.get(timeout=self.timeout_duration)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/systemssh.py", line 517, in read
return self.session.read(read_bytes)
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/ptyprocess.py", line 400, in read
raise EOFError("End Of File (EOF). Exception style platform.")
EOFError: End Of File (EOF). Exception style platform.
Last message of the log (if I understand it correctly it shows the last message on the channel, which is what it sent to the device) shows 1.1 notation:
INFO:scrapli.channel-<mysupersecrethostname>:Sending client capabilities
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n<?xml version="1.0" encoding="utf-8"?>\n <hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">\n <capabilities>\n <capability>urn:ietf:params:netconf:base:1.1</capability>\n </capabilities>\n</hello>]]>]]>'
INFO:scrapli.driver-<mysupersecrethostname>:Connection to <mysupersecrethostname> opened successfully
INFO:scrapli.channel-<mysupersecrethostname>:Attempting to send input: #458
<?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
<powerSupplys>
<powerSupply>
<position/>
<powerEnvironments>
<powerEnvironment/>
</powerEnvironments>
</powerSupply>
</powerSupplys>
</devm></filter></get></rpc>
]]>]]>
##; strip_prompt: False
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n#458\n<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">\n <powerSupplys>\n <powerSupply>\n <position/>\n <powerEnvironments>\n <powerEnvironment/>\n </powerEnvironments>\n </powerSupply>\n </powerSupplys>\n </devm></filter></get></rpc>\n]]>]]>\n##'
I've checked line 400 in scrapli/transport/ptyprocess.py but have no idea what is going on there =)
Ah!
Ok well if the first thing worked (without manually setting the netconf version or anything) I wonder if the system transport one will work w/out setting the version?
It looks like I lied to you also, sorry! Will need to set the version like:
conn.netconf_version = NetconfVersion.VERSION_1_0
conn.channel.netconf_version = NetconfVersion.VERSION_1_0
^ it needs to get set in the driver and the channel for some reason. I should probably make that better at some point :p
ptyprocess stuff is a bit of dark magic vendor'd and tidied up from ptyprocess -- the EOF just means that the device doesn't like what we sent and closed the connection on us... I should also make that exception more clear in scrapli core :)
So I guess we have to try two things now:
1) trying system transport without setting the version to see if that works now (since the async one works I would think this would work too)
2) if 1 does not work, can try setting the netconf version in both the driver and the channel (sorry again for not getting that to you correctly before!)
Feels like we are getting closer to resolution :D
Carl
Regarding async version I wonder why it is using 1.1 and it is successful?
INFO:scrapli.channel-<mysupersecrethostname>:Attempting to send input: #451
<?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
<powerSupplys>
<powerSupply>
<position/>
<powerEnvironments>
<powerEnvironment/>
</powerEnvironments>
</powerSupply>
</powerSupplys>
</devm></filter></get></rpc>
##; strip_prompt: False
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n#1381\n<?xml version="1.0" encoding="UTF-8"?>\n<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">\n <data>\n <devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">\n <powerSupplys>\n <powerSupply>\n <position>1/3</position>\n <entIndex>16847872</entIndex>\n <powerEnvironments>\n <powerEnvironment>\n <pemIndex>16847872</pemIndex>\n <state>supply</state>\n <voltageValue>12.2</voltageValue>\n <electricalValue>5.1</electricalValue>\n <temperValue>N/A</temperValue>\n <actualPower>62</actualPower>\n <ratedPower>600</ratedPower>\n </powerEnvironment>\n </powerEnvironments>\n </powerSupply>\n <powerSupply>\n <position>1/4</position>\n <entIndex>16848128</entIndex>\n <powerEnvironments>\n <powerEnvironment>\n <pemIndex>16848128</pemIndex>\n <state>supply</state>\n <voltageValue>12.2</voltageValue>\n <electricalValue>6.5</electricalValue>\n <temperValue>N/A</temperValue>\n <actualPower>79</actualPower>\n <ratedPower>600</ratedPower>\n </powerEnvironment>\n </powerEnvironments>\n </powerSupply>\n </powerSupplys>\n </devm>\n </data>\n</rpc-reply>\n##\n'
INFO:scrapli.driver-<mysupersecrethostname>:Closing connection to <mysupersecrethostname>
INFO:asyncssh:[conn=2] Closing connection
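Worth noting about the log above: the device's reply is itself chunk-framed (it starts with #1381 and ends with ##), which is the NETCONF 1.1 style. A minimal sketch of a decoder for that framing, following RFC 6242, is below. decode_chunked is a hypothetical helper for inspecting captured replies, not something scrapli exposes.

```python
def decode_chunked(message: bytes) -> bytes:
    """Reassemble the payload of a NETCONF 1.1 chunk-framed message.

    Each chunk is "\\n#<size>\\n<size bytes of data>" and the message ends
    with "\\n##\\n" (RFC 6242). Raises ValueError on anything else.
    """
    data = bytearray()
    pos = 0
    while True:
        if message[pos:pos + 2] != b"\n#":
            raise ValueError(f"expected chunk header at offset {pos}")
        pos += 2
        if message[pos:pos + 2] == b"#\n":
            # End-of-chunks marker: the message is complete.
            return bytes(data)
        end = message.index(b"\n", pos)
        size = int(message[pos:end])
        chunk = message[end + 1:end + 1 + size]
        if len(chunk) != size:
            raise ValueError("message ended in the middle of a chunk")
        data += chunk
        pos = end + 1 + size
```

Feeding it a 1.0-style message (one that ends in ]]>]]> and has no chunk header) raises ValueError, which is a quick way to tell the two framings apart in a capture.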
And btw how I can change timeout? It sometimes gets:
scrapli.exceptions.ScrapliTimeout: Private key authentication with host <mysupersecrethostname> failed. Authentication Timed Out.
If I use
conn.netconf_version = NetconfVersion.VERSION_1_0
conn.channel.netconf_version = NetconfVersion.VERSION_1_0
it gets stuck (Ctrl+C is the only solution). Logs:
INFO:scrapli.driver-<mysupersecrethostname>:Connection to <mysupersecrethostname> opened successfully
INFO:scrapli.channel-<mysupersecrethostname>:Attempting to send input: <?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
<powerSupplys>
<powerSupply>
<position/>
<powerEnvironments>
<powerEnvironment/>
</powerEnvironments>
</powerSupply>
</powerSupplys>
</devm></filter></get></rpc>
]]>]]>; strip_prompt: False
INFO:asyncssh:[conn=2, chan=0] Received channel close
INFO:asyncssh:[conn=2, chan=0] Channel closed
INFO:asyncssh:[conn=1, chan=0] Aborting channel
INFO:asyncssh:[conn=2] Connection lost
INFO:asyncssh:[conn=1] Closing connection
INFO:asyncssh:[conn=1, chan=0] Closing channel
INFO:asyncssh:[conn=1] Sending disconnect: Disconnected by application (11)
INFO:asyncssh:[conn=0, chan=0] Aborting channel
INFO:asyncssh:[conn=1] Connection closed
INFO:asyncssh:[conn=1, chan=0] Closing channel due to connection close
INFO:asyncssh:[conn=1, chan=0] Channel closed
INFO:asyncssh:[conn=0] Closing connection
INFO:asyncssh:[conn=0, chan=0] Closing channel
INFO:asyncssh:[conn=0] Sending disconnect: Disconnected by application (11)
INFO:asyncssh:[conn=0] Connection closed
INFO:asyncssh:[conn=0, chan=0] Closing channel due to connection close
INFO:asyncssh:[conn=0, chan=0] Channel closed
Sync version with:
conn.netconf_version = NetconfVersion.VERSION_1_0
conn.channel.netconf_version = NetconfVersion.VERSION_1_0
seems to send 1.0 tags
INFO:scrapli.driver-<mysupersecrethostname>:Connection to <mysupersecrethostname> opened successfully
INFO:scrapli.channel-<mysupersecrethostname>:Attempting to send input: <?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
<powerSupplys>
<powerSupply>
<position/>
<powerEnvironments>
<powerEnvironment/>
</powerEnvironments>
</powerSupply>
</powerSupplys>
</devm></filter></get></rpc>
]]>]]>; strip_prompt: False
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">\n <powerSupplys>\n <powerSupply>\n <position/>\n <powerEnvironments>\n <powerEnvironment/>\n </powerEnvironments>\n </powerSupply>\n </powerSupplys>\n </devm></filter></get></rpc>\n]]>]]>'
but still gets:
EOFError: End Of File (EOF). Exception style platform.
And btw how I can change timeout? It sometimes gets:
^ you can change the timeout_socket value -- generally across the scrapli transports this value is used for the literal socket that underpins things like paramiko/ssh2, or is used for the "initial connection" type timeout for things like asyncssh/system where there is no direct socket we have control over.
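For context on where "Authentication Timed Out" comes from: the tracebacks earlier in the thread show the asyncssh transport wrapping authentication in asyncio.wait_for. A minimal stand-alone sketch of that pattern follows. slow_auth is a made-up stand-in for the real SSH handshake, and passing timeout_socket straight through as the wait_for timeout is an assumption based on the description above.

```python
import asyncio


async def slow_auth(delay: float) -> str:
    """Stand-in for an SSH authentication that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return "authenticated"


async def authenticate(delay: float, timeout_socket: float) -> str:
    """Bound the authentication step with asyncio.wait_for."""
    try:
        return await asyncio.wait_for(slow_auth(delay), timeout=timeout_socket)
    except asyncio.TimeoutError:
        # This is the situation reported as "Authentication Timed Out".
        return "Authentication Timed Out"


def run(delay: float, timeout_socket: float) -> str:
    return asyncio.run(authenticate(delay, timeout_socket))
```

With a slow bastion chain in front of the device, raising the timeout gives the handshake more room before wait_for gives up.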
Regarding the 1.0 vs 1.1 thing -- actually I think these failures make sense, and it makes sense why it fails if we change the version after open... during the open phase we pick what capabilities we send in our hello based on the capabilities advertised by the device. So if they advertise 1.1 support, we always send the 1.1 hello. You can see the hello options here:
class NetconfClientCapabilities(Enum):
    CAPABILITIES_1_0 = """
<?xml version="1.0" encoding="utf-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
<capability>urn:ietf:params:netconf:base:1.0</capability>
</capabilities>
</hello>]]>]]>"""
    CAPABILITIES_1_1 = """
<?xml version="1.0" encoding="utf-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
<capability>urn:ietf:params:netconf:base:1.1</capability>
</capabilities>
</hello>]]>]]>"""
So that EOF error (which will be more helpful in the next release of scrapli!) seems "right" because we told the device we wanted to use 1.1 then we sent a 1.0 message and it blew up.
If you want to use 1.0 payloads, is there a different port/ip that you can connect to? For example, with IOSXE I connect on port 22 and it is 1.0 style, but if I connect on port 830 it is 1.1 style. Not sure if that will fix things for you?
To sum up:
1) timeout_socket to control the initial connection timeout
2) conn.netconf_version setting in manually? (seems it works on at least asyncssh)
3) prefer_1_0 option in the device constructor that will send 1.0 hellos even if the device advertises 1.1 capabilities?
Hopefully that makes sense, let us know what you think!
Thanks a bunch!
Carl
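The reason a version switch after the hello breaks things can be sketched with the rule from RFC 6242: chunked framing is used for the rest of the session only if both peers advertised the base:1.1 capability, otherwise the ]]>]]> end-of-message framing applies. The function name negotiated_framing is made up for illustration.

```python
CAP_1_0 = "urn:ietf:params:netconf:base:1.0"
CAP_1_1 = "urn:ietf:params:netconf:base:1.1"


def negotiated_framing(client_caps: set, server_caps: set) -> str:
    """Return the framing both sides must use after the hello exchange."""
    if CAP_1_1 in client_caps and CAP_1_1 in server_caps:
        return "chunked"
    return "end-of-message"


# Client hello with 1.1 against a server advertising both: chunked from then on,
# so a ]]>]]>-terminated rpc sent afterwards is a framing error.
after_1_1_hello = negotiated_framing({CAP_1_1}, {CAP_1_0, CAP_1_1})

# Client hello with 1.0 against the same server: ]]>]]> framing is expected.
after_1_0_hello = negotiated_framing({CAP_1_0}, {CAP_1_0, CAP_1_1})
```

So the framing is fixed by what was sent in the hello, not by any attribute changed on the client afterwards.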
1) Where should I use this timeout_socket?
2) It works with asyncssh out of the box. And putting netconf_version breaks it (probably due to the fact we already sent 1.1 capability in our hello). system doesn't work in any scenario.
3) In general I think it's a good idea to have 1.0 option.
What I cannot understand is why it (sync version with system transport) doesn't work by default when we send 1.1 hello.
"host": "<my_device_hostname>",
"auth_username": f"{os.getenv('USER')}",
"auth_private_key": f"{os.getenv('HOME')}/.ssh/id_rsa",
"auth_strict_key": False,
"ssh_config_file": True,
"transport": "asyncssh",
"port": 22,
"timeout_socket": 60
}
^ timeout can be configured in the constructor.
Were you able to try system transport with the develop branch? I am wondering if this commit will help at all... basically the overall handling of things should be almost identical between asyncssh and system so I am def confused as well why system is having a hard time!!
I will try to get a prefer_1_0 option built today/tomorrow to develop branch so you can test it out :D
I've just checked that I have the mentioned commit, but no luck for now.
I have an idea why it doesn't work (Huawei lying again =)). This is the log with putting 1.0 version - we send 1.1 hello.
INFO:scrapli.channel-<mysupersecrethostname>:Sending client capabilities
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n<?xml version="1.0" encoding="utf-8"?>\n <hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">\n <capabilities>\n <capability>urn:ietf:params:netconf:base:1.1</capability>\n </capabilities>\n</hello>]]>]]>'
INFO:scrapli.driver-<mysupersecrethostname>:Connection to <mysupersecrethostname> opened successfully
INFO:scrapli.channel-<mysupersecrethostname>:Attempting to send input: <?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
<powerSupplys>
<powerSupply>
<position/>
<powerEnvironments>
<powerEnvironment/>
</powerEnvironments>
</powerSupply>
</powerSupplys>
</devm></filter></get></rpc>
]]>]]>; strip_prompt: False
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">\n <powerSupplys>\n <powerSupply>\n <position/>\n <powerEnvironments>\n <powerEnvironment/>\n </powerEnvironments>\n </powerSupply>\n </powerSupplys>\n </devm></filter></get></rpc>\n]]>]]>'
But it can probably be that Huawei just can't parse it (I've never tried 1.1 on it).
With asyncssh (which works) I cannot check what capability we've sent. Log has just:
INFO:scrapli.channel-lab-myt-1ct5.netinfra.cloud.yandex.net:Sending client capabilities
If we have 1.0 here I believe this may be a proof Huawei just doesn't work with 1.1 at all.
Ok, sorry for the delay again! Can we do a bit of a reset here to make sure we are all on the same page? I think I have gone down a few unrelated rabbit holes that have not helped things :)
So, can we just get (w/ full logs pretty please!) a get_config (or get whatever) for asyncssh and system transport with no changes to the version or anything. I want to compare those logs and see if/where system is messing up. Given that the normal async script worked no problem now I am wondering if we have just been getting wrapped around the axle on things and missing the real issue on the sync bits.
Sorry this has been hard for me to follow over the holidays and me getting side tracked on things!! Thanks a bunch for your patience!
Carl
It is no problem to work slow on this one. I'm just glad to help to improve such an amazing tool.
So these are full logs from successful async request (small get RPC to get power supply): https://justpaste.it/9rxjj
These are logs from sync (system transport) request (same RPC): https://justpaste.it/5raml
And these are logs from sync (system transport) WITH putting:
conn.netconf_version = NetconfVersion.VERSION_1_0
conn.channel.netconf_version = NetconfVersion.VERSION_1_0
Boom! You rock, will dig into this today again I hope :D
My guess is: Second scenario - sync (system transport) doesn't work with 1.1 because Huawei can't work with 1.1. But I may be wrong cuz I don't know what
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/ptyprocess.py", line 400, in read
raise EOFError("End Of File (EOF). Exception style platform.")
EOFError: End Of File (EOF). Exception style platform.
is about :P
3rd scenario is probably wrong cuz we send only the 1.1 capability and then send the RPC with 1.0 wrapping. But it also has this
File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/ptyprocess.py", line 400, in read
raise EOFError("End Of File (EOF). Exception style platform.")
EOFError: End Of File (EOF). Exception style platform.
so I really don't know =)
I've just pushed a change to develop that maybe? will help. I swear NETCONF drives me mad :) Anyway, the change basically removes newlines and "flattens" the payload we send to the device. In my experience some devices don't mind having "pretty" xml sent to them, but others very much do not like the extra line breaks for some reason. I'm not sure that is the "fix" here, but this is good as I had missed the "get" operation flattening.
I would be curious whether a "get_config" works if you run it before upgrading to the latest develop branch (the input for that should already be flattened and such). If that doesn't work the develop branch probably won't fix this, but at least we have some movement!
Lastly, would it be possible to get those three log files with logging set to DEBUG? DEBUG should catch the hello sent in both sync and async version?
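To illustrate what "flattening" a payload means here, the sketch below strips whitespace that sits purely between tags, so the filter goes out on a single line. It is only an illustration under that assumption: `flatten_xml` is a hypothetical helper, not the code from the develop branch, and the filter is a shortened version of the power supply filter from the logs.

```python
import re


def flatten_xml(payload: str) -> str:
    """Remove whitespace between adjacent tags; text inside elements is untouched."""
    return re.sub(r">\s+<", "><", payload.strip())


power_filter = """
<devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
  <powerSupplys>
    <powerSupply>
      <position/>
    </powerSupply>
  </powerSupplys>
</devm>
"""

flat = flatten_xml(power_filter)
print(flat)
```

The regex only touches whitespace between a closing `>` and the next `<`, so element text and attribute values are left alone.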
So, this is DEBUG for SYNC get-config (before "flattening" commit): https://justpaste.it/8kpty
New commit didn't change anything.
Here is successful debug level ASYNC request log: https://justpaste.it/3b9po
Here is SYNC get debug level log: https://justpaste.it/7sjl4
And finally same SYNC but with:
conn.netconf_version = NetconfVersion.VERSION_1_0
conn.channel.netconf_version = NetconfVersion.VERSION_1_0
Thanks!
So we get and parse the capabilities and things are the same for sync and async, so I think we are "good" on that front. I think we can ignore the netconf version rabbit hole we started down at this point.
On the sync version I see this in the log that looks "bad", but I don't think it's really doing anything to us as we are still getting capabilities and stuff, so maybe we can ignore it?
setsockopt IPV6_TCLASS 8: Operation not permitted
After that/parsing capabilities we send identical payloads to the device w/ sync and async, and we send the return at the same point and everything. So I am thinking we have one of two problems:
1) There is something buggered in the system transport (or perhaps greater sync netconf transport stuff) where we send an extra return or something and it upsets the server, causing it to close the connection.
2) I just remembered this issue from a long time ago that I don't think there was ever a "real" resolution for.... The salient point in that issue is that some platforms close a connection after a single command is sent via a pty...
IF number two is the issue, then I would expect paramiko/ssh2 transport to work. Would it be possible to re-try one or both of those? I feel like last time we tried that we had a dns/name resolution issue in ssh2, but perhaps it works now (after the async stuff started working) and/or paramiko may work?
I will look a bit more at the sync transport to compare it to the async base transport and let ya know if I come up with anything else....
Carl
setsockopt IPV6_TCLASS 8: Operation not permitted
Ah no. This message shows due to some WSL1 networking stack stuff. Don't bother. It's harmless =)
I'll give ssh2 another try tomorrow. And will check paramiko also.
SSH2
It still gives me
socket.gaierror: [Errno -5] No address associated with hostname
But I suppose IPv6 itself is the problem here: ssh2 doesn't support it?!
If I change my hostname to IP I get:
socket.gaierror: [Errno -9] Address family for hostname not supported
My netbox mgmt is IPv6 only.
Paramiko
Gives me exactly the same gaierrors: "No address associated with hostname" and "Address family for hostname not supported" for hostname and IPv6 address respectively.
BTW I am unpleasantly surprised IPv6 is not supported =(
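One plausible way to end up with "Address family for hostname not supported" is resolving an IPv6 address while asking only for IPv4 results. The sketch below shows that with the standard library alone; it is an assumption about the general failure mode, not a statement about what the ssh2 or paramiko transports actually did. `resolve` is a hypothetical helper and `::1` is just a stand-in IPv6 literal.

```python
import socket


def resolve(host, port, family=socket.AF_UNSPEC):
    """Return (address family, address) pairs for host; AF_UNSPEC allows IPv4 and IPv6."""
    infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    return [(fam, sockaddr[0]) for fam, _type, _proto, _canon, sockaddr in infos]


# Forcing IPv4 on an IPv6 literal fails with socket.gaierror.
try:
    resolve("::1", 22, family=socket.AF_INET)
    v4_only_error = None
except socket.gaierror as exc:
    v4_only_error = exc

# Leaving the family unspecified lets the resolver return an AF_INET6 entry.
any_family = resolve("::1", 22)
print(v4_only_error)
print(any_family)
```

Opening the socket with the family returned by `getaddrinfo`, rather than a hardcoded `AF_INET`, is the usual way to support both address families.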
Well then :)
Ok, the v6 thing was because I have no need to care about it, nobody had asked yet, and system transport is the "main" thing and it would work anyway... but since you asked, just made a push to scrapli "core" in develop branch that should fix the v6 issue for you :) So.... we can maybe/hopefully test ssh2/paramiko now :)
Commit is here if you could test that out w/ ssh2/paramiko that would be awesome!
Thank you!
IPv4 is legacy LOL
So, with the new commit ssh2 AND paramiko work!
system transport has something extra to say:
Traceback (most recent call last):
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/transport/ptyprocess.py", line 395, in read
s = self.fileobj.read1(size)
OSError: [Errno 5] Input/output error
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/transport/systemssh.py", line 497, in read
return self.session.read(read_bytes)
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/transport/ptyprocess.py", line 400, in read
raise EOFError("End Of File (EOF). Exception style platform.")
EOFError: End Of File (EOF). Exception style platform.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "o_check.py", line 58, in <module>
main()
File "o_check.py", line 49, in main
response = conn.get(filter_=power_rpc, filter_type="subtree")
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli-netconf/scrapli_netconf/driver/driver.py", line 97, in get
raw_response = self.channel.send_input_netconf(response.channel_input)
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli-netconf/scrapli_netconf/channel/channel.py", line 214, in send_input_netconf
raw_result = self._read_until_prompt(output=raw_result)
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/channel/channel.py", line 124, in _read_until_prompt
output += self._read_chunk()
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/channel/channel.py", line 49, in _read_chunk
new_output = self.transport.read()
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/decorators.py", line 313, in requires_open_session_wrapper
return wrapped_func(*args, **kwargs)
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/decorators.py", line 113, in decorate
return self.multiprocessing_timeout(
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/decorators.py", line 286, in multiprocessing_timeout
result = future.get(timeout=self.timeout_duration)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/transport/systemssh.py", line 504, in read
raise ScrapliConnectionLost(msg) from exc
scrapli.exceptions.ScrapliConnectionLost: encountered end of file error reading from system transport, typically this means that the device has closed the connection
Haha yeah you are right v4 is old school lol. This was a good excuse to fixup the v6 socket thing :)
So I think we are basically all good at this point with the notable exception of system transport. I think we may be just out of luck on this. That other issue I linked to basically had no resolution as well. As I understand it, some Huawei boxes, when a connection is made in a PTY (as system transport does), just close the connection after a single command is sent. I can't really confirm/deny that, but it surely fits with the error that we see. I would guess that if you did a "normal" scrapli connection to this device (just ssh and send a show command or something) that we would end up w/ the same result (everything but system transport works).
If you could give that a shot ("normal" ssh command) to confirm that theory that would be cool. If the result is what I expect then I dont know that we can do anything else on this at the moment... or perhaps ever as system transport is 100% reliant on spawning a pty.
Thanks so much for sticking with me on this one!!
Carl
I'll jump into it after the holidays in Russia. Thanks.
Hey, @carlmontanari .
I've checked regular CLI command and all three (ssh2/paramiko/system) work perfectly fine.
I just copied PRIVS from the issue you mentioned and came up with the following script:
import os
import logging

from scrapli.driver.network_driver import PrivilegeLevel
from scrapli.driver.core import IOSXEDriver

logging.basicConfig(filename="scrapli.log", level=logging.DEBUG)
logger = logging.getLogger("scrapli")

lab = '<mysupersecrethostname>'
host = lab


def main():
    # privilege levels copied from the linked Huawei issue
    PRIVS = {
        "exec": (PrivilegeLevel(r"^[<a-z0-9.\-@()/:]{1,48}[#>$]\s*$", "exec", "", "", "", False, "",)),
        "privilege_exec": (
            PrivilegeLevel(
                r"^[<a-z0-9.\-@()/:]{1,48}[#>$]\s*$",
                "privilege_exec", "exec", "disable", "enable", True, "Password:", )),
        "configuration": (
            PrivilegeLevel(
                r"^\[[a-z0-9.\-@/:]{1,32}\]$",
                "configuration", "privilege_exec", "quit", "system-view", False, "", )),
    }
    DEVICE = {
        "host": lab,
        "auth_username": f"{os.getenv('USER')}",
        "auth_private_key": f"{os.getenv('HOME')}/.ssh/id_rsa",
        "auth_strict_key": False,
        "ssh_config_file": False,
        "transport": "system",
        # "transport": "paramiko",
        "port": 22,
        "timeout_socket": 60,
        "privilege_levels": PRIVS,
    }
    conn = IOSXEDriver(**DEVICE)
    conn.open()
    response = conn.send_command("disp ver")
    print(response.result)
    # close the session
    conn.close()


if __name__ == "__main__":
    main()
Need to underline here that the IDQDD (guy who posted the issue) has Huawei Quidway 5720 box and I have Huawei Cloud Edge box. Since these are completely different lineups developed by completely different teams, there could be no similarities regarding SSH.
One more thing here: ssh2/paramiko execution is ~4 sec where system transport takes 20 sec. Why is that so long? =)
I've checked regular CLI command and all three (ssh2/paramiko/system) work perfectly fine.
This is good news at least. Interesting that system works here but not netconf though. Another data point for sure, but not sure it gives me any ideas as to why system is broken for netconf still.
Need to underline here that the IDQDD (guy who posted the issue) has Huawei Quidway 5720 box and I have Huawei Cloud Edge box. Since these are completely different lineups developed by completely different teams, there could be no similarities regarding SSH.
Ok fair enough!
One more thing here: ssh2/paramiko execution is ~4 sec where system transport takes 20 sec. Why is that so long? =)
Logs may help show us this, but the most likely culprit is ssh agent trying a bunch of keys before finding one that works. I've also seen things be very slow to connect when I have ssh_config_file set to True and have a different username provided in python vs what is in the ssh config file; it's usually > 2s slower in the latter case. So some combo of that and/or ssh agent is my guess (probably ssh agent given you have ssh config file set to False). Though 20s vs 4s is quite a big gap...
I've got a fairly significant overhaul to the internals of scrapli "core" about 80% done at this point.... if ssh2/asyncssh/paramiko can get you by for the netconf bits for a while we can try with the new updates in a week or so if that works? There are a few improvements to logging and just some general internal clean up that may make the troubleshooting flow a bit easier. Failing that we may have to try to coordinate a zoom/webex/whatever if you're ok with that!
Carl
Hello. I've tried to move my ncclient script to scrapli and faced unexpected result - Auth issue.
It's a Huawei CE box. So it's not in the list of supported devices. However, here is my code
Here is the traceback:
Successful login usually generates this log message:
My attempt is seen as:
This UserName=Could not extract user name really confuses me.
My env: Ubuntu 20.04