scrapli / scrapli_netconf

Fast and flexible Python 3.7+ netconf client specifically for network devices
https://scrapli.github.io/scrapli_netconf/
MIT License

Authentication Timed Out #25

Closed horseinthesky closed 3 years ago

horseinthesky commented 3 years ago

Hello. I tried to move my ncclient script to scrapli and ran into an unexpected result: an auth issue.

It's a Huawei CE box, so it's not in the list of supported devices. However, here is my code:

import os
import asyncio
import logging

from scrapli_netconf.driver import AsyncNetconfScrape

logging.basicConfig(filename="scrapli.log", level=logging.INFO)
logger = logging.getLogger("scrapli")

DEVICE = {
    "host": "<my_device_hostname>",
    "auth_username": f"{os.getenv('USER')}",
    "auth_private_key": f"{os.getenv('HOME')}/.ssh/id_rsa",
    "auth_strict_key": False,
    "ssh_config_file": True,
    "transport": "asyncssh",
    "port": 22,
}

power_rpc = '''
      <devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
        <powerSupplys>
          <powerSupply>
            <position></position>
            <powerEnvironments>
              <powerEnvironment></powerEnvironment>
            </powerEnvironments>
          </powerSupply>
        </powerSupplys>
      </devm>
'''

async def main():
    conn = AsyncNetconfScrape(**DEVICE)
    await conn.open()

    response = await conn.get(filter_=power_rpc, filter_type="subtree")
    print(response.result)

    await conn.close()

if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(main())

Here is the traceback:

Traceback (most recent call last):
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_asyncssh/transport/asyncssh_.py", line 253, in _authenticate_private_key
    self.session = await asyncio.wait_for(
  File "/usr/lib/python3.8/asyncio/tasks.py", line 490, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "check.py", line 55, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "check.py", line 37, in main
    await conn.open()
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_netconf/driver/async_driver.py", line 56, in open
    login_bytes = await self.transport.open_netconf()
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_netconf/transport/asyncssh_.py", line 34, in open_netconf
    await self._authenticate()
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_asyncssh/transport/asyncssh_.py", line 218, in _authenticate
    if await self._authenticate_private_key(common_args=common_args):
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_asyncssh/transport/asyncssh_.py", line 266, in _authenticate_private_key
    raise ScrapliTimeout(msg) from exc
scrapli.exceptions.ScrapliTimeout: Private key authentication with host <my_device_hostname> failed. Authentication Timed Out.

Successful login usually generates this log message:

The SSH user succeeded in logging in. (ServiceType=snetconf, UserName=<username>, IPAddress=<source_ip>, VPNInstanceName=MGMT.

My attempt is seen as:

The SSH user failed to login. (ServiceType=**, FailedReason=TCP disconnect from client, UserName=Could not extract user name, IPAddress=<source_ip>, VPNInstanceName=MGMT.)

This UserName=Could not extract user name really confuses me.

My env (Ubuntu 20.04):

pip freeze
asyncssh==2.4.2
cffi==1.14.4
cryptography==3.3
lxml==4.6.2
pycparser==2.20
scrapli==2020.11.15
scrapli-asyncssh==2020.10.10
scrapli-netconf==2020.11.15
six==1.15.0

carlmontanari commented 3 years ago

Hey @horseinthesky thanks for opening this!

Could you post the full log file as well please?

My immediate thought is this is a weird difference: ServiceType=snetconf vs. ServiceType=** -- I'm wondering if the Huawei box wants to connect on port 830 instead of 22? Or maybe just because auth fails it doesn't generate the "service type"... Just wild guessing as I have no idea :)

Full logs may be helpful -- we also may need to enable the asyncssh logging, but before going that far would it be possible for you to try with the other transports -- system and ssh2 or paramiko -- just to see if the same issue happens there?

Thanks!

Carl

horseinthesky commented 3 years ago

@carlmontanari Hey. It's using 22 (I would not see anything in case of 830 cuz of my FW).

Logs are not so informative =(

INFO:scrapli.driver-<mysupersecrethostname>:Non-core transport `asyncssh` selected
INFO:scrapli.helper:found ssh config file at `/home/horseinthesky/.ssh/config`
INFO:scrapli.driver-<mysupersecrethostname>:Opening connection to <mysupersecrethostname>
INFO:asyncssh:Opening SSH connection to <mysupersecretbastionhostname>, port 22
INFO:asyncssh:[conn=0] Connection to <mysupersecretbastionhostname>, port 22 succeeded
INFO:asyncssh:[conn=0]   Local address: <mysupersecretip>, port 65467
INFO:asyncssh:[conn=0] Beginning auth for user horseinthesky
INFO:asyncssh:[conn=0] Auth for user horseinthesky succeeded
INFO:asyncssh:[conn=0] Opening SSH connection to <mysupersecretproxyhostname>, port 22 via bastion
INFO:asyncssh:[conn=0] Opening direct TCP connection to <mysupersecretproxyhostname>, port 22
INFO:asyncssh:[conn=0]   Client address: dynamic port
INFO:asyncssh:[conn=1] Connection to <mysupersecretproxyhostname>, port 22 succeeded
INFO:asyncssh:[conn=1]   Local address: <mysupersecretip>, port 65467
INFO:asyncssh:[conn=1] Beginning auth for user horseinthesky
INFO:asyncssh:[conn=1] Auth for user horseinthesky succeeded
INFO:asyncssh:[conn=1] Opening SSH connection to <mysupersecrethostname>, port 22 via csas
INFO:asyncssh:[conn=1] Opening direct TCP connection to <mysupersecrethostname>, port 22
INFO:asyncssh:[conn=1]   Client address: dynamic port
INFO:asyncssh:[conn=2] Connection to <mysupersecrethostname>, port 22 succeeded
INFO:asyncssh:[conn=2]   Local address: <mysupersecretip>, port 65467
ERROR:scrapli.transport-<mysupersecrethostname>:Private key authentication with host <mysupersecrethostname> failed. Authentication Timed Out.
Traceback (most recent call last):
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_asyncssh/transport/asyncssh_.py", line 253, in _authenticate_private_key
    self.session = await asyncio.wait_for(
  File "/usr/lib/python3.8/asyncio/tasks.py", line 490, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError

I'm not sure why but it is using my SSH config file regardless "ssh_config_file": True/False.

carlmontanari commented 3 years ago

Gotcha, figured it was worth a shot to throw that out there :)

I'm not sure why but it is using my SSH config file regardless "ssh_config_file": True/False.

^ this is because asyncssh does this natively and I haven't gotten around to making it not do that heh 🙃

If you could give system/ssh2 a shot and also if possible testing with a password instead just to help try to narrow things down that would be super cool.

Thanks for helping work through this!!

Carl

horseinthesky commented 3 years ago

Ok.

So the async version with a password raises pretty much the same exception, but mentions that password auth was used:

ERROR:scrapli.transport-<mysupersecrethostname>:Password authentication with host <mysupersecrethostname> failed. Authentication Timed Out.
Traceback (most recent call last):
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_asyncssh/transport/asyncssh_.py", line 294, in _authenticate_password
    self.session = await asyncio.wait_for(
  File "/usr/lib/python3.8/asyncio/tasks.py", line 490, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError

horseinthesky commented 3 years ago

Sync version with system transport is able to connect but crashes on some vendor restrictions:

Traceback (most recent call last):
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/ptyprocess.py", line 395, in read
    s = self.fileobj.read1(size)
OSError: [Errno 5] Input/output error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "o_check.py", line 55, in <module>
    main()
  File "o_check.py", line 43, in main
    response = conn.get(filter_=power_rpc, filter_type="subtree")
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_netconf/driver/driver.py", line 97, in get
    raw_response = self.channel.send_input_netconf(response.channel_input)
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_netconf/channel/channel.py", line 216, in send_input_netconf
    raw_result = self._read_until_prompt(output=raw_result)
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/channel/channel.py", line 124, in _read_until_prompt
    output += self._read_chunk()
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/channel/channel.py", line 49, in _read_chunk
    new_output = self.transport.read()
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/decorators.py", line 313, in requires_open_session_wrapper
    return wrapped_func(*args, **kwargs)
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/decorators.py", line 113, in decorate
    return self.multiprocessing_timeout(
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/decorators.py", line 286, in multiprocessing_timeout
    result = future.get(timeout=self.timeout_duration)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/systemssh.py", line 517, in read
    return self.session.read(read_bytes)
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/ptyprocess.py", line 400, in read
    raise EOFError("End Of File (EOF). Exception style platform.")
EOFError: End Of File (EOF). Exception style platform.

Log ends with:

INFO:scrapli.driver-<mysupersecrethostname>:Connection to <mysupersecrethostname> opened successfully
INFO:scrapli.channel-<mysupersecrethostname>:Attempting to send input: #451
<?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
        <powerSupplys>
          <powerSupply>
            <position/>
            <powerEnvironments>
              <powerEnvironment/>
            </powerEnvironments>
          </powerSupply>
        </powerSupplys>
      </devm></filter></get></rpc>
##; strip_prompt: False
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n#451\n<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">\n        <powerSupplys>\n          <powerSupply>\n            <position/>\n            <powerEnvironments>\n              <powerEnvironment/>\n            </powerEnvironments>\n          </powerSupply>\n        </powerSupplys>\n      </devm></filter></get></rpc>\n##'

P.S. Sync version with ssh2 fails with:

Traceback (most recent call last):
  File "o_check.py", line 55, in <module>
    main()
  File "o_check.py", line 37, in main
    conn.open()
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_netconf/driver/driver.py", line 64, in open
    login_bytes = self.transport.open_netconf()
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_netconf/transport/cssh2.py", line 26, in open_netconf
    super().open()
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli_ssh2/transport/cssh2.py", line 136, in open
    self.socket.socket_open()
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/socket.py", line 101, in socket_open
    self.sock.connect((self.host, self.port))
socket.gaierror: [Errno -5] No address associated with hostname

carlmontanari commented 3 years ago

some vendor restrictions <- what does this mean?

EOFError: End Of File (EOF). Exception style platform. seems like the device is just punting us out, perhaps inline with the above vendor restriction comment?

The ssh2 issue seems like it just can't resolve the name... guess we can just ignore that for now anyway though... one thing at a time!

Could you connect to this device manually in a terminal and snag all the output? That could be our best bet to figure out what's going on. I've got these notes to connect and run commands as scrapli does. In theory the get below should work as its just openconfig model but obviously that may not be true for your platform -- but you could of course replace that w/ whatever you want!

Open

ssh 172.18.0.13 -p 22 -o ConnectTimeout=5 -o ServerAliveInterval=10 -l vrnetlab -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -F /dev/null -s netconf

Capabilities Exchange

<?xml version="1.0" encoding="utf-8"?>
    <hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
        <capabilities>
            <capability>urn:ietf:params:netconf:base:1.1</capability>
        </capabilities>
</hello>]]>]]>

Get Subtree

#396
<?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101">
    <get>
        <filter type="subtree">
            <components xmlns="http://openconfig.net/yang/platform">
                <component>
                    <state>
                    </state>
                </component>
            </components>
        </filter>
    </get>
</rpc>
##

EDIT PS: the chunk size is assuming spaces not tabs so if you copy/paste out of here just be aware that if your editor or terminal or whatever decides to use tabs not spaces the chunk size will be wonky!
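
A small illustration of the tabs-vs-spaces caveat above: in NETCONF 1.1 chunked framing (RFC 6242) the number after # is the byte count of the chunk, so swapping indentation spaces for tabs changes it. The helper name chunk_size and the sample payloads are made up for illustration; this is not scrapli code.

```python
def chunk_size(payload: str) -> int:
    """Return the NETCONF 1.1 chunk size: the number of octets in the UTF-8 payload."""
    return len(payload.encode("utf-8"))


# Same XML, indented with four spaces vs. one tab (hypothetical payloads).
with_spaces = "<get>\n    <filter/>\n</get>"
with_tabs = "<get>\n\t<filter/>\n</get>"

# The tab version is three bytes shorter, so its "#<size>" header must differ.
print(chunk_size(with_spaces), chunk_size(with_tabs))
```

If the size in the header does not match the bytes actually sent, the server will misread where the chunk ends.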

horseinthesky commented 3 years ago

some vendor restrictions <- what does this mean?

Sorry for this. I mean you have "Supported platforms" section in the README:

Cisco IOS-XE (tested on: 16.12.03) with Netconf 1.0 and 1.1
Cisco IOS-XR (tested on: 6.5.3) with Netconf 1.1
Juniper JunOS (tested on: 17.3R2.10) with Netconf 1.0

thus I thought this message

EOFError: End Of File (EOF). Exception style platform.

was due to some parsing exception from scrapli-netconf.

It (the CE box) actually can't do openconfig (it says it can, but it is lying =))

Here is manual NETCONF communication: https://justpaste.it/40y8o

And here is ncclient communication: https://justpaste.it/6r15p

carlmontanari commented 3 years ago

Ah makes sense!

Perfect, thank you so much! I will take a peek this weekend and hopefully figure something out! Thanks for sticking with this!

Carl

carlmontanari commented 3 years ago

Just a bump to say I haven't forgotten about this... this weekend did not go as planned haha, hopefully after work this week or next weekend I will be able to dig into this in more detail!

carlmontanari commented 3 years ago

Hey @horseinthesky sorry for the big delay here.

I'm not seeing anything super obvious as to why this wouldn't work with huawei and of course I have nothing to test with which is a bummer! I did just make a push to develop earlier today that fixed some issues that cropped up w/ system transport.... while I think this is unrelated, it may be worth giving develop a shot to see if it maybe gives us any new info to work with.

I've got a usg6k image I will try to get booted up (I feel like I started to try this before and it did not go well, but we'll see!) and maybe that supports netconf so I can try it out (I have no idea if it does).

Let us know if the develop branch changes anything, and I'll let ya know if I am able to get a test box up and running.

Carl

horseinthesky commented 3 years ago

Thank you. If I can check/test something on CE88XX/68XX Huawei boxes that would help, just ask, since I don't quite understand how scrapli works :P

Btw I wrote a simple wrapper around asyncssh to be able to work with NETCONF messages and had no Auth issue in this case.

carlmontanari commented 3 years ago

Going to try to make some clean up in scrapli_asyncssh transport today to address the issue where it always uses the private key/config file if it exists and stuff like that -- I think this is the main issue here. Is this the script you were having success w/ with the huawei boxes? If so I will compare to make sure I'm not doing something stupid somewhere :)

In the meantime if you could try w/ develop branch and system transport to see if we get any further than last time that would be really helpful. Thanks a bunch for all the help!!

Carl

horseinthesky commented 3 years ago

Is this the script you were having success w/ with the huawei boxes?

Yes. This is something I suppose should work =)

In the meantime if you could try w/ develop branch and system transport to see if we get any further than last time that would be really helpful.

Could you please tell me where I should look to make it work?

carlmontanari commented 3 years ago

Ah sorry -- you can install the develop branch like so: pip install -e git+https://github.com/scrapli/scrapli_netconf.git@develop#egg=scrapli_netconf

I'm not too hopeful that will fix things but figure it is worth a shot :)

OH.... I was apparently blind until just now. In the manual communication link you shared (https://justpaste.it/40y8o) it looks like we are using netconf 1.0 not netconf 1.1.... there is a netconf 1.1 capability in the server's listed capabilities though... so scrapli is definitely using netconf 1.1 encoding which will definitely not work... seems like that may be our big problem. I'm sorry I am just now noticing this... that was a big miss!

You could test to see if this is the issue by doing something like this:

import logging
from scrapli_netconf.driver import NetconfScrape
from scrapli_netconf.constants import NetconfVersion

logging.basicConfig(filename="scrapli.log", level=logging.DEBUG)
logger = logging.getLogger("scrapli")

IOSXR_DEVICE = {
    "host": "localhost",
    "auth_username": "vrnetlab",
    "auth_password": "VR-netlab9",
    "auth_strict_key": False,
    "port": 23830,
    "transport": "system"
}

conn = NetconfScrape(**IOSXR_DEVICE)
conn.open()
conn.netconf_version = NetconfVersion.VERSION_1_0
result = conn.get_config()

The "hello" stuff should just work because it always uses the ]]>]]> delimiter -- but after the hello is exchanged we send either 1.0 or 1.1 encoded messages -- if the 1.1 capability shows up we always use 1.1 encoding... but based on the output you shared this may be a bad idea :) In the above snippet we import the enum I use to set the netconf version, and we set the version to netconf 1.0 prior to running any rpcs (but after open/hello is done).... that might get us working.
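
For readers following along, the difference between the two encodings can be sketched as below. This follows the framing rules in RFC 6242 (end-of-message delimiter for 1.0, chunked framing for 1.1); the function names are hypothetical and this is not how scrapli_netconf itself is written.

```python
def frame_1_0(payload: str) -> bytes:
    """NETCONF 1.0 framing: the message is terminated by the ]]>]]> delimiter."""
    return payload.encode("utf-8") + b"]]>]]>"


def frame_1_1(payload: str) -> bytes:
    """NETCONF 1.1 framing: '\\n#<size>\\n' + chunk data, then '\\n##\\n' to end the message."""
    data = payload.encode("utf-8")
    header = b"\n#" + str(len(data)).encode("ascii") + b"\n"
    return header + data + b"\n##\n"


rpc = "<rpc/>"  # placeholder payload, not a valid RPC
print(frame_1_0(rpc))
print(frame_1_1(rpc))
```

A server that negotiated one framing will generally not parse the other, which is consistent with the device closing the session in the logs in this thread.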

Carl

horseinthesky commented 3 years ago

I always use 1.0 mostly because it is quite simple to catch and parse =) Will try to look deeper.

carlmontanari commented 3 years ago

Hah yep, I understand that feeling! Yeah if you can test that develop branch and then setting the version like in the above snippet I think we may be able to get somewhere.

If that does work I can maybe add an attribute like prefer_1_0 or something like that so we can prefer to send 1.0 messages instead of 1.1!

Thanks again for all the help with this!

horseinthesky commented 3 years ago

Hm. It's very strange but async version (the one in my first message) works now :P No changes made. It could be my company's bastion host issue but I'm not sure since I'm not responsible for it.

Now about sync version with system transport. It seems like it ignores my NetconfVersion.VERSION_1_0 setting:

import os
import logging

from scrapli_netconf.driver import NetconfScrape
from scrapli_netconf.constants import NetconfVersion

logging.basicConfig(filename="scrapli.log", level=logging.INFO)
logger = logging.getLogger("scrapli")

lab = '<mysupersecrethostname>'
host = lab

DEVICE = {
    "host": lab,
    "auth_username": f"{os.getenv('USER')}",
    "auth_private_key": f"{os.getenv('HOME')}/.ssh/id_rsa",
    "auth_strict_key": False,
    "ssh_config_file": False,
    "transport": "system",
    "port": 22,
}

power_rpc = '''
      <devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
        <powerSupplys>
          <powerSupply>
            <position></position>
            <powerEnvironments>
              <powerEnvironment></powerEnvironment>
            </powerEnvironments>
          </powerSupply>
        </powerSupplys>
      </devm>
'''

def main():
    # create scrapli_netconf connection just like with scrapli, open the connection
    conn = NetconfScrape(**DEVICE)
    conn.open()
    conn.netconf_version = NetconfVersion.VERSION_1_0

    response = conn.get(filter_=power_rpc, filter_type="subtree")
    print(response.result)

    # close the session
    conn.close()

if __name__ == "__main__":
    main()

Still fails with:

Traceback (most recent call last):
  File "o_check.py", line 54, in <module>
    main()
  File "o_check.py", line 46, in main
    response = conn.get(filter_=power_rpc, filter_type="subtree")
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli-netconf/scrapli_netconf/driver/driver.py", line 97, in get
    raw_response = self.channel.send_input_netconf(response.channel_input)
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli-netconf/scrapli_netconf/channel/channel.py", line 214, in send_input_netconf
    raw_result = self._read_until_prompt(output=raw_result)
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/channel/channel.py", line 124, in _read_until_prompt
    output += self._read_chunk()
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/channel/channel.py", line 49, in _read_chunk
    new_output = self.transport.read()
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/decorators.py", line 313, in requires_open_session_wrapper
    return wrapped_func(*args, **kwargs)
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/decorators.py", line 113, in decorate
    return self.multiprocessing_timeout(
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/decorators.py", line 286, in multiprocessing_timeout
    result = future.get(timeout=self.timeout_duration)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/systemssh.py", line 517, in read
    return self.session.read(read_bytes)
  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/ptyprocess.py", line 400, in read
    raise EOFError("End Of File (EOF). Exception style platform.")
EOFError: End Of File (EOF). Exception style platform.

Last message of the log (if I understand it correctly it shows the last message on the channel which is what it sent to the device) shows 1.1 notation:

INFO:scrapli.channel-<mysupersecrethostname>:Sending client capabilities
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n<?xml version="1.0" encoding="utf-8"?>\n    <hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">\n        <capabilities>\n            <capability>urn:ietf:params:netconf:base:1.1</capability>\n        </capabilities>\n</hello>]]>]]>'
INFO:scrapli.driver-<mysupersecrethostname>:Connection to <mysupersecrethostname> opened successfully
INFO:scrapli.channel-<mysupersecrethostname>:Attempting to send input: #458
<?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
        <powerSupplys>
          <powerSupply>
            <position/>
            <powerEnvironments>
              <powerEnvironment/>
            </powerEnvironments>
          </powerSupply>
        </powerSupplys>
      </devm></filter></get></rpc>
]]>]]>
##; strip_prompt: False
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n#458\n<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">\n        <powerSupplys>\n          <powerSupply>\n            <position/>\n            <powerEnvironments>\n              <powerEnvironment/>\n            </powerEnvironments>\n          </powerSupply>\n        </powerSupplys>\n      </devm></filter></get></rpc>\n]]>]]>\n##'

I've checked line 400 in scrapli/transport/ptyprocess.py but have no idea what is going on there =)

carlmontanari commented 3 years ago

Ah!

Ok well if the first thing worked (without manually setting the netconf version or anything) I wonder if the system transport one will work w/out setting the version?

It looks like I lied to you also, sorry! Will need to set the version like:

conn.netconf_version = NetconfVersion.VERSION_1_0
conn.channel.netconf_version = NetconfVersion.VERSION_1_0

^ it needs to get set in the driver and the channel for some reason. I should probably make that better at some point :p

ptyprocess stuff is a bit of dark magic vendor'd and tidied up from ptyprocess -- the EOF just means that the device doesn't like what we sent and closed the connection on us... I should also make that exception more clear in scrapli core :)

So I guess we have to try two things now:

1) trying system transport without setting the version to see if that works now (since the async one works I would think this would work too)
2) if 1 does not work, can try setting the netconf version in both the driver and the channel (sorry again for not getting that to you correctly before!)

Feels like we are getting closer to resolution :D

Carl

horseinthesky commented 3 years ago

Regarding async version I wonder why it is using 1.1 and it is successful?

INFO:scrapli.channel-<mysupersecrethostname>:Attempting to send input: #451
<?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
        <powerSupplys>
          <powerSupply>
            <position/>
            <powerEnvironments>
              <powerEnvironment/>
            </powerEnvironments>
          </powerSupply>
        </powerSupplys>
      </devm></filter></get></rpc>
##; strip_prompt: False
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n#1381\n<?xml version="1.0" encoding="UTF-8"?>\n<rpc-reply message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">\n  <data>\n    <devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">\n      <powerSupplys>\n        <powerSupply>\n          <position>1/3</position>\n          <entIndex>16847872</entIndex>\n          <powerEnvironments>\n            <powerEnvironment>\n              <pemIndex>16847872</pemIndex>\n              <state>supply</state>\n              <voltageValue>12.2</voltageValue>\n              <electricalValue>5.1</electricalValue>\n              <temperValue>N/A</temperValue>\n              <actualPower>62</actualPower>\n              <ratedPower>600</ratedPower>\n            </powerEnvironment>\n          </powerEnvironments>\n        </powerSupply>\n        <powerSupply>\n          <position>1/4</position>\n          <entIndex>16848128</entIndex>\n          <powerEnvironments>\n            <powerEnvironment>\n              <pemIndex>16848128</pemIndex>\n              <state>supply</state>\n              <voltageValue>12.2</voltageValue>\n              <electricalValue>6.5</electricalValue>\n              <temperValue>N/A</temperValue>\n              <actualPower>79</actualPower>\n              <ratedPower>600</ratedPower>\n            </powerEnvironment>\n          </powerEnvironments>\n        </powerSupply>\n      </powerSupplys>\n    </devm>\n  </data>\n</rpc-reply>\n##\n'
INFO:scrapli.driver-<mysupersecrethostname>:Closing connection to <mysupersecrethostname>
INFO:asyncssh:[conn=2] Closing connection

And btw how I can change timeout? It sometimes gets:

scrapli.exceptions.ScrapliTimeout: Private key authentication with host <mysupersecrethostname> failed. Authentication Timed Out.

If I use

conn.netconf_version = NetconfVersion.VERSION_1_0
conn.channel.netconf_version = NetconfVersion.VERSION_1_0

it gets stuck (Ctrl+C is the only way out). Logs:

INFO:scrapli.driver-<mysupersecrethostname>:Connection to <mysupersecrethostname> opened successfully
INFO:scrapli.channel-<mysupersecrethostname>:Attempting to send input: <?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
        <powerSupplys>
          <powerSupply>
            <position/>
            <powerEnvironments>
              <powerEnvironment/>
            </powerEnvironments>
          </powerSupply>
        </powerSupplys>
      </devm></filter></get></rpc>
]]>]]>; strip_prompt: False
INFO:asyncssh:[conn=2, chan=0] Received channel close
INFO:asyncssh:[conn=2, chan=0] Channel closed
INFO:asyncssh:[conn=1, chan=0] Aborting channel
INFO:asyncssh:[conn=2] Connection lost
INFO:asyncssh:[conn=1] Closing connection
INFO:asyncssh:[conn=1, chan=0] Closing channel
INFO:asyncssh:[conn=1] Sending disconnect: Disconnected by application (11)
INFO:asyncssh:[conn=0, chan=0] Aborting channel
INFO:asyncssh:[conn=1] Connection closed
INFO:asyncssh:[conn=1, chan=0] Closing channel due to connection close
INFO:asyncssh:[conn=1, chan=0] Channel closed
INFO:asyncssh:[conn=0] Closing connection
INFO:asyncssh:[conn=0, chan=0] Closing channel
INFO:asyncssh:[conn=0] Sending disconnect: Disconnected by application (11)
INFO:asyncssh:[conn=0] Connection closed
INFO:asyncssh:[conn=0, chan=0] Closing channel due to connection close
INFO:asyncssh:[conn=0, chan=0] Channel closed

horseinthesky commented 3 years ago

Sync version with:

conn.netconf_version = NetconfVersion.VERSION_1_0
conn.channel.netconf_version = NetconfVersion.VERSION_1_0

seems to send 1.0 tags

INFO:scrapli.driver-<mysupersecrethostname>:Connection to <mysupersecrethostname> opened successfully
INFO:scrapli.channel-<mysupersecrethostname>:Attempting to send input: <?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
        <powerSupplys>
          <powerSupply>
            <position/>
            <powerEnvironments>
              <powerEnvironment/>
            </powerEnvironments>
          </powerSupply>
        </powerSupplys>
      </devm></filter></get></rpc>
]]>]]>; strip_prompt: False
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">\n        <powerSupplys>\n          <powerSupply>\n            <position/>\n            <powerEnvironments>\n              <powerEnvironment/>\n            </powerEnvironments>\n          </powerSupply>\n        </powerSupplys>\n      </devm></filter></get></rpc>\n]]>]]>'

but still gets:

EOFError: End Of File (EOF). Exception style platform.
carlmontanari commented 3 years ago

And btw how I can change timeout? It sometimes gets:

^ you can change the timeout_socket value -- generally across the scrapli transports this value is used for the literal socket that underpins things like paramiko/ssh2, or is used for the "initial connection" type timeout for things like asyncssh/system where there is no direct socket we have control over.

Regarding the 1.0 vs 1.1 thing -- actually I think these failures make sense, and it makes sense why it fails if we change the version after open... during the open phase we pick what capabilities we send in our hello based on the capabilities advertised by the device. So if they advertise 1.1 support, we always send the 1.1 hello. You can see the hello options here:

class NetconfClientCapabilities(Enum):
    CAPABILITIES_1_0 = """
<?xml version="1.0" encoding="utf-8"?>
    <hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
        <capabilities>
            <capability>urn:ietf:params:netconf:base:1.0</capability>
        </capabilities>
</hello>]]>]]>"""
    CAPABILITIES_1_1 = """
<?xml version="1.0" encoding="utf-8"?>
    <hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
        <capabilities>
            <capability>urn:ietf:params:netconf:base:1.1</capability>
        </capabilities>
</hello>]]>]]>"""

So that EOF error (which will be more helpful in the next release of scrapli!) seems "right" because we told the device we wanted to use 1.1 then we sent a 1.0 message and it blew up.
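
To make the framing mismatch concrete, here is a minimal sketch of the two message framings defined in RFC 6242. The function names are made up for illustration and this is not scrapli_netconf's actual code; it only shows why a 1.0-framed RPC is not a valid message once both sides have agreed on 1.1.

```python
# Illustrative helpers only (hypothetical names, not part of scrapli_netconf).

EOM_1_0 = "]]>]]>"  # NETCONF 1.0 end-of-message delimiter


def frame_1_0(message: str) -> str:
    """Frame a message the 1.0 way: payload followed by the ]]>]]> delimiter."""
    return f"{message}\n{EOM_1_0}"


def frame_1_1(message: str) -> str:
    """Frame a message the 1.1 way: a single chunk plus the end-of-chunks marker.

    Each chunk is "\\n#<size in bytes>\\n<data>" and the message ends with "\\n##\\n".
    """
    size = len(message.encode("utf-8"))
    return f"\n#{size}\n{message}\n##\n"


if __name__ == "__main__":
    rpc = '<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get/></rpc>'
    print(repr(frame_1_0(rpc)))
    print(repr(frame_1_1(rpc)))
```

A peer expecting chunked 1.1 framing looks for the `#<size>` header first, so a payload that ends in `]]>]]>` with no chunk header is not something it can parse.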

If you want to use 1.0 payloads, is there a different port/ip that you can connect to? For example, with IOSXE I connect on port 22 and it is 1.0 style, but if I connect on port 830 it is 1.1 style. Not sure if that will fix things for you?

To sum up:

Thanks a bunch!

Carl

horseinthesky commented 3 years ago

1) Where should I use this timeout_socket?
2) It works with asyncssh out of the box, and setting netconf_version breaks it (probably because we already sent the 1.1 capability in our hello). system doesn't work in any scenario.
3) In general I think it's a good idea to have a 1.0 option. What I cannot understand is why it (sync version with system transport) doesn't work by default when we send the 1.1 hello.

carlmontanari commented 3 years ago

DEVICE = {
    "host": "<my_device_hostname>",
    "auth_username": f"{os.getenv('USER')}",
    "auth_private_key": f"{os.getenv('HOME')}/.ssh/id_rsa",
    "auth_strict_key": False,
    "ssh_config_file": True,
    "transport": "asyncssh",
    "port": 22,
    "timeout_socket": 60
}

^ timeout can be configured in the constructor.

Were you able to try system transport with the develop branch? I am wondering if this commit will help at all... basically the overall handling of things should be almost identical between asyncssh and system so I am def confused as well why system is having a hard time!!

I will try to get a prefer_1_0 option built today/tomorrow to develop branch so you can test it out :D
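
For readers following along, here is a rough sketch of what a "prefer 1.0" switch could look like when picking the version from the server's advertised capabilities. Everything here (function name, flag name, return values) is hypothetical and only illustrates the idea; it is not the option as implemented in scrapli_netconf.

```python
# Hypothetical sketch: choose a NETCONF base version from server capabilities.
from typing import Iterable

BASE_1_0 = "urn:ietf:params:netconf:base:1.0"
BASE_1_1 = "urn:ietf:params:netconf:base:1.1"


def choose_version(server_capabilities: Iterable[str], prefer_1_0: bool = False) -> str:
    """Return "1.0" or "1.1" based on what the server advertises.

    By default 1.1 wins when the server offers it; with prefer_1_0=True we
    fall back to 1.0 whenever the server also advertises 1.0.
    """
    caps = set(server_capabilities)
    if BASE_1_1 in caps and not (prefer_1_0 and BASE_1_0 in caps):
        return "1.1"
    if BASE_1_0 in caps:
        return "1.0"
    raise ValueError("server advertised neither base:1.0 nor base:1.1")


if __name__ == "__main__":
    advertised = [BASE_1_0, BASE_1_1]
    print(choose_version(advertised))                   # default behaviour
    print(choose_version(advertised, prefer_1_0=True))  # forced preference
```

The important part is that the decision happens before the client hello is sent, so the hello and the framing of every later RPC agree with each other.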

horseinthesky commented 3 years ago

I've just checked: I have the mentioned commit, but no luck for now.

I have an idea why it doesn't work (Huawei lying again =)). This is the log with the version set to 1.0 - we send the 1.1 hello.

INFO:scrapli.channel-<mysupersecrethostname>:Sending client capabilities
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n<?xml version="1.0" encoding="utf-8"?>\n    <hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">\n        <capabilities>\n            <capability>urn:ietf:params:netconf:base:1.1</capability>\n        </capabilities>\n</hello>]]>]]>'
INFO:scrapli.driver-<mysupersecrethostname>:Connection to <mysupersecrethostname> opened successfully
INFO:scrapli.channel-<mysupersecrethostname>:Attempting to send input: <?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
        <powerSupplys>
          <powerSupply>
            <position/>
            <powerEnvironments>
              <powerEnvironment/>
            </powerEnvironments>
          </powerSupply>
        </powerSupplys>
      </devm></filter></get></rpc>
]]>]]>; strip_prompt: False
INFO:scrapli.channel-<mysupersecrethostname>:Read: b'\n<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get><filter type="subtree"><devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">\n        <powerSupplys>\n          <powerSupply>\n            <position/>\n            <powerEnvironments>\n              <powerEnvironment/>\n            </powerEnvironments>\n          </powerSupply>\n        </powerSupplys>\n      </devm></filter></get></rpc>\n]]>]]>'

But it can probably be that Huawei just can't parse it (I've never tried 1.1 on it).

With asyncssh (which works) I cannot check what capability we've sent. The log has just:

INFO:scrapli.channel-lab-myt-1ct5.netinfra.cloud.yandex.net:Sending client capabilities

If we have 1.0 here, I believe this may be proof that Huawei just doesn't work with 1.1 at all.

carlmontanari commented 3 years ago

Ok, sorry for the delay again! Can we do a bit of a reset here to make sure we are all on the same page? I think I have gone down a few unrelated rabbit holes that have not helped things :)

So, can we just get (w/ full logs pretty please!) a get_config (or get whatever) for asyncssh and system transport with no changes to the version or anything. I want to compare those logs and see if/where system is messing up. Given that the normal async script worked no problem now I am wondering if we have just been getting wrapped around the axle on things and missing the real issue on the sync bits.

Sorry this has been hard for me to follow over the holidays and me getting side tracked on things!! Thanks a bunch for your patience!

Carl

horseinthesky commented 3 years ago

It is no problem to work slow on this one. I'm just glad to help to improve such an amazing tool.

So these are full logs from successful async request (small get RPC to get power supply): https://justpaste.it/9rxjj

These are logs from sync (system transport) request (same RPC): https://justpaste.it/5raml

And these are logs from sync (system transport) WITH putting:

    conn.netconf_version = NetconfVersion.VERSION_1_0
    conn.channel.netconf_version = NetconfVersion.VERSION_1_0

https://justpaste.it/7np61

carlmontanari commented 3 years ago

Boom! You rock, will dig into this today again I hope :D

horseinthesky commented 3 years ago

My guess is: Second scenario - sync (system transport) doesn't work with 1.1 because Huawei can't work with 1.1. But I may be wrong cuz I don't know what

  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/ptyprocess.py", line 400, in read
    raise EOFError("End Of File (EOF). Exception style platform.")
EOFError: End Of File (EOF). Exception style platform.

is about :P

3rd scenario is probably wrong cuz we send the only 1.1 capability and then send RPC with 1.0 wrapping. But it also has this

  File "/home/horseinthesky/scripts/scr/.venv/lib/python3.8/site-packages/scrapli/transport/ptyprocess.py", line 400, in read
    raise EOFError("End Of File (EOF). Exception style platform.")
EOFError: End Of File (EOF). Exception style platform.

so I really don't know =)

carlmontanari commented 3 years ago

I've just pushed a change to develop that maybe? will help. I swear NETCONF drives me mad :) Anyway, the change basically removes newlines and "flattens" the payload we send to the device. In my experience it seems some devices don't mind having "pretty" xml sent to the device, but others very much do not like the extra line breaks for some reason. I'm not sure that is the "fix" here, but this is good as I had missed the "get" operation flattening.

I would be curious whether, before you upgrade to the latest develop branch, a "get_config" works instead (the input for that should already be flattened and such). If that doesn't work the develop branch probably won't fix this, but at least we have some movement!
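
As an illustration of what "flattening" means here, the sketch below collapses whitespace between tags with a regular expression. This is an assumption about the general idea, not the code in the commit, and the helper name is made up. Note that it would also touch whitespace inside mixed content, so it only suits simple filters like the one in this issue.

```python
# Illustrative only: strip line breaks/indentation between XML tags.
import re


def flatten_xml(payload: str) -> str:
    """Remove whitespace that sits purely between a closing '>' and the next '<'."""
    return re.sub(r">\s+<", "><", payload.strip())


if __name__ == "__main__":
    power_rpc = """
      <devm xmlns="http://www.huawei.com/netconf/vrp/huawei-devm">
        <powerSupplys>
          <powerSupply>
            <position></position>
          </powerSupply>
        </powerSupplys>
      </devm>
    """
    print(flatten_xml(power_rpc))
```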

Lastly, would it be possible to get those three log files with logging set to DEBUG? DEBUG should catch the hello sent in both sync and async version?

horseinthesky commented 3 years ago

So, this is DEBUG for SYNC get-config (before "flattening" commit): https://justpaste.it/8kpty

New commit didn't change anything.

Here is successful debug level ASYNC request log: https://justpaste.it/3b9po

Here is SYNC get debug level log: https://justpaste.it/7sjl4

And finally same SYNC but with:

conn.netconf_version = NetconfVersion.VERSION_1_0
conn.channel.netconf_version = NetconfVersion.VERSION_1_0

https://justpaste.it/2gsux

carlmontanari commented 3 years ago

Thanks!

So we get and parse the capabilities and things are the same for sync and async, so I think we are "good" on that front. I think we can ignore the netconf version rabbit hole we started down at this point.

On the sync version I see this in the log that looks "bad" but I don't think it's really doing anything to us as we are still getting capabilities and stuff, so maybe we can ignore it?

setsockopt IPV6_TCLASS 8: Operation not permitted

After that/parsing capabilities we send identical payloads to the device w/ sync and async, and we send the return at the same point and everything. So I am thinking we have one of two problems:

1) There is something buggered in the system transport (or perhaps greater sync netconf transport stuff) where we send an extra return or something and it upsets the server, causing it to close the connection.
2) I just remembered this issue from a long time ago that I don't think there was ever a "real" resolution for.... In that issue here is the salient point: recalls that some platforms close a connection after a single command is sent via a pty...

IF number two is the issue, then I would expect paramiko/ssh2 transport to work. Would it be possible to re-try one or both of those? I feel like last time we tried that we had a dns/name resolution issue in ssh2, but perhaps it works now (after the async stuff started working) and/or paramiko may work?

I will look a bit more at the sync transport to compare it to the async base transport and let ya know if I come up with anything else....

Carl

horseinthesky commented 3 years ago
setsockopt IPV6_TCLASS 8: Operation not permitted

Ah no. This message shows due to some WSL1 networking stack stuff. Don't bother. It's harmless =)

I'll give ssh2 another try tomorrow. And will check paramiko also.

horseinthesky commented 3 years ago

SSH2: it still gives me

socket.gaierror: [Errno -5] No address associated with hostname

But I suppose IPv6 itself is the problem here - ssh2 doesn't support it?! If I change my hostname to IP I get:

socket.gaierror: [Errno -9] Address family for hostname not supported

My netbox mgmt is IPv6 only.

Paramiko gives me exactly the same gaierrors: No address associated with hostname and Address family for hostname not supported, for the hostname and the IPv6 address respectively.

BTW I am unpleasantly surprised IPv6 is not supported =(

carlmontanari commented 3 years ago

Well then :)

Ok, the v6 thing was because I have no need to care about it, nobody had asked yet, and system transport is the "main" thing and it would work anyway... but since you asked, just made a push to scrapli "core" in develop branch that should fix the v6 issue for you :) So.... we can maybe/hopefully test ssh2/paramiko now :)

Commit is here if you could test that out w/ ssh2/paramiko that would be awesome!
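
For context on the gaierror messages above: one common cause is resolving or creating the socket for IPv4 only. A family-agnostic connect can be sketched with `socket.getaddrinfo` and `AF_UNSPEC`, as below. This is a generic stdlib pattern with hypothetical helper names, not necessarily what the scrapli commit does.

```python
# Generic sketch: resolve a host for any address family and try each result.
import socket
from typing import List, Tuple


def candidate_addresses(host: str, port: int) -> List[Tuple[int, tuple]]:
    """Return (family, sockaddr) pairs for both IPv4 and IPv6 results."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]


def open_socket(host: str, port: int, timeout: float = 10.0) -> socket.socket:
    """Connect to the first candidate address that accepts the connection."""
    last_error = None
    for family, sockaddr in candidate_addresses(host, port):
        sock = socket.socket(family, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Remember the failure and try the next resolved address.
            last_error = exc
            sock.close()
    raise OSError(f"could not connect to {host}:{port}") from last_error
```

With this shape an IPv6-only management address resolves to an `AF_INET6` candidate and the socket is created with the matching family, instead of assuming `AF_INET` up front.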

horseinthesky commented 3 years ago

Thank you!

IPv4 is legacy LOL

So, with the new commit ssh2 AND paramiko works!

system transport has something extra to say:

Traceback (most recent call last):
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/transport/ptyprocess.py", line 395, in read
    s = self.fileobj.read1(size)
OSError: [Errno 5] Input/output error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/transport/systemssh.py", line 497, in read
    return self.session.read(read_bytes)
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/transport/ptyprocess.py", line 400, in read
    raise EOFError("End Of File (EOF). Exception style platform.")
EOFError: End Of File (EOF). Exception style platform.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "o_check.py", line 58, in <module>
    main()
  File "o_check.py", line 49, in main
    response = conn.get(filter_=power_rpc, filter_type="subtree")
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli-netconf/scrapli_netconf/driver/driver.py", line 97, in get
    raw_response = self.channel.send_input_netconf(response.channel_input)
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli-netconf/scrapli_netconf/channel/channel.py", line 214, in send_input_netconf
    raw_result = self._read_until_prompt(output=raw_result)
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/channel/channel.py", line 124, in _read_until_prompt
    output += self._read_chunk()
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/channel/channel.py", line 49, in _read_chunk
    new_output = self.transport.read()
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/decorators.py", line 313, in requires_open_session_wrapper
    return wrapped_func(*args, **kwargs)
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/decorators.py", line 113, in decorate
    return self.multiprocessing_timeout(
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/decorators.py", line 286, in multiprocessing_timeout
    result = future.get(timeout=self.timeout_duration)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/horseinthesky/scripts/scr/.venv/src/scrapli/scrapli/transport/systemssh.py", line 504, in read
    raise ScrapliConnectionLost(msg) from exc
scrapli.exceptions.ScrapliConnectionLost: encountered end of file error reading from system transport, typically this means that the device has closed the connection

Logs https://justpaste.it/3myvb

carlmontanari commented 3 years ago

Haha yeah you are right v4 is old school lol. This was a good excuse to fixup the v6 socket thing :)

So I think we are basically all good at this point with the notable exception of system transport. I think we may be just out of luck on this. That other issue I linked to basically had no resolution as well. As I understand it, some Huawei boxes, when a connection is made in a PTY (as system transport does), just close the connection after a single command is sent. I can't really confirm/deny that, but it surely fits with the error that we see. I would guess that if you did a "normal" scrapli connection to this device (just ssh and send a show command or something) that we would end up w/ the same result (everything but system transport works).

If you could give that a shot ("normal" ssh command) to confirm that theory that would be cool. If the result is what I expect then I don't know that we can do anything else on this at the moment... or perhaps ever as system transport is 100% reliant on spawning a pty.

Thanks so much for sticking with me on this one!!

Carl

horseinthesky commented 3 years ago

I'll jump into it after the holidays in Russia. Thanks.

horseinthesky commented 3 years ago

Hey, @carlmontanari .

I've checked regular CLI command and all three (ssh2/paramiko/system) work perfectly fine.

I just copied PRIVS from the issue you mentioned and came up with the following script:

import os
import logging

from scrapli.driver.network_driver import PrivilegeLevel
from scrapli.driver.core import IOSXEDriver

logging.basicConfig(filename="scrapli.log", level=logging.DEBUG)
logger = logging.getLogger("scrapli")

lab = '<mysupersecrethostname>'
host = lab

def main():
    PRIVS = {
        "exec": (PrivilegeLevel(r"^[<a-z0-9.\-@()/:]{1,48}[#>$]\s*$", "exec", "", "", "", False, "",)),
        "privilege_exec": (
            PrivilegeLevel(
                r"^[<a-z0-9.\-@()/:]{1,48}[#>$]\s*$",
                "privilege_exec", "exec", "disable", "enable", True, "Password:", )),
        "configuration": (
            PrivilegeLevel(
                r"^\[[a-z0-9.\-@/:]{1,32}\]$",
                "configuration", "privilege_exec", "quit", "system-view", False, "", )),
    }

    DEVICE = {
        "host": lab,
        "auth_username": f"{os.getenv('USER')}",
        "auth_private_key": f"{os.getenv('HOME')}/.ssh/id_rsa",
        "auth_strict_key": False,
        "ssh_config_file": False,
        "transport": "system",
        # "transport": "paramiko",
        "port": 22,
        "timeout_socket": 60,
        "privilege_levels": PRIVS,
    }

    conn = IOSXEDriver(**DEVICE)
    conn.open()
    response = conn.send_command("disp ver")
    print(response.result)

    # close the session
    conn.close()

if __name__ == "__main__":
    main()

Need to underline here that the IDQDD (guy who posted the issue) has Huawei Quidway 5720 box and I have Huawei Cloud Edge box. Since these are completely different lineups developed by completely different teams, there could be no similarities regarding SSH.

One more thing here: ssh2/paramiko execution is ~4 sec where system transport takes 20 sec. Why is that so long? =)

carlmontanari commented 3 years ago

I've checked regular CLI command and all three (ssh2/paramiko/system) work perfectly fine.

This is good news at least. Interesting that system works here but not netconf though. Another data point for sure, but not sure it gives me any ideas as to why system is broken for netconf still.

Need to underline here that the IDQDD (guy who posted the issue) has Huawei Quidway 5720 box and I have Huawei Cloud Edge box. Since these are completely different lineups developed by completely different teams, there could be no similarities regarding SSH.

Ok fair enough!

One more thing here: ssh2/paramiko execution is ~4 sec where system transport takes 20 sec. Why is that so long? =)

Logs may help show us this, but the most likely culprit is ssh agent trying a bunch of keys before finding one that works. I've also seen things be very slow to connect when I have ssh_config_file set to True and have a different username provided in python vs what is in the ssh config file -- it's usually > 2s slower in the latter case. So some combo of that and/or ssh agent is my guess (probably ssh agent given you have ssh config file set to False). Though 20s vs 4s is quite a big gap...

I've got a fairly significant overhaul to the internals of scrapli "core" about 80% done at this point.... if ssh2/asyncssh/paramiko can get you by for the netconf bits for a while we can try with the new updates in a week or so if that works? There are a few improvements to logging and just some general internal clean up that may make troubleshooting flow a bit easier. Failing that we may have to try to coordinate a zoom/webex/whatever if you're ok with that!

Carl