ronf / asyncssh

AsyncSSH is a Python package which provides an asynchronous client and server implementation of the SSHv2 protocol on top of the Python asyncio framework.
Eclipse Public License 2.0

SSH connection issue when upgrading to asyncssh 2.15.0 #678

Closed · riadhelloumi closed this issue 2 months ago

riadhelloumi commented 3 months ago

Hi,

When upgrading from asyncssh 2.14.2 to 2.15.0, I get this error when trying to connect to an embedded device running an OpenSSH server:

"Error during SSH connection: SSH connection closed"

How can I enable logging in asyncssh to get more verbose output that would help investigate this?

Best regards, Riadh

ronf commented 3 months ago

You can enable debugging with:

    import logging
    logging.basicConfig(level='DEBUG')

If you want to increase the debug level of AsyncSSH, you can call:

    asyncssh.set_debug_level(2)

The level value can be from 1 (the default) to 3. Level 3 dumps out raw decrypted bytes, though, and those can contain things like passwords in cleartext. So, I recommend getting as much as you can with level 2, where information like that is sanitized. Level 3 is generally only needed if packets are reported as malformed and you need to see their exact content.

Chrisknyfe commented 3 months ago

EDIT: had to redact sensitive strings

I can confirm this issue started reproducing when I upgraded from asyncssh 2.14.2 to asyncssh 2.15.0. I am opening an SSHClientConnection and attempting to download multiple files via scp. Here is a minimal script that reproduces the issue:

    import asyncio
    import logging
    import asyncssh
    import argparse
    import sys

    async def connect_task(ipaddr):
        # EDIT: username and password redacted
        async with asyncssh.connect(host=ipaddr) as conn:
            await conn.run("sudo touch /temp_678.txt")
            await conn.run("sudo chmod a+r /temp_678.txt")
            for i in range(10):
                logging.info(f"iteration {i}")
                await asyncssh.scp((conn, "/temp_678.txt"), ".")

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--ip", required=True)
        args = parser.parse_args()
        logging.basicConfig(level='DEBUG', stream=sys.stdout)
        asyncssh.set_debug_level(2)

        asyncio.run(connect_task(args.ip))

see attached log

INFO:root:iteration 1
INFO:asyncssh.sftp:[conn=0] Starting remote SCP, args: -f /temp_678.txt
DEBUG:asyncssh:[conn=0, chan=3] Set write buffer limits: low-water=16384, high-water=65536
INFO:asyncssh:[conn=0, chan=3] Requesting new SSH session
DEBUG:asyncssh:[conn=0, chan=3]   Initial recv window 2097152, packet size 32768
INFO:asyncssh:[conn=0, chan=2] Received exit status 0
DEBUG:asyncssh:[conn=0] Uncaught exception
Traceback (most recent call last):
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\site-packages\asyncssh\connection.py", line 1329, in data_received
    while self._inpbuf and self._recv_handler():
                           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\site-packages\asyncssh\connection.py", line 1594, in _recv_packet
    processed = handler.process_packet(pkttype, seq, packet)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\site-packages\asyncssh\packet.py", line 237, in process_packet
    self._packet_handlers[pkttype](self, pkttype, pktid, packet)
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\site-packages\asyncssh\channel.py", line 656, in _process_request
    self._service_next_request()
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\site-packages\asyncssh\channel.py", line 416, in _service_next_request     
    result = cast(Optional[bool], handler(packet))
                                  ^^^^^^^^^^^^^^^
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\site-packages\asyncssh\channel.py", line 1246, in _process_exit_status_request
    self._session.exit_status_received(status)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'exit_status_received'
INFO:asyncssh:[conn=0] Connection failure: 'NoneType' object has no attribute 'exit_status_received'
INFO:asyncssh:[conn=0, chan=2] Closing channel due to connection close
INFO:asyncssh:[conn=0, chan=3] Closing channel due to connection close
INFO:asyncssh:[conn=0, chan=3] Channel closed: 'NoneType' object has no attribute 'exit_status_received'
INFO:asyncssh:[conn=0] Closing connection
INFO:asyncssh:[conn=0, chan=2] Closing channel
INFO:asyncssh:[conn=0] Sending disconnect: Disconnected by application (11)
Traceback (most recent call last):
  File "C:\workspace\labautomation\dev_tests\connection_test_asyncssh_678.py", line 22, in <module>
    asyncio.run(connect_task(args.ip))
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\workspace\labautomation\dev_tests\connection_test_asyncssh_678.py", line 13, in connect_task
    await asyncssh.scp((conn, "/temp_678.txt"), ".")
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\site-packages\asyncssh\scp.py", line 1032, in scp
    reader, writer = await _start_remote(
                     ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\site-packages\asyncssh\scp.py", line 190, in _start_remote
    writer, reader, _ = await conn.open_session(command, encoding=None)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\site-packages\asyncssh\connection.py", line 4198, in open_session
    chan, session = await self.create_session(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\site-packages\asyncssh\connection.py", line 4173, in create_session        
    session = await chan.create(session_factory, command, subsystem,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\site-packages\asyncssh\channel.py", line 1111, in create
    packet = await self._open(b'session')
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ZachBernal\AppData\Local\Programs\Python\Python312\Lib\site-packages\asyncssh\channel.py", line 710, in _open
    return await self._open_waiter
           ^^^^^^^^^^^^^^^^^^^^^^^
asyncssh.misc.ChannelOpenError: SSH connection closed

log.txt

ronf commented 3 months ago

So far, I'm unable to reproduce this -- both 2.14.2 and 2.15.0 seem to work fine against both an OpenSSH scp server and AsyncSSH acting as an scp server in my testing.

Looking at the uncaught exception, it seems to be related to receiving an exit_status message on a channel where the _session member is set to None. This normally happens when cleaning up the session, after receiving something like a close message. There's no sign in the debug log here of a channel close message being received, though.

It would be possible to work around this error by putting in a test for _session being None before calling exit_status_received(), but that really only masks the problem and doesn't explain why you're suddenly seeing this on 2.15.0 but not 2.14.2. Also, there is already code that runs when requests arrive which tests whether the channel is closing before processing the request, raising a ProtocolError if so. So, it shouldn't be possible to process exit_status or other similar messages with self._session set to None.

Looking at differences between 2.14.2 and 2.15.0, there are some differences around close processing, but they mainly deal with handling task cancellation, such as when the user hits Ctrl-C when running an AsyncSSH client. So, I don't think any of that would be relevant here.

Chrisknyfe commented 3 months ago

I'll do some more digging. For reference, the system I'm connecting to uses the Dropbear 2020.81 SSH server.

ronf commented 3 months ago

I also tried compiling dropbear 2020.81 here and running against that, but that also worked fine.

I noticed you have "sudo" commands in the example. Is the remote system set up to allow sudo with no password prompt? If it did ask for a password, the client code would not cope properly with that, but that would cause a failure earlier than what you're seeing, so I don't think that's the problem here.

seewindcn commented 3 months ago

I found 2.14.2 channel._discard_recv code:

        # If recv is close_pending, we know send is already closed
        if self._recv_state == 'close_pending':
            self._recv_state = 'closed'
            self._loop.call_soon(self._cleanup)

and 2.15.0 channel._discard_recv code:

        # If recv is close_pending, we know send is already closed
        if self._recv_state == 'close_pending':
            self._recv_state = 'closed'

        self._loop.call_soon(self._cleanup)

So 2.14.2 only calls self._cleanup() when self._recv_state == 'close_pending', while 2.15.0 calls it unconditionally. @ronf

ronf commented 3 months ago

Yes - this was an intentional change, as part of commit 220b9d4.

If your application is calling close() or abort() on a channel but then continuing to read from it, that's a bug in the application. Calling these functions closes the channel completely, for both sending and receiving, and applications aren't supposed to make any further calls on the channel after either of these methods is called.

If you wish to tell the remote system that you aren't going to send any more data but you still wish to receive data, you should call write_eof() on the channel. This puts the channel into a half-open state in which you are still allowed to read. This can be initiated from either side, and the channel remains half-open until the other side does its own write_eof() or until either side calls close() or abort().
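The half-open semantics described above can be modeled with a toy state machine. To be clear, this is purely illustrative (it is not asyncssh's internal implementation): write_eof() closes only the send direction, while close() shuts down both.

```python
class ToyChannel:
    """Illustrative model of SSH channel send/recv directions.

    Not asyncssh's real channel class -- just a sketch of the
    half-open semantics described above.
    """

    def __init__(self):
        self.send_open = True
        self.recv_open = True

    def write_eof(self):
        # Tell the peer we will send no more data, but keep reading:
        # the channel becomes half-open.
        self.send_open = False

    def close(self):
        # close()/abort() shut down both directions; the application
        # must make no further calls on the channel after this.
        self.send_open = False
        self.recv_open = False

    @property
    def half_open(self):
        return self.recv_open and not self.send_open


chan = ToyChannel()
chan.write_eof()
assert chan.half_open          # still allowed to read
chan.close()
assert not chan.recv_open      # now fully closed in both directions
```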

Since you are not directly closing channels here, I wonder if maybe the issue is caused by the connection getting closed before the scp() calls have had a chance to finish. The sample code you provided doesn't look like it would have this problem, but could your real application not be waiting for the scp() tasks to complete before trying to close the connection or exit the event loop?
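The "wait for all transfers before closing" pattern ronf is asking about can be sketched with plain asyncio, using a dummy coroutine in place of asyncssh.scp() (in a real application each fake_scp() call would be an `await asyncssh.scp(...)` on the open connection):

```python
import asyncio

completed = []

async def fake_scp(name: str) -> None:
    """Stand-in for an scp() transfer on an open connection."""
    await asyncio.sleep(0.01)   # simulate network transfer time
    completed.append(name)

async def main() -> None:
    # Launch several transfers concurrently...
    tasks = [asyncio.create_task(fake_scp(f"file{i}")) for i in range(3)]

    # ...and wait for ALL of them before closing the connection or
    # exiting the event loop. Forgetting this gather() is exactly the
    # kind of premature-close bug being suggested here.
    await asyncio.gather(*tasks)

    # Only at this point would it be safe to close the connection.

asyncio.run(main())
```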

wilson-conley commented 3 months ago

I am seeing this error as well, and am able to reproduce with minimal code example below:

    import asyncio
    import asyncssh

    async def main():
        async with asyncssh.connect(host="10.136.86.21", username="ansible") as client:
            await asyncssh.scp("my_file", (client, "/tmp/my_file"))
            await client.run("whoami")  # this will error out

    if __name__ == "__main__":
        asyncio.run(main())

It seems that scp, when copying files from the local machine to the remote machine, closes the channel in a way that makes any subsequent operation on the connection error out. It's interesting that a different error is thrown depending on whether the file already exists on the remote host.

Here is the error stack trace raised from the client.run call after SCP when the file doesn't exist on remote host prior to SCP:

Traceback (most recent call last):
  File "/Users/wilsonconley/Desktop/test.py", line 12, in <module>
    asyncio.run(main())
  File "/Users/wilsonconley/.pyenv/versions/3.9.19/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/wilsonconley/.pyenv/versions/3.9.19/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/Users/wilsonconley/Desktop/test.py", line 8, in main
    await client.run("whoami")
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/connection.py", line 4495, in run
    process = await self.create_process(*args, **kwargs) # type: ignore
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/connection.py", line 4373, in create_process
    chan, process = await self.create_session(
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/connection.py", line 4266, in create_session
    session = await chan.create(session_factory, command, subsystem,
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/channel.py", line 1111, in create
    packet = await self._open(b'session')
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/channel.py", line 710, in _open
    return await self._open_waiter
asyncssh.misc.ChannelOpenError: SSH connection closed

And the error when file already exists on remote host prior to SCP:

Traceback (most recent call last):
  File "/Users/wilsonconley/Desktop/test.py", line 12, in <module>
    asyncio.run(main())
  File "/Users/wilsonconley/.pyenv/versions/3.9.19/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/wilsonconley/.pyenv/versions/3.9.19/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/Users/wilsonconley/Desktop/test.py", line 8, in main
    await client.run("whoami")
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/connection.py", line 4495, in run
    process = await self.create_process(*args, **kwargs) # type: ignore
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/connection.py", line 4373, in create_process
    chan, process = await self.create_session(
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/connection.py", line 4266, in create_session
    session = await chan.create(session_factory, command, subsystem,
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/channel.py", line 1207, in create
    result = await self._make_request(b'exec', String(command))
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/channel.py", line 740, in _make_request
    return await waiter
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/connection.py", line 1333, in data_received
    while self._inpbuf and self._recv_handler():
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/connection.py", line 1598, in _recv_packet
    processed = handler.process_packet(pkttype, seq, packet)
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/packet.py", line 237, in process_packet
    self._packet_handlers[pkttype](self, pkttype, pktid, packet)
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/channel.py", line 656, in _process_request
    self._service_next_request()
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/channel.py", line 416, in _service_next_request
    result = cast(Optional[bool], handler(packet))
  File "/Users/wilsonconley/.pyenv/versions/test-asyncssh/lib/python3.9/site-packages/asyncssh/channel.py", line 1246, in _process_exit_status_request
    self._session.exit_status_received(status)
AttributeError: 'NoneType' object has no attribute 'exit_status_received'

Either way, it seems that the client connection is left in a bad state after an SCP from local to remote. I've tested, and the client is not put into a bad state when doing an SCP from remote to local, which is also interesting.

I'm using asyncssh 2.16.0

seewindcn commented 3 months ago

asyncssh.scp calls _SCPSource.run:

                elif dstconn:
                    reader, writer = await _start_remote(
                        dstconn, False, must_be_dir, preserve, recurse, dstpath)

                    source = _SCPSource(local_fs, reader, writer,
                                        preserve, recurse, block_size,
                                        progress_handler, error_handler)

                    await source.run(srcpath)

and _SCPSource.run finally calls _SCPSource.close:

            for name in await SFTPGlob(self._fs).match(srcpath):
                await self._send_files(cast(bytes, name.filename),
                                            b'', name.attrs)
        except (OSError, SFTPError) as exc:
            self.handle_error(exc)
        finally:
            await self.close()

_SCPSource.close:

        if self._server:
            cast('SSHServerChannel', self._writer.channel).exit(0)
        else:
            self._writer.close()

        await self._writer.channel.wait_closed()

The close function calls self._writer.close(), which tears down the session first; the channel then receives the _process_exit_status_request packet afterwards.

ronf commented 3 months ago

Thank you, @wilson-conley -- I was able to reproduce the issue with your example code.

The main issue here seems to be a race between the client explicitly closing a channel and receiving an exit_status message from the remote peer. In particular, scp() will close the channel explicitly when a transfer is complete, and this now triggers a call to _cleanup() which clears out the value self._session before the exit_status message has come in. Any channel requests really should be ignored if the client has already closed a channel. I think something like the following change should help here:

diff --git a/asyncssh/channel.py b/asyncssh/channel.py
index f99f493..6d0c694 100644
--- a/asyncssh/channel.py
+++ b/asyncssh/channel.py
@@ -413,7 +413,11 @@ class SSHChannel(Generic[AnyStr], SSHPacketHandler):
         handler = cast(_RequestHandler, getattr(self, name, None))

         if handler:
-            result = cast(Optional[bool], handler(packet))
+            if self._session:
+                result = cast(Optional[bool], handler(packet))
+            else:
+                # Ignore requests received after application closes the channel
+                result = True
         else:
             self.logger.debug1('Received unknown channel request: %s', request)
             result = False

In prior releases, calling close() or abort() on a channel would only trigger a call to _cleanup() when the remote system had also done a close (by sending MSG_CHANNEL_CLOSE). So, there was no chance for self._session on the channel object to be set to None while remote requests were accepted. Since those were processed asynchronously, I'm thinking there might still have been a chance for a race condition here, but it was less likely than it is in the new code.

I think skipping request callbacks in the session after an application has done a close() or abort() is a good change. Callers might not expect to receive such callbacks after they close a channel. The above code silently acknowledges such requests without calling into the closed session object.
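The defensive pattern in that fix — acknowledge but skip requests once the session object has been cleared — can be sketched in isolation. The class and method names below are hypothetical stand-ins, not asyncssh's actual classes:

```python
from typing import Optional

class Session:
    """Hypothetical stand-in for a channel's session object."""

    def __init__(self) -> None:
        self.exit_status: Optional[int] = None

    def exit_status_received(self, status: int) -> None:
        self.exit_status = status

class Channel:
    """Hypothetical channel holding an optional session reference."""

    def __init__(self, session: Session) -> None:
        self._session: Optional[Session] = session

    def cleanup(self) -> None:
        # Runs when the application closes the channel; after this,
        # late requests from the peer may still arrive.
        self._session = None

    def process_exit_status(self, status: int) -> bool:
        # Mirrors the shape of the fix: if the session is already
        # gone, silently acknowledge the request instead of raising
        # AttributeError on None.
        if self._session is None:
            return True
        self._session.exit_status_received(status)
        return True

sess = Session()
chan = Channel(sess)
chan.cleanup()                      # application closed the channel
assert chan.process_exit_status(0)  # late exit_status: no crash
assert sess.exit_status is None     # callback was skipped
```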

With this change, the uncaught exception of AttributeError: 'NoneType' object has no attribute 'exit_status_received' should no longer be triggered. That's what was causing the connection to be closed, which then led to problems when trying to re-use the connection for additional sessions.

As for the two different errors you saw, I think that just came down to how far the connection-close processing had gotten at the moment the request to open a new channel was made.

ronf commented 3 months ago

Thanks for the additional tracing, @seewindcn. What you show is consistent with what I saw here, and the fix above should address this. Let me know if this works for you.

wilson-conley commented 3 months ago

@ronf thanks for the quick response - I tested out the change you suggested and it fixed the error for me. I was able to successfully reuse the client connection after performing an scp operation.

ronf commented 3 months ago

Thanks for the confirmation - this fix is now available in the "develop" branch as commit 3698c93.

wilson-conley commented 3 months ago

Awesome, thank you! What is the expected timeframe for the next release?

ronf commented 3 months ago

Since 2.16.0 just came out less than a week ago, I'll probably want to give it some time to see if any other issues appear, but I can aim for a patch release (2.16.1) which includes this in early September.

wilson-conley commented 3 months ago

Sounds great, thanks!

ronf commented 2 months ago

This fix is now available in AsyncSSH 2.17.0. Sorry for the regression!

wilson-conley commented 2 months ago

Version 2.17.0 fixes the issue for me

ronf commented 2 months ago

Great, thanks for the confirmation!

riadhelloumi commented 2 months ago

Hi, The issue is also fixed at my side with version 2.17.0. Thanks a lot! Best regards, Riadh

ronf commented 2 months ago

My pleasure - thanks for reporting this, @riadhelloumi!