Closed JostLuebbe closed 1 year ago
There's definitely a problem with the SSH implementation on the target system. As you noted, it is sending a MSG_IGNORE packet with a length of 5 in this case, but this ignore packet has no bytes after that length field, triggering the "Incomplete packet" error. Previous MSG_IGNORE packets had a length of 0, and they worked fine. In theory, a length of 5 would also be allowed by the spec, but only if that MSG_IGNORE included an additional 5 bytes of data after this length field.
I noticed that just before this you were sending a secret followed by a newline ('\n') in response to what looks like a password prompt. Could that password be wrong? If so, the target device may be trying to close the connection and it got something wrong in the output it generated when doing that, but perhaps you wouldn't see that if the password was correct. I also wonder if maybe it is expecting a carriage return ('\r') instead of a newline at the end of the password, and perhaps that might be triggering the problem. Could you try changing the '\n' to a '\r' and see if it makes a difference?
Looking at OpenSSH source, it appears that it might be ignoring any payload data sent in a MSG_IGNORE, not even verifying that the length field in the MSG_IGNORE matches with the number of bytes remaining in the packet. This seems questionable to me, but if you want to see if you can progress any further by ignoring this error, you can try commenting out the following lines in SSHConnection._process_ignore in connection.py:
    def _process_ignore(self, _pkttype: int, _pktid: int,
                        packet: SSHPacket) -> None:
        """Process an ignore message"""
        # pylint: disable=no-self-use
        _ = packet.get_string()     # data  <------- COMMENT THIS OUT
        packet.check_end()          # <------- COMMENT THIS OUT
Hi @ronf,
Thanks for the quick response! Commenting out those two lines fixed the issue. We went from getting ~250 devices giving us an "Incomplete Packet" exception to none. We were able to authenticate to all of them and run a "show version" command. I don't think the newline or the password were the issue, because the password works when connecting via a normal SSH connection on the same device, and the same newline works on other devices.
You mentioned this is an implementation problem on the target system. Should I open a ticket with Cisco to see if they'll release a fix for it? Or is there some other more permanent fix we could implement?
Thanks again for the awesome support, really appreciate it.
Regarding the newline, I wasn't sure that would be an issue, but I only mentioned it because when you SSH by hand, you would be sending a carriage return when you hit "enter" after providing the password, not a newline. It probably accepts both, but I thought maybe the newline was confusing it and getting you into a code path where it sent the broken MSG_IGNORE.
In terms of a more permanent fix, I was thinking of possibly looking at the server version and selectively turning off the validation of the data argument. This might look something like:
    def _process_ignore(self, _pkttype: int, _pktid: int,
                        packet: SSHPacket) -> None:
        """Process an ignore message"""
        # Work around missing payload bytes in an ignore message
        # in some Cisco SSH servers
        if b'Cisco' not in self._server_version:
            _ = packet.get_string()     # data
            packet.check_end()
That way, this checking is still enabled for most SSH servers, but it could allow you to work around the bug when connecting to a Cisco SSH server. Of course, there may be Cisco SSH servers out there which don't have this problem, but without more data it's hard to know how to narrow that down further (by something like the version number it reports).
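As a standalone illustration of that idea, here is a minimal sketch outside of AsyncSSH. The names `should_validate_ignore` and `process_ignore` are hypothetical helpers made up for this example, not part of the AsyncSSH API, and the OpenSSH version string is only an illustrative value. The Cisco version string is the one reported by the target in this thread.

```python
def should_validate_ignore(server_version: bytes) -> bool:
    """Return True if the MSG_IGNORE payload should be strictly validated.

    Hypothetical helper: validation is skipped for any server whose
    version string contains b'Cisco'.
    """
    return b'Cisco' not in server_version


def process_ignore(server_version: bytes, body: bytes) -> None:
    """Validate the body of an ignore message (everything after the type byte).

    Hypothetical stand-in for a real packet handler. It raises ValueError
    when the declared string length does not match the bytes present.
    """
    if not should_validate_ignore(server_version):
        return  # tolerate a malformed payload from the affected servers

    if len(body) < 4:
        raise ValueError('Incomplete packet')

    length = int.from_bytes(body[:4], 'big')
    remaining = len(body) - 4

    if remaining < length:
        raise ValueError('Incomplete packet')
    if remaining > length:
        raise ValueError('Unexpected data at end of packet')


broken_body = bytes.fromhex('00000005')  # length of 5, no data bytes

# Accepted silently for the Cisco server seen in this issue
process_ignore(b'SSH-2.0-Cisco-1.25', broken_body)
```

The same `broken_body` would still raise `ValueError` for any version string that does not contain `Cisco`, so the strict check stays in place for other servers.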
Could you give this version a try?
If you wanted to open a ticket with Cisco on this, that'd be great. I've managed to get things like this fixed in some other cases and that might help other SSH implementations to interoperate better with it. Of course, getting all the broken versions already deployed out there updated will mean we'll be living with this for a long time, even if it does get fixed in future releases.
When reporting this, you can reference section 11.2 of RFC 4253, which defines the ignore message as:
      byte      SSH_MSG_IGNORE
      string    data
The definition of "string" is in section 5 of RFC 4251:
Arbitrary length binary string. Strings are allowed to contain
arbitrary binary data, including null characters and 8-bit
characters. They are stored as a uint32 containing its length
(number of bytes that follow) and zero (= empty string) or more
bytes that are the value of the string. Terminating null
characters are not used.
Strings are also used to store text. In that case, US-ASCII is
used for internal names, and ISO-10646 UTF-8 for text that might
be displayed to the user. The terminating null character SHOULD
NOT normally be stored in the string. For example: the US-ASCII
string "testing" is represented as 00 00 00 07 t e s t i n g. The
UTF-8 mapping does not alter the encoding of US-ASCII characters.
However, in the case of the broken Cisco MSG_IGNORE, the 4-byte length field of the "data" string in the message is "00 00 00 05" (a length of 5 bytes) but it is not followed by actual data of that length. Perhaps the implementation thought the length was supposed to include the 1-byte message identifier (MSG_IGNORE == 2) and the length field itself, as that would total 5 bytes (though that could also just be a coincidence). In any case, that's not how strings in SSH work. The length only counts bytes immediately after a 4-byte length field, and does not include the length itself or any other data in that count. If the packet ends before that number of bytes are present, it is treated as an incomplete packet.
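To make the encoding rule concrete, here is a small sketch of how a strict parser would treat these payloads. It is not AsyncSSH's actual parsing code: `encode_string`, `parse_ignore`, and `IncompletePacket` are names made up for this example, and the payloads are decrypted message bodies starting at the message type byte.

```python
import struct

MSG_IGNORE = 2  # SSH_MSG_IGNORE message number from RFC 4253


class IncompletePacket(Exception):
    """Raised when a packet ends before a declared field is complete."""


def encode_string(data: bytes) -> bytes:
    """Encode an SSH string: a uint32 length followed by that many bytes."""
    return struct.pack('>I', len(data)) + data


def parse_ignore(payload: bytes) -> bytes:
    """Parse an SSH_MSG_IGNORE payload and return its data string."""
    if len(payload) < 5:
        raise IncompletePacket('missing message type or length field')
    if payload[0] != MSG_IGNORE:
        raise ValueError('not an SSH_MSG_IGNORE message')

    # The length counts only the bytes that follow the 4-byte length field
    (length,) = struct.unpack('>I', payload[1:5])
    data = payload[5:5 + length]

    if len(data) != length:
        raise IncompletePacket(
            f'length field says {length} bytes, only {len(data)} present')
    if len(payload) != 5 + length:
        raise ValueError('unexpected data at end of packet')

    return data


good_empty = bytes([MSG_IGNORE]) + encode_string(b'')       # 02 00 00 00 00
good_five = bytes([MSG_IGNORE]) + encode_string(b'hello')   # 5 data bytes follow
broken = bytes([MSG_IGNORE]) + struct.pack('>I', 5)         # 02 00 00 00 05, no data
```

Here `parse_ignore(good_empty)` and `parse_ignore(good_five)` succeed, while `parse_ignore(broken)` raises `IncompletePacket`. Note that `broken` is exactly 5 bytes long in total, which matches the guess above that the sender may have counted the message identifier and the length field itself.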
> Could you give this version a try?
That version of the fix also works for all of our devices.
> If you wanted to open a ticket with Cisco on this, that'd be great.
It seems Cisco has already fixed this issue in later versions of their router/switch code and their (admittedly fair) advice to us would be to upgrade our systems. We would still highly appreciate it if you could support these older versions by including that check in the _process_ignore function.
However, if you wanted to make that check a little more specific, in testing in our environment it seems the 12.x versions of Cisco IOS were the main issue. I wish I could be even more specific, but Cisco's versioning is unfortunately not very friendly for these sorts of checks.
The SSH version reported by the target in this case is SSH-2.0-Cisco-1.25, so I'm not entirely sure how to associate that with an IOS version. I'm happy to make this change, but it would probably need to be done for all Cisco devices. Since it only affects packet validation and not anything AsyncSSH sends, though, I think that's probably ok. Were you able to find a reference to a bug report or release notes mentioning the fix that I could add to the comments in the change?
The workaround for this issue is now available in the "develop" branch as commit 16ece51, and will be included in the next release. Thanks again for reporting this and providing such detailed debug information!
No problem! Thanks for being so receptive and putting that fix in for us. Unfortunately I wasn't able to find a specific release note from Cisco mentioning changing or updating their SSH implementation on versions past 12.x.
The fix for this is now available as part of AsyncSSH 2.13.2.
Version/OS Information
Problem Description
We're currently using the scrapli library to automate network device config changes. Scrapli uses the asyncssh library as a transport when connecting to devices in an asynchronous manner. This works really well for the vast majority of the devices in our fleet, but we're having issues with a small subset raising a "ProtocolError: Incomplete Packet" exception when Scrapli tries to enable on them while using the asyncssh transport. This issue does not occur if we use a different transport, such as connecting to the device in a synchronous manner, which is why we think the issue may be with asyncssh, and not Scrapli. We're hoping you might be able to give us more ideas on what we could do to fix this.
Additional Information
We've looked pretty extensively for others who might have had this issue, but the closest we could find is this issue. However, they were not even able to complete a connection to their device, whereas we're able to connect and authenticate using SSH username/password credentials; the exception is raised when the enable string is sent. We have tried sending minimal kex_algs/encryption_algs/mac_algs, but that hasn't fixed our issue.
Possible Root Cause
We've noticed that these MSG_IGNOREs of 5 bytes are always sent right before the incomplete packet happens.
Example Code
Debug Output