Key handling is terribad

paramiko / paramiko

The leading native Python SSHv2 protocol library.

http://paramiko.org

GNU Lesser General Public License v2.1

9.03k stars 2k forks source link

Key handling is terribad #387

Open bitprophet opened 10 years ago

bitprophet commented 10 years ago

There are a number of issues stemming from what feels like a simply too-naïve implementation of key handling in SSHClient and PKey.

I am making a single ticket for this because most of the existing PRs poking at it are too limited in scope; this sort of change has a high chance for bugs and breaking backwards compatibility (intentionally or no) and I feel it needs a broadly considered update.

[EDIT: See this comment for an update on the intent/scope of this issue!]

The high level details:

First off, SSHClient._auth uses a multi-exit strategy combined with storing a single exception to raise at the end of the process. This frequently means the raised exception at auth time is flat out incorrect as to the true cause of the inability to authenticate.
- For example, early on it attempts to check over all explicitly given key_filename values...and since it can't reliably know offhand which key types they are, it tries all known key types. In order. This will always generate/store errors unless one happens to have only one key and it's of the arbitrarily-first-in-the-list key type.
- Thus even if all keys given were valid keys of their intended type, and they also all happened to be invalid from the target server's perspective (i.e. the user doesn't actually have an accepted key) we raise to them a nice helpful "Hey your key is fucked up" error instead of "None of your keys were accepted by the server".
Furthermore, PKey._read_private_key - used frequently within the above, indirectly - is similarly messy, and many, many types of key loading/reading problems end up raising an error about how Private key file is encrypted - on unencrypted files.
- Why? Simply because the logic assumes if it gets to the end of the function, and the password kwarg is blank, that a password was required and not given.
- While I haven't debugged this as deeply as I'd like yet, it comes up frequently when trying to deal with ECDSA keys, and/or when lower level key handling (or key related crap in the test suite) is presently botched.

A number of tickets have sprung up on this, I'm sure I haven't gotten all of them, but they include #358 and #224.

Strongly related are a bevvy of tickets concerned with exception handling & exception types/subclasses, such as Fabric #85 and our own #351 and #293.

bitprophet commented 10 years ago

Another good example of this hilarity is a recent test I added that attempts to prove a negative case of multiple key handling; when the test server is configured to only accept ECDSA, and only an RSA key is given to connect(), it (on my Mac localhost) raises a PasswordRequiredException (odd, but a known thing). So far so good, self.testRaises(PasswordRequiredException, ...), done. Right?

Except on Travis-CI's VMs, it raises SSHException straight up, which fails the test. And on a random Ubuntu 13.10 VM I had lying around, the same test raises AuthenticationException! This sort of system-sensitive bullshit is probably one reason why we get so many related looking but different-traceback reports :(

bitprophet commented 10 years ago

I started prep work for this while dealing with #243, insofar as I added a bunch more log output (DEBUG) to _auth so at least it's easier to trace how a given auth process worked, what errors got raised where, etc.

I also started factoring out some common patterns (nearly every step has the same try/except, auth, check for two-factor, return-or-keep-going, process) though sadly the heavy reliance on variables "falling through" requires some real dumb argument passing & return value stuff. Should hopefully still be worthwhile when done.

However this is turning into a morass so I am punting for now. Left the work in a branch named for #243.

Russell-Jones commented 9 years ago

One approach to starting to debug it:

1) change saved_exceptions to an array, i.e.

saved_exceptions = []
...
saved_exceptions.append(e)

2) Do something like this so you get tracebacks for all exceptions.

3) replace "except Exception" with lists of explicit exception types (maybe by concatenating lists of common ones)

Any thoughts before I give this a go?

Russell-Jones commented 9 years ago

So something like this

import traceback
import sys

saved_exceptions = []

for c in "hello":
    try:
        int(c) # obviously fails
    except ValueError: 
        saved_exceptions.append(sys.exc_info())

if saved_exceptions is not None:
   raise Exception("\n\n".join(
          ''.join(traceback.format_exception(t, e, tb))
          for t, e, tb in saved_exceptions) )

I'd use SSHException instead of Exception.

redbearder commented 8 years ago

I face this issue yesterday and find a workaround solution with another script expect -c "set timeout -1; spawn -noecho ssh -o StrictHostKeyChecking=no $2 ${@:3} exit; expect assword:; send -- $1\r; interact;"; the parameter StrictHostKeyChecking=no is the key arg

hope author can combine this solution to paramiko new release

thx

alexforster commented 7 years ago

This is causing a multitude of questions about "Protocol error: expected packet type 50, got 5" when trying to connect to many IOS-based Cisco devices. The "service-request" dance gets restarted for a second try and the remote end doesn't expect this. People suggest fixing this by setting "allow_agent=False, look_for_keys=False" but that's only a workaround for people who don't use ProxyCommand and/or intend to try authenticating with both key and password auth.

Seealso #519

Got payload (44 bytes, 8 padding) Read packet , length 35 Authentication type (publickey) not permitted. Allowed methods: [u'keyboard-interactive', u'password'] Write packet , length 17 OUT: 00 00 00 1C 0A 05 00 00 00 0C 73 73 68 2D 75 73 ..........ssh-us OUT: 65 72 61 75 74 68 40 D1 E5 F9 35 E4 9E 2B E2 0E erauth@...5..+.. IN: 00 00 00 4C 10 01 00 00 00 02 00 00 00 2E 50 72 ...L..........Pr IN: 6F 74 6F 63 6F 6C 20 65 72 72 6F 72 3A 20 65 78 otocol error: ex IN: 70 65 63 74 65 64 20 70 61 63 6B 65 74 20 74 79 pected packet ty IN: 70 65 20 35 30 2C 20 67 6F 74 20 35 00 00 00 00 pe 50, got 5.... IN: 97 D3 87 1C 13 02 C6 E2 4D E1 4F 5F 67 B0 12 3C ........M.O_g..< Got payload (76 bytes, 16 padding) Read packet , length 59

bitprophet commented 7 years ago

Briefly poking this now since it's getting in the way of debugging other (new feature related, #827) failures. Not going to commit anything to master but just feeling out the space as I slap something together to help me troubleshoot.

Overall exploration:

the serial nature of attempted auths is one major issue (or actions which might lead to auths but may often lead to earlier exceptions like "what the heck is this? this is not an RSA key!")
- there's no perfect way around this, but we can and probably should at least separate discovery of key/auth material from the attempts, to better enumerate what really failed & why
we try something like 7-8 auth types (and grow more periodically) each of which often yields N possible sub-attempts:
- pkcs11 (N attempts depending on number of device slots, e.g. a full YubiKey could try up to 4x)
- gssapi (mic and keyex...don't know how this actually works but let's assume it's 1-2 attempts)
- explicit key obj (x all key types, so 0..4)
- explicit key filenames (x all key types, so N*4)
- implicit default key files (1 per key type, so, up to 4)
- agent (0..N possible keys)
- password (1)
- 2nd factor of 2FA (1)
so in nontrivial situations we could be squashing/ignoring (or if fixed, aggregating) a pretty large number of keys
- though most users will probably be in my kind of situation and only be talking, say, 2-5 agent keys + maybe 1-2 default file keys + perhaps an explicit key filename or pkey obj.
Most of the time, the user is only intending one of these sources to be the "real" one, but it's impossible to really know what that is most of the time; so we will have to guess (and not in the way that it currently works, which is, "welp the last one must've been it!!")
Technically speaking, any sort of exception can pop up when attempting each of these sources, but the most common ones are (and right now these are all SSHException which is complete horseshit and something that will be fixed at the same time, via #258):
- invalid keys (almost always spurious because of the gotta-try-all-key-types behavior above...except when it's that one key you intended to work, which is encrypted or otherwise broken or maybe you gave the wrong filename or etc)
- encrypted keys w/ wrong or no password (sometimes considered a subtype of above)
- actual authentication error (AuthenticationException) meaning a key was successfully instantiated, but was rejected by the server. This is much higher-signal than the others but is still not always necessarily the "right" error to surface.
- various network problems such as timeouts, connection refused, etc (though at least some of these will surface before we get to the auth handler, I think, but at least one will pop up as a timeout-related AuthenticationException...)
- "No existing session" errors (sometimes visible as un-caught EOFError exceptions) which occur when the channel gets unexpectedly closed before all sources have been tried.
- In my experience this happens more frequently under test than in the real world, i.e., because the test server is effectively a single-shot sshd subprocess, so any actual auth attempt causes a transport shutdown, causing this error on any subsequent source attempts.
So what we need is:
- a way of aggregating all of these, generally; we can almost never be 100% sure that we'll guess the right root error (as below) so users must become equipped to do more digging on their end, meaning we must give them all errors we encountered.
- Ideally this'll be another SSHException subclass and then we can opt to release it in a 2.x backwards compat fashion after all, if the new behavior seems stable enough.
- It must be possible to figure out which key source had which result; I see two possible ways to do this offhand:
  - hierarchical internal structure reflecting the above sources list, where the AuthHandler is itself assigning 'slots' (e.g. aggregate_exception['agent'][fingerprint] = <exception> or a_e['key_filename'][filename] = <exception>)
  - Enhance #258 such that key/auth related exceptions store their source (not just a string message), e.g. BadKeyMaterial(klass=RSAKey, filename='/path/to/rsa.key'), then store all such exceptions in a flat list (ordered by when they were tried, of course).
  - Whether these use params or additional class hierarchy to represent the sources (think BadKeyMaterial(type=RSAKey) vs BadRSAKey) is probably not super important?
- Ideally (not really required?) a method of "guessing" which one is the "real" error deserving to be highlighted to the user, e.g.:
- If there's only one AuthenticationException, that's often a good signal
- Conversely, once we raise better exceptions for things like the "No existing session" problem, those can be considered less likely to be the root cause
- And so forth; this is an exercise in heuristics identification. And again, feels like a 2nd-tier feature, we want to supply the user with the entire aggregation either way.
- as noted in an earlier bullet, separating the auth process into "identify which sources actually have workable material" and then "try really authing" would probably also be a great idea as it would simplify troubleshooting cases like "I only have the one key filename and it's a bad key or is encrypted or I have the wrong passphrase".
- Also simplifies testing!

radssh commented 7 years ago

I put together some work leading toward making a cleaner pass through the authentication morass, a bit more in line with the intent of the spec. Initially was going to put in a bunch of extra effort to preserve backward compatibility, but if this can be a 3.0 thing, it may help a ton to scrap a lot of the bizarre-O logic that is in place to make things work currently.

The primary hurdle that seems to be in place, is that the (client) authentication is driven by a user thread sending messages, but has to coordinate with the transport thread fielding responses and indirectly processing work via the AuthHandler callbacks. I patched a bunch of these callbacks to replace the inline work with a Queue.put() of the message and type, making the authenticate thread solely responsible for dealing with the authentication flow. If you can drop backward compatibility, then all message types in [50..69] from the transport thread should probably be just put on the AuthHandler queue, and drop the callback dict altogether.

See: https://github.com/radssh/paramiko/commit/3757bac2ee368d91e559787dd9b49701c8be717f I did not add too much polish, but should serve as an example, without a full retrofit that may not be needed (or desired).

The secondary redesign is to make a single authenticate_client() call that is fed various iterators that supply the usable passwords/keys, and preferred authentication methods. gssapi_keyex and gssapi-with-mic methods seem to not have any user-supplied parameters (nor would host-based, if that's on a future enhancement list), so it would seem that these would not need iterators, just inclusion in the preferred methods list/tuple. Password(s) could be included and attempted as passwords or keyboard-interactive responses, and PKey(s) would seem to be able to be offered, whether originating from local files, ssh-agent, or PKCS11 sources, or a combination .

The implementation also retains lists of the attempted authentications to provide tracking of what steps were attempted, and which failed and which failed with partial-success (or the final element in the list completed the authentication successfully, in the case of multi-factor). This approach seems to eliminate the awkwardness of the saved_exception construct from the old authentication, allowing an arbitrary length trail of the events that were triggered along the way.

It would seem that there would not likely be a need to deal with various AuthenticationException occurrences, other than server response Timeout - the transport either reaches a successful authenticated state, or failure will allow the list(s) to be used as post-mortem. It would also be helpful to permit followup calls to authenticate_client() and avoid re-send of the request_auth message, since some servers (Cisco) barf and drop connection when the client attempts to start over from the very top. Ideally the authentication dialog can keep going until the server gives up (too many auth attempts, or empty supported methods list), or the client runs out of things to try.

ploxiln commented 7 years ago

I have some thoughts here, but I don't use any of the more complex methods.

I think inside the "aggregate auth exception" should be a list of exceptions, not a hierarchy. I'd guess that typically most of these sub-exceptions will be "could not decrypt keyfile" or "auth key was rejected". I agree the exception should have a property of which file path it is related to, or possibly type and fingerprint if it is from the agent.

It might be useful to think of the explicit and implicit auth methods separately. There are already options allow_agent and look_for_keys that you can set to False to disable the implicit methods. I think that in many cases where one of the explicit methods has been enabled/passed, the implicit methods were not intended to be used. The excess failures can be confusing, and can also cause ssh servers to close the connection. I guess tools using paramiko could already make it easier/default to disable these though.

The password is the most tricky case. If password is specified, it's used to decrypt specified keys, decrypt discovered keys, for password auth, for second factor I think ... and then, if everything fails, interactive auth asks for another password. Maybe auth_key_password, or interactive, should be separate parameters. I really don't have any good/strong ideas though.

bitprophet commented 7 years ago

@radssh Nice, thanks! (I notice that's a month or so old, was that posted for some other ticket?) Good possible jumping off spot; I specifically like the use of a queue & the explicit authentication-types ordering (& then relying on that for 'continue-with' second factors).

Re: whether this level of rework (or less, or more) is appropriate - depends a lot on whether I want to try getting some variant out as e.g. 2.4. I think I can get out something backwards compatible, in which case we'd be open to do a larger architectural rework in 3.0...we'll see.

Re: @ploxiln:

should be a list of exceptions, not a hierarchy

Yea, afterwards I came to the same conclusion, the order-of-attempt needs to be preserved regardless. I might still like a "both" option where one can very quickly see in a repr() that e.g. 4 agent keys were tried, or that 1 key_filenames key was tried and it seemed to load OK as an RSA key; but where the actual order is still available somewhere.

The 'rich exceptions' approach still seems best, esp since you can get the above value by viewing the whole list and/or doing things like [x for x in exceptions if isinstance(x, BadKeyMaterial)].

I think that in many cases where one of the explicit methods has been enabled/passed, the implicit methods were not intended to be used.

Absolutely true. I'm guessing one reason for the current hodgepodge is that it's, sadly, how OpenSSH behaves by default - you have to manually set ssh_config values somehow, or use other tricks like temporarily unsetting SSH_AUTH_SOCK, if you want them disabled - even if you're e.g. doing ssh -i <explicit key>.

So going by our normal "OpenSSH is king" approach we'd perhaps never want to fully disable the implicitness; but we should do everything we can do make it easy to "do the right thing", which offhand does mean an easy method of turning off all implicit stuff, a way of configuring that permanently, and clear/visible documentation that it's even a thing to be aware of.

(The configuration angle is harder for pure Paramiko since it only wants ssh_config, but as you noted, libraries one layer up frequently have their own config options.)

Maybe auth_key_password, or interactive, should be separate parameters.

This has felt like a nasty kludge forever so yes, I'd love to explicitly separate them. This is one spot where a true change requires a 3.0 but we can at least add a 2nd, new kwarg in 2.x - I did the same thing for Fabric recently too (Fabric 1.x gained a sudo_password option so one can disambiguate login from sudo password values.)

radssh commented 7 years ago

@bitprophet - This was not linked to any ticket/PR, I had a period of ambitiousness in trying to address the general flow of authentication. It stalled out prior to making the backward compatibility patchwork (and the GSSAPI authentication that I don't have a core understanding of how it is supposed to function, and no environment to test that aspect, making me really hesitant). Looking at the extra baggage to maintain the backward compatibility, I just shelved the work as-is. With the opportunity (and desire) to overhaul the API for a possible 3.0, that re-sparked the interest in at least sharing what I had as a possible path to follow. Retrofitting the backward compatible pieces would not be a major addition, except for possibly the 2FA shenanigans - it also would defer the jettisoning of lots of old code that is just yearning for the scrap heap.

If you want to wrench this into a 2.4 release, I will see if I can fold the GSSAPI steps in, and patch the backward compatibility loose ends, and put a bow on a PR, then sharpen the machete to chop away a bunch of stuff for 3.0.

bitprophet commented 7 years ago

In my head the 2.4 stuff is actually even simpler than what you've got there, it's really just replacing all the saved_exception = e with an aggregate exception object (new SSHException subclass), then raising that at the end. This would allow users to gain a lot more insight about failure modes, with a minimum of instability.

(There are a couple minor wrinkles as outlined above, re: actual format of that object; and I also think it may need to come in two flavors so any AuthenticationException members cause the aggregate object to also be an AuthenticationException in case someone is explicitly checking for that subclass; but that all still feels pretty tightly knit to me.)

I'll probably execute on this soon since I need it for Fabric 2 (lest Fabric 2 repeat too many of Fabric 1's sins regarding how it handles passwords & key decryption.)

When (paramiko) 3.0 rolls around we'd look at your patch & figure out whether that's the right approach or if we want something even bigger.

bitprophet commented 6 years ago

Backwards-compatibly fixing one aspect of this right now: adding a passphrase= kwarg to SSHClient.connect so that Fabric 2 can safely offer correctly split-up CLI options, config params, etc, for password vs passphrase stuff.

Users will still be able to cram either/or into just the password field, but at least now the option's there to give both (and then the auth fixes I'm about to execute on, will be able to provide that much clearer a signal if the user provided passphrase explicitly.)

Then in Paramiko 3.0 we'll be able to stop reusing password altogether. (And that's if we don't come up with a somewhat better interface for both types of values...)

bitprophet commented 6 years ago

High level code-org thoughts as I look at adding stub tests re: new changes:

Feels like SSHClient wants to eventually be pulled 'up' into Fabric 2...it's largely high level instantiation/configuration logic plus "figure out which Transport auth method(s) to call".
- I'd like to avoid that for now; but I may pave the way by splitting out as much logic as possible into its own module. We'll see. That's something I'd probably do either way.
- If this does happen, many of the tests I'm about to write would necessarily also move into Fabric's suite, though at least a handful would regrettably need to live in both places (largely the single-auth-source tests that prove base cases function OK.)
Tests about authentication are currently split across test_auth.py and test_client.py, with the former tickling some (but not all...) Transport.auth_* methods and the latter is of course only doing things via SSHClient.connect.
- test_auth.py:
- checks password auth reasonably well
- does interactive auth a little bit
- only tests public keys as part of a multipart auth test (and there's only one of those - password+key).
- it also goes over a few edge cases (bad auth type requested, disconnection during auth, and timeout during auth.)
- test_client.py:
- checks various public key types, including some recently added tests that test some of what happens when the user feeds in multiple keys of different types at once
- checks publickey certs (added recently)
- tests auth 'trickledown' (i.e. that a bad key won't prevent a correct password from succeeding)
- tests that auth itself can have a timeout set (this seems distinct from the "what happens to Transport if the server goes away during auth" test in the other module)
- has some very new tests I wrote re: the new passphrase kwarg & how it plays with the older password arg
- I think between them we cover most things relatively well, but I'm sure there are at least a few cases left untested or not tested enough.
I may continue using the two files, since the 'auth' tests are more "unit-y" and the client ones naturally more "integration-y"...still figuring that out.
- Might move test_auth.py into test_transport.py tho?
- And may still just start over with a new test module and migrate other tests into it as I go. I'd really like an easy to read/append file with multiple subsuites that clearly outlines what's under test.

bitprophet commented 6 years ago

Have some DDD-level thoughts about this in my local repo and about to take a stab at implementing it:

Going for a high level, replacing SSHClient._auth approach for now, with the assumption that the low level guts in Transport and AuthHandler don't themselves require fixing in order to arrive at a cleaner user-facing flow.
Idea is to have an easier-to-use-and-test new class that will be called by SSHClient (and which could be used standalone if desired) for its auth step.
This class, Authenticator, would have two APIs:
- a "clean" one accepting an iterable (or posargs) of AuthSource (or w/e) subclass instances, explicitly to be tried in that order. E.g. Authenticator(transport).authenticate(PrivateKey('foo.key'), AuthAgent(...), PasswordAuth('secret')) to try a private key first, then to try keys from an agent, then finally password auth
- a "legacy" one accepting similar kwargs to SSHClient.connect, which is actually just a bridge to the other API, implemented via a mostly-internal method - which gets its own unit tests proving that e.g. Authenticator(transport).auth_with_kwargs(password='secret', key_filename=['foo.key'], allow_agent=True) just results in the posargs shown in the previous snippet.
The first API would be used in the new reorganized auth test suite, and would be eventually used directly by Fabric 2 or similar. New, clean, shiny etc.
The second API would be used as a drop-in replacement for SSHClient._auth, allowing the newer logic (& semi-new exception behavior) to be invoked via a familiar interface.

I finally looked over @radssh's branch about this in more detail (and to see how much of it I was possibly reinventing).

Hadn't remembered that it's implemented within AuthHandler itself, which is an interesting choice. I assume in part for backwards compatibility?
I'm not 100% grokking whether the modifications to threading/execution flow are actually necessary / what they're fixing.
- In other words, if I continue with my own approach to this -- which tackles only high level flow and assumes that Transport's individual auth methods are more or less correct as-is -- am I overlooking something important?
Similarly, I'm not sure if the extra bookkeeping & centralization of auth subroutines within authenticate_client is something I'll end up wishing I had access to at the high level.

The good thing is, I think the two approaches aren't mutually exclusive, and so I'm going to take a stab at the high level first, with option to modify AuthHandler later with some of these other ideas if they seem necessary.

radssh commented 6 years ago

Started with the code in AuthHandler, primarily because I don't make use of SSHClient, which apparently puts me in the minority. Whatever the long term resolution winds up being, it seems natural to have it reside in AuthHandler proper, and SSHClient (or Transport) adapt accordingly. It was interesting to find in the code history, the original evolution of making AuthenticatedTransport a subclass of Transport, before switching over to the dedicated AuthHandler class.

Haven't had the opportunity to code more down that path, even with the liberty to scrap the backward compatibiilty, but I have been solidifying some more ideas that might be worthwhile...

Make username a parameter to the AuthHandler constructor. Server (OpenSSH) implementation does not let you change username during the course of authentication, so it doesn't seem to make much sense allowing a change, or for having to repeat it for each authentication flavor submethod.
AuthHandler keeps internal lists (initally empty) of password(s) and key(s) for authentication attempts. Possibly a (empty) dict for keyboard-interactive {prompt: reply, ...}, and some flag/settings for GSS and host-based authentication. An initial sequence of (auth-none) of preferred authentication types.
Implement AuthHandler.add_password() method to supply a password, that appends to the internal password list, and if absent, append 'password' to the preferred authentication.
Implement AuthHandler.add_PKey() to supply a PKey to try, with similar bookkeeping as add_password(). Require PKey parameter, and not filename to avoid adding complexity.
Implement similar methods for auth_gssapi_with_mic with its option set, auth_gssapi_keyex (no options?) and possibly hostbased (currently not available in Paramiko), that capture the specifics of what parameters may be needed at the time the authentication is offered. Call to these methods would include the side effect of appending to the preferred (available) authentication types.
Completely gut the parameter list of authenticate_client(), making it a pull-the-trigger option with any/all of the bullets loaded from the AuthHandler.add_* calls made, since all of the parameters currently listed would be preloaded into the AuthHandler object. The internal flow should be similar, and properly support the matching of the server side "authentications that can continue" versus the client "preferred/available authentication methods", and correctly handle multifactor authentication, as long as the set of factors were loaded from previous add_* calls, and not dependent on ordering.
Either replicate the bulk of the add_* calls into the SSHClient class, or change the SSHClient.connect() call to take a single AuthHandler object instead of the dozen or so parameters that drive the authentication choices today.

Wish I had the opportunity to code more of this to completion, as I suspect this description might not be grokked as well as tangible code examples. Trying to find that sweet spot in between OK and TL;DR :)

sirkubax commented 6 years ago

https://stackoverflow.com/questions/47636165/az-acs-kubernetes-get-credentials-invalid-ec-key-ssh-known-hosts-corrupted

bitprophet commented 6 years ago

@radssh Thanks for the additional thoughts!

Being hip deep in the tests (& related support tasks like code cleanup) for all this is making it even more clear there's a lot of responsibility bleed between SSHClient, Transport and AuthHandler. Including but not limited to:

Transport is not terrifically useful without AuthHandler, and vice versa, which is always a bad sign...
Most of what the average user (including Fabric) thinks of as "Paramiko auth behavior" is implemented in SSHClient.auth...except for a bunch of random bits & bobs in the other two classes, like how Transport.auth_password quietly falls-back to keyboard-interactive when password-type auth isn't supported by the server.
There's a connect method on both SSHClient and Transport, the latter of which I wasn't even aware of until recently, and which only implements about 20% of the stuff covered by the former.
And so forth.

I think there's a reckoning coming where I upend a lot of these guts and try to figure out if there's a cleaner way to arrange the lines of responsibility. Not happening today (talk about scope creep) but...

radssh commented 6 years ago

@bitprophet - I am revisiting this (now that I reacquired both some spare time AND motivation), and have some progress made on a significantly cleaner fresh pass. Still seems like keeping backward compatibility is way more trouble than its worth. Saw on IRC that you were starting the recoding in earnest, and wanted to again offer cooperative time & effort.

radssh commented 6 years ago

Dropping a note here, as I accidentally discovered what looks like an OpenSSH server bug doing some test case scenarios. Burned a ton of time debugging my own code, then found I could reproduce using OpenSSH command line client...

Following a failed password attempt (with a non-blank incorrect password) with keyboard-interactive (with correct password). On Ubuntu 16.04 (OpenSSH 7.2), the authentication succeeds, then the server immediately drops connection; On OSX 10.13.4 (OpenSSH 7.6), the keyboard-interactive request is sent, but the server never responds with a challenge, and then connection drops after socket timeout.

ssh -v 192.168.1.191 -o NumberOfPasswordPrompts=1 -o PubkeyAuthentication=no -o PreferredAuthentications=password,keyboard-interactive

OSX:

...
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password,keyboard-interactive
debug1: Next authentication method: password
user@192.168.1.191's password: 
debug1: Authentications that can continue: publickey,password,keyboard-interactive
debug1: Next authentication method: keyboard-interactive
<<< 60 second delay >>>
Connection closed by 192.168.1.191 port 22

Ubuntu 16.04:

...
debug1: SSH2_MSG_SERVICE_ACCEPT received
Ubuntu 16.04.4 LTS
debug1: Authentications that can continue: publickey,password,keyboard-interactive,hostbased
debug1: Next authentication method: password
user@ubuntu's password: 
debug1: Authentications that can continue: publickey,password,keyboard-interactive,hostbased
debug1: Next authentication method: keyboard-interactive
Password: 
debug1: Authentication succeeded (keyboard-interactive).
Authenticated to ubuntu ([2600:1700:f230:3f60:d5ef:6fa0:5470:456f]:22).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: network
debug1: channel 0: free: client-session, nchannels 1
Connection to ubuntu closed by remote host.
Connection to ubuntu closed.

In both cases, entering an empty password on the first (password) attempt allows the subsequent keyboard-interactive attempt to succeed. Something apparently bleeds over and taints the keyboard-interactive handling if the initial password attempt has failed content...

Will be having a look at the OpenSSH code and bug reports.

bitprophet commented 6 years ago

Haven't time to glance at the above just yet (busy releasing a 7 year overdue rewrite of a related lib...!) but want to make it real clear that I do have a partly done branch for this. Let me know if you can't find it and I'll make sure it gets pushed soon. Had a bunch of nice tests and DDD written or stubbed out and "just" needed them fleshed out and implemented.

The core setup I have envisioned feels like the Right Way regardless of whether we can shim it into the current high level interface in a backwards compatible fashion, or not.

bitprophet commented 5 years ago

Note to self in case I don't get back to this for reals in the next few days (need to resist urge, but also, god it's so bad): one aspect of this change, if possible, should be that when all auth fails and we bubble up a newer hybrid/aggregate exception - if it contains a PasswordRequiredException, that one should be what the aggregate exception "acts like" or otherwise exposes "first".

For example, the scenario I'm running into personally with Fabric 2 as as follows:

Connecting to a specific host, for which only one key on my system is valid
- Thus, I'm specifying ~/.ssh/id_rsa.old in connect(key_filenames=[...]).
- This key is RSA, and encrypted with a passphrase.
I am not specifying a passphrase or password value (in part because my mental/default workflow for this assumes SSH Agent support)
No default-named keys exist in ~/.ssh (e.g. there is no ~/.ssh/id_rsa etc) - otherwise those might get tried too
SSH agent support is enabled (because of the above), and the agent holds two other keys besides the one specified above.

In this scenario, Paramiko's current Transport._auth ends up doing the following:

Tries RSAKey, DSSKey, ECDSAKey etc, in turn, on id_rsa.old.
RSAKey yields PasswordRequiredException (correctly)
The others yield SSHException('Not a valid XXX key file')
Continues down _auth, eventually trying the two SSH agent keys, each of which yields AuthenticationException('Authentication failed') (because they're not valid for the target host)
That's all Paramiko is able to try, so it uses the last exception (AuthenticationException) to raise to the calling method.

What should ideally happen in this scenario is that the logic identifies PasswordRequiredException as the only actionable exception:

Raw SSHException usually means something lower level (or rather, we've slowly added more specific subclasses like PasswordRequiredException and AuthenticationException...) so not a good candidate
AuthenticationException is a stronger signal (it means we presumably got useful [encrypted-but-unlocked, or unencrypted] key data, which was then rejected by the server), but it's still not as good as PasswordRequiredException, because we can't know for sure that the entire process would fail if that key had been unlocked.

Thus we should raise some form of the password-specific exception to the user; then higher level client code, such as Fabric 2, would have the option of prompting the user (and, further, adding to the agent, if desired) and resubmitting connect with a passphrase option.

(In Fabric 1, this "works" because the code assumes any SSHException when using an empty password must be due to a missing passphrase, and prompts the user. This of course has its own pile of false positives in other, equally-likely scenarios.)

It's tempting to slap a band-aid on this and have the various spots that handle SSHException in _auth, immediately raise this exception type, however that would be incorrect for a few reasons:

It's possible to encounter PasswordRequiredException early in the process (as above) but then to have, say, an SSH agent borne copy of it succeed. In other words, we can't actually know early on if even this strong a signal is truly fatal to the auth process.
- This is in fact what happens if I run OpenSSH (or just manually add the key to the agent myself); the above flow changes slightly, adding a third agent-borne key attempt, which will succeed.
- It's a bit silly, right? I'm specifying the key path, but it's actually not usable unless I pair it with the passphrase. However, this is still not that misleading, because having it explicitly specified should still trigger a passphrase prompt at the worst (which would occur every time) or, more realistically, triggers addition to the ssh-agent (as happens with OpenSSH's client, at least as mine is configured).
Accounting for the various exception subclasses is redundant work, because of the many phases of _auth's current implementation. In other words, it screams out for this overall ticket's work.

bitprophet commented 5 years ago

Interestingly, I just ran across #1317 posted by @twirrim, where he seems to have a similar set of exceptions but implicitly asserts my above assumption is incorrect.

Specifically, he would have encountered some middle-of-flow AuthenticationExceptions (presumably, since he said he had a key that was rejected by server auth) but then the last exception, and thus the one raised to him, was for an encrypted (and not unlocked) key (thus PasswordRequiredException).

In that scenario, assuming PasswordRequiredException is the "highest signal" exception would have been wrong.

Really, this just proves that the higher level parts of this ticket are more important in all cases: make it much easier to specify exactly what auth methods to try or not try; and make it much clearer in failures what was tried and what the result of each attempt was.

FWIW, the emphasis on "how can we boil everything down to a single exception?" is primarily for backwards compatibility - though even if we went a feature flag or 3.0 route, there's simply no awesome way to square the problem of "tried more than one thing before returning control to the user". We'll always be guessing about which auth source the user really wanted.

Which brings us back to: make it easier for the user to say "I want auth source X and only X for this particular connection". We actually don't truly want the convenience of trying default keys, explicitly given keys and agent keys all the time every time.

jsardev commented 5 years ago

Any news on this? Just wanted to try out fabric and wasted a lot of time figuring out how to use a private key for authentication. It's really hard to grasp it right now :cry:

eepstein commented 5 years ago

Any news on this? The most recent release of Paramilko (2.4.2) fails with SSHException's raised with an "invalid key" - boy oh boy would it help to have that message changed to WHERE and which Key impl was throwing it, giving some more specific clues. After a debug session, I've tracked our issue to a too-stringent check in Ed25519Key that seems to fail on older openssh keys after the initial parsing has succeeded. Actual os ssh clients do not fail and the keys work fine with ssh -i but paramiko barfs and it's been a challenge figuring out a work-around.

bitprophet commented 5 years ago

This remains one of my highest-priority "really want/need to do it myself, can't just merge a patch" items. I've been quiet because of Life™ but should be back doing OSS things again ~1day/wk from here on. So hopefully I'll get to it once I clear out a bunch of waiting PRs.

twirrim commented 5 years ago

I can maybe put some time towards this, if you need extra help. I haven't done a deep dive in to the code any time recently to have a specific idea on a route forwards here, but may be able to investigate.

bitprophet commented 5 years ago

The best thing I can use is just a sanity check of the API I end up constructing.

I need to reread all this & I have a branch open somewhere (doubtless ancient now), but I had a pretty firm API in mind a few years back that largely entailed an iterable "auth mechanisms to try in this order please" on the input side, and a matching aggregate exception that's raised if auth never succeeds, letting you see the individual results for the parts of that iterable.

So you might say connect(authentications=[PrivateKey(some_rsa, passphrase="butts"), Password("secret"), PrivateKey(some_new_thing), PKCS11(...)]) and then it either succeeds or raises eg AuthenticationException(results=[PrivateKeyFailure("required a passphrase"), PasswordFailure("rejected"), PrivateKeyFailure("no idea what this key type is sorry"), PKCS11Failure(...)]). (Pseudocode, obvs.)

Question is how far did I get there and what does it look like when I get it "working" again in modern times (and do other invested users see obvious holes in it).

radssh commented 5 years ago

I got a good amount of work in progress on this topic, left off in the middle of a massive overhaul of the unit test cases. Existing auth methods are all implemented, including GSSAPI (finally got a test VM with FreeIPA to kick the tires on that mechanism), and was going to leverage an existing host-based auth PR to round out the suite of mechanisms.

I'll try to put a bit more polish on the current state of things to allow a cursory taste-test on this.

CharString commented 4 years ago

The best thing I can use is just a sanity check of the API I end up constructing.

Can't really say I had the scrutiny of a sanity check, but I was looking for the reason why, when connecting, OpenSSH client would ask me permission to use the private key material of only 1 key (the key that was actually accepted by the server) whilst Fabric would try all keys in the agent.

I ended up wanting to write a patch for whatever comes out of this: https://github.com/paramiko/paramiko/blob/auth-shakeup/tests/authentication.py#L166

OpenSSH first only offers public keys to the server, and only when it's accepted, tries to get the private key information. This means, in your config you can set IdentityFile to a public key, to hint it will should offer that one, and then get the corresponding private part from the agent.

I found the stubs readable and can follow idea of the lifecycle. And I can image where I would need to code the behaviour I miss. So for what it's worth. Good job on the API. I hope Life™ ends some time soon so this can get nudged forward. ;-)

anarcat commented 4 years ago

i made #1647 to at least have some diagnostics when paramiko falls over on a key. before, we wouldn't even know which key was a problem:

paramiko.ssh_exception.PasswordRequiredException: Private key file is encrypted

now at least we'd see this:

paramiko.ssh_exception.PasswordRequiredException: Private key file '/home/anarcat/.ssh/id_rsa' is encrypted

fuzzy76 commented 2 years ago

Is there some kind of troubleshooting page for these kind of issues?

I have an SSH agent running, SSH_AUTH_SOCK points to it, my key is added to it (and unlocked), and paramiko still throws paramiko.ssh_exception.PasswordRequiredException: Private key file is encrypted. I have no idea what more to try.

DarkArc commented 1 year ago

Is there some kind of troubleshooting page for these kind of issues?

I have an SSH agent running, SSH_AUTH_SOCK points to it, my key is added to it (and unlocked), and paramiko still throws paramiko.ssh_exception.PasswordRequiredException: Private key file is encrypted. I have no idea what more to try.

I ended up here and realized it was because paramiko isn't using the SSH config's specified username so while ssh some-host-name works, paramiko isn't using the username from the SSH config, and instead is trying ssh defaultuser@some-host-name instead of the (implicit) ssh correctuser@some-host-name.

bitprophet commented 1 year ago

Digging into this at very long last. After reviewing the related code in both Paramiko and in Fabric (where it obviously comes up a whole lot), I'm left with the following:

It's best to implement this as a new, opt-in code path, so it can get out into the wild w/o more backwards incompatible releases & so we don't make everyone deal with any bugs straight off, if they weren't already affected by these auth problems.
Architecturally, this shouldn't need major changes to Transport/AuthHandler/Packetizer, only to SSHClient or above.
I suspect the most sensible place to do it is actually as updates to Fabric (its Connection class) as Fabric is essentially the porcelain to Paramiko's plumbing. We'll see; may still make sense to implement the new classes in Paramiko & reference them from Fabric.
- This would likely mean SSHClient eventually gets deprecated and Paramiko becomes much more explicitly the low-level SSH toolkit (people are continually wasting time & reinventing bugs by writing the same code Fabric has handled for ages).
- It also makes sense from the perspective that Fabric uses Paramiko's ssh_config module & this frequently informs the overall auth flow; Paramiko itself has never automatically done any config stuff. Again, high level vs toolkit.

I'll be spiking this out over the coming weeks with the intent of:

Skewing closer to the default behavior of OpenSSH for things like the order of auth sources tried (this is actually the cause of most high visibility auth issues - "well, the exact same config/keys/agent setup works under OpenSSH..."
- For example: Fabric+Paramiko currently try your ssh_config IdentityFile directives before your ssh-agent keys, while OpenSSH is the opposite!
Improving the API used to disable/enable/entirely replace certain auth sources (vs SSHClient's big organically grown pile of kwargs).
Improving the exceptions raised when all auth sources failed; or for that matter, the return value of the connection call even when it succeeded (which right now is a not very helpful None).
- This is almost certainly mentioned up above, but we largely "just" need a wrapper class, broadly similar to NoValidConnectionsError - something preserving the order of what was tried and what their results were.
- This will likely closely mirror whatever is done for the input side of the equation.

anarcat commented 1 year ago

Hi Jeff! Really nice to see you working on this!

For me, #1647 was pretty darn useful to diagnose problem, and I still miss it in my day-to-day use of Fabric. I've gotten used to Fabric throwing me a "key is encrypted error" and interpreting this as "oh, I need to plug in my Yubikey" but that's kind of a terrible UX for newcomers...

But yeah, there's a whole other stack of things to improve on top of that... I figured you might like to look at this MR as a good, somewhat small change...

bitprophet commented 1 year ago

@anarcat Yea, that sort of added visibility is absolutely going to be part of the new exception type/s I'm envisioning.

Still figuring out the details, but at the absolute worst, a caller would be trivially able to get the list of key paths attempted (and how each failed), from introspecting the top level exception in a try/except block.

And offhand I assume the cleanest way to do it is to implement a variant on your PR, that raises more-detailed subclass exceptions which the top-level/aggregate one literally just sticks in a list. This way, lower-level users of Paramiko still benefit, and the higher-level code there or in Fabric doesn't have to do as much work either. 🤔

anarcat commented 1 year ago

That does sound much better than the hack I did then. :) Feel free to close the PR...

radssh commented 1 year ago

@bitprophet -

I have been tinkering with the concept of wrapping the entire authentication process within a separate class, primarily driven by a SSHConfig object, which should a) avoid a bunch of the **kwargs and b) keep things friendly for embedding configs in ~/.ssh/config-like style. Keeping a default IdentityFile value like OpenSSH and support for IdentitiesOnly gives the level of control that OpenSSH provides, with least surprise in behavior.

There are also some conditional handling things, like shifting to rsa-sha2-256/512 instead of ssh-rsa when client and server support/prefer/require. Again, the leveraging of SSHConfig helps a great deal with the client-side configuration for this. I did have to retain some session info in the Transport to pull this off, specifically capturing the server-sig-algs message that may be passed during the connection setup.

That being said, I've only focused on the client side of things for authentication (and key handling), the server side I have had the luxury of ignoring for my needs.

bitprophet commented 1 year ago

There are also some conditional handling things, like shifting to rsa-sha2-256/512 instead of ssh-rsa when client and server support/prefer/require

Weird, how recent is the branch you're working off for this? As of a few revisions ago we should be dealing with server-sig-algs mostly-correctly & in fact most users that run into SHA2 issues are the ones who have to disable it when their targets aren't correctly indicating the lack.

the server side I have had the luxury of ignoring for my needs.

Honestly, same for now. I do want the server to get more compatible sometime too, but it's always a vanishingly smaller userbase vs client so difficult to justify the time until it becomes a self-itch-scratch (eg if/when I need the test harness server to act more OpenSSH-like...).

I'm still kicking this around but I am continuing to think that doing this at the Fabric level makes the most sense. Fabric already autoloads ssh_config, for example, so it just needs additional tweaks to be more OpenSSH-compatible. Doing it there also makes ignoring the server side slightly more understandable as Fabric has no server component 😇