psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0
52.18k stars 9.33k forks source link

Let the user provide an SSLContext object #2118

Closed brandon-rhodes closed 7 years ago

brandon-rhodes commented 10 years ago

The requests library seems to grow more and more keyword arguments to try to provide all of the flexibility that SSL users need. As of Python 3.2, the Standard Library now offers a different approach: an SSLContext that can accept settings for TLS protocol version, CA certificate list, identity certificate, secret key, allowable cipher list, Diffie-Hellman parameters, server-name callback function, whether to verify server hostnames, and so forth. It has a wrap_socket() method that starts up TLS on a socket using precisely the settings it has been configured with.

This lets protocol libraries in the Python 3 Standard Library opt out of needing keyword arguments for any of the above settings. They can simply accept a context= keyword argument, use the context to wrap their encrypted sockets, and stay out of the business of understanding SSL and all of its different settings.

If the requests library under Python 3 started supporting a context= parameter like the Standard Library protocols, then users could fine-tune their encryption settings without requests having to become more complicated.

A use-case: many users today are concerned about Perfect Forward Security (PFS) and want to only make connections with ciphers that at least make it possible to build connections that cannot be decrypted later if a secret key is captured. But the current requests library, so far as I can see, makes no provision for this. Nor do I want it to: adding a new ciphers= keyword would be only the first of a dozen other keywords that SSL users will need added over the coming years. But if requests accepted a context= parameter, then I can create an SSLContext and tell it which protocols I am willing to use and have requests (and urllib3) use that context for building their SSL connections.

sigmavirus24 commented 10 years ago

The requests library seems to grow more and more keyword arguments to try to provide all of the flexibility that SSL users need.

I'm not sure if you mean the fundamental get/post/put/delete/head/options/request methods/functions, but if that is what you're referring to, those haven't changed since before 1.0 was released. If you mean that there's an increasing amount of flexibility within adapters to fine tune this, then yes, you're correct.

If the requests library under Python 3 started supporting a context= parameter like the Standard Library protocols, then users could fine-tune their encryption settings without requests having to become more complicated.

We're very unlikely to start supporting a keyword argument on only one version of Python. urllib3 will discard keyword arguments on specific versions of Python, but we have yet to do that.

In general I like the idea, but so long as we will support Python 2 (unless a backport of the SSL module can be successfully provided), we will have trouble supporting this. Further, we're ignoring whether or not @kennethreitz would even want to add this keyword argument. I for one would be in favor of adding it (when we can support all Python versions equally), but I don't get to make these decisions ;).

That said, I'll have to investigate if urllib3 will support SSLContext objects (because I frankly don't remember). I'll look into that tonight unless @shazow chimes in before then. I'm also willing to do the work to allow urllib3 to accept them.

Lukasa commented 10 years ago

Can we step back for a moment?

@brandon-rhodes I'm really not clear about what you mean by requests sprouting more and more keywords. We have two: verify and cert. Anything else is done at the HTTPAdapter layer on a case-by-case basis, which allows exactly what you've described.

What have I missed? What parameters are you identifying that I don't see?

Lukasa commented 10 years ago

Reading your post again @brandon-rhodes, I realise that your concern is not actually that we currently have too many parameters, but that we might end up having too many parameters. In this I agree with you.

However, I think requests should continue its proud tradition of solving the 80% use-case in the primary API and allowing the Transport Adapter API to help the remaining 20%. For this reason, I'd never accept adding a ciphers kwarg to the main API: it's just too niche, especially if we require it to look anything like OpenSSL's terrible cipher format.

Instead, let's look at providing something like the ssl_version support in urllib3. This would allow us to write something like the SSLAdapter, but for ciphers (or indeed just to extend the SSLAdapter). That allows people who genuinely do care to plug something in, while allowing requests to continue to support the best-possible use-case.

brandon-rhodes commented 10 years ago

Further, we're ignoring whether or not @kennethreitz would even want to add this keyword argument.

By no means did I intend to ignore @kennethreitz’s wishes! Alas. Opening this issue was, I had thought, precisely the means by which Mr. Reitz’s wishes could become known. Would there have been a more appropriate forum for asking my question?

In this I agree with you.

Thank you, @Lukasa! Sorry if I worded the issue awkwardly and it required multiple read-throughs.

I think requests should continue its proud tradition of solving the 80% use-case … I'd never accept adding a ciphers kwarg to the main API … This would allow us to write something like the SSLAdapter

So the keyword arguments to get() and the other functions are not “kitchen sink” collections of everything that could be specified, but a smaller collection of settings, and users are intended to step back and create adapters for more difficult cases? Then you are correct that what I probably need is an adapter that accepts an SSLContext for building connections!

The urllib3 library accepts an ssl_version keyword? That, I fear, is a first step towards insanity, and a course which can be stopped by turning to SSLContext. Because the next logical step after ssl_version (which is really setting what OpenSSL calls the “protocol” version, from what I can see?) is adding a ciphers keyword, and then a dh_params keyword, and then ecdh_curve, and so forth.

In fact, what I really probably want is an adapter that does not even know that SSLContext exists, but simply accepts that I have gotten ready an object with a wrap_socket() method that it can use when it is ready to negotiate an encrypted connection. That way, as long as an SSL library that I want to use in the future also offers a wrap_socket() method (think of PyOpenSSL, or that new cryptography project), then it would automatically work if dropped in to the transport object.

sigmavirus24 commented 10 years ago

Would there have been a more appropriate forum for asking my question?

No. I just like to couch my positive replies in "But Kenneth might not want this, so don't get your hopes up". It might take him a while to get around to reading or replying though.

So the keyword arguments to get() and the other functions are not “kitchen sink” collections of everything that could be specified, but a smaller collection of settings, and users are intended to step back and create adapters for more difficult cases?

That is correct. There are currently a few good examples of this (like the SSLAdapter @Lukasa linked above).

Then you are correct that what I probably need is an adapter that accepts an SSLContext for building connections!

Good news! @Lukasa has a blog post about building adapters for requests and we can both help you with building one of these. Further, I'd be interested in it if only to add it to the toolbelt if only to help it be discovered more easily.

In fact, what I really probably want is an adapter that does not even know that SSLContext exists, but simply accepts that I have gotten ready an object with a wrap_socket() method that it can use when it is ready to negotiate an encrypted connection.

You could do this, but then you would probably doing a lot of urllib3's work since it doesn't (if I remember correctly) provide a way for you to pass in a socket to use (and I don't think it should start accepting sockets to use either).

It might be more helpful to acquaint yourself further with the adapters that currently exist in requests, to understand how they work. They're not terribly difficult to understand either.

brandon-rhodes commented 10 years ago

Okay! I will go take a look at them, and then come back later and comment again on this issue.

shazow commented 10 years ago

Supplying your own socket factory thing is indeed something urllib3 wants to support. We had some discussions about providing a class which "contains" all the parts you need to make an SSL contexted socket. (https://github.com/shazow/urllib3/pull/371#issuecomment-40299324 alludes to this)

For now, you'd need to set the .ConnectionCls property of a *ConnectionPool object to bring your own thing. All it needs to be is something that extends httplib.HTTPConnection (or urllib3.connection.HTTPConnection, I suppose) for the constructor and implements .connect().

Here's the meat for our verified SSL stuff: https://github.com/shazow/urllib3/blob/master/urllib3/connection.py#L191

Lukasa commented 10 years ago

Hooray for three person issue conversations! They're always so easy to follow. =) @brandon-rhodes, you've provided lots of great options for us. I'd like to try to enumerate them as I understand them, and then provide feedback on each of them. Please step in if you think I've left anything out or misunderstood something.

  1. Make it possible to pass urllib3 a socket.

    I don't like this idea much. In principle it's do-able, but it violates the abstraction layer that urllib3 provides. Our favourite feature of urllib3 is that it performs connection pooling, and for it to do that it needs to be able to transparently create new connections (to conserve resources). This means we can't say "here, use this connection object" because urllib3 owns all the connection objects. This idea is sadly unworkable.

  2. Provide urllib3 with an SSLContext.

    This is workable, as urllib3 can use it as a kind of connection factory. However, urllib3 right now does an excellent job of transparently interworking between PyOpenSSL and the standard library's ssl module. This transparent interworking is only going to get stronger as we move toward hyper and HTTP/2. This means being able to provide an SSL context is something of a footgun: if you provide a stdlib SSLContext to hyper, it won't be able to make an HTTP/2 connection through it, it needs PyOpenSSL's.

    Additionally, the PyOpenSSL Context does not have the same API as the ssl module's one. This is incredibly annoying, and @alekstorm has done awesome work by writing a compatibility module (backports.ssl) to paper over the differences. Again, however, I note that working around the myriad user inputs is a logistical nightmare.

  3. More SSL keyword arguments.

    Brandon, you've expressed a discomfort with the many keyword arguments potentially required. I am sympathetic to that argument. I have no good alternative to using either a **ssl_kwargs or to have a ssl_args dictionary argument, neither of which is great. I guess a NamedTuple?

EnTeQuAk commented 10 years ago

I'd vote for passing SSLContext around. I actually have a usecase for that (https://github.com/keybar/keybar/blob/master/src/keybar/utils/security.py#L53) and would love to pass around this config.

Tornado for example implements a fancy ssl_options_to_context (http://tornado.readthedocs.org/en/latest/netutil.html#tornado.netutil.ssl_options_to_context) which checks for a SSLContext object that can be applied on Python 3.2+

Though, actually all that would be required in urllib3 to make it usable.

Lukasa commented 10 years ago

@EnTeQuAk I continue to be in favour of my option 2. =)

brandon-rhodes commented 10 years ago

I agree: option 2 is the winner here, and will bring Requests up to the level of capability of the protocol modules in the Standard Library (http.client, smtplib, poplib, imaplib, ftplib, and nntplib) that all also support SSLContext parameters.

sholsapp commented 10 years ago

+1, option 2.

Lukasa commented 10 years ago

I am going to provide a strawman of option 3, as I believe option 2 is extremely difficult at an engineering level without introducing a new urllib3.SSLContext object.

I propose the addition of a new urllib3 utility class, TLSOptions. That class would work as follows:

opts = TLSOptions()
opts.allow_compression = True
opts.ciphers = 'TLS_ECDHE_RSA_AES_128_GCM_SHA_256'
opts.versions = ['TLSv1.1', 'TLSv1.2']

This states the options in a manner that is deliberately agnostic to whether ssl or PyOpenSSL is being used. It also enables us to specifically whitelist the options we believe it is acceptable to change: for example, we are unlikely to ever have allow_compression.

Finally, the object could perform meaningful validation of settings: for example, if I had passed the cipher string above with opts.versions = ['TLSv1.1'], the object could except (because this is an invalid configuration). Again, we may not implement this, but we could.

Can I get initial feedback on how this feels to people?

alex commented 10 years ago

I'd like to advocate for having TLSOptions take various constructor arguments, and being immutable.

sigmavirus24 commented 10 years ago

I second @alex's suggestion. Constructing it once and not having to use those 4 lines each time I need a slight variation is much nicer. Immutability will also be pleasant so the options cannot be changed under anyone's feet.

Lukasa commented 10 years ago

I'm happy with immutability. That gets us to this:

opts = TLSOptions(
    allow_compression=True,
    ciphers='TLS_ECDHE_RSA_AES_128_GCM_SHA_256',
    versions=['TLSv1.1', 'TLSv1.2']
)
shazow commented 10 years ago

TLSOptions sounds good. Or SSLOptions, else we may need to rename all the other ssl references for consistency. Should this conversation be happening in urllib3-land?

sigmavirus24 commented 10 years ago

@shazow it's lived almost entirely here so let's just finish it here?

sigmavirus24 commented 10 years ago

Are there other options we're considering for this?

tiran commented 9 years ago

I'm strongly in favor for option 2), the SSLContext object. If we can agree upon a minimalistic API then urllib3 and requests would be able to support more TLS/SSL libraries than just OpenSSL through stdlib ssl module and PyOpenSSL. For example I have a specific use case for NSS as TLS library. Somebody has expressed an interest in SecureTransport (OSX) as TLS lib. I even would go so far and standardize a minimal ABC in a PEP.

import socket
from abc import ABCMeta, abstractmethod, abstractproperty

class AbstractSocketDescriptor(metaclass=ABCMeta):
    @abstractmethod
    def fileno(self):
        pass

class AbstractSocket(AbstractSocketDescriptor):
    @property
    def family(self):
        return socket.AF_INET

    @property
    def type(self):
        return socket.SOCK_STREAM

    @abstractproperty
    def proto(self):
        pass

    @abstractmethod
    def close(self):
        pass

    @abstractmethod
    def makefile(self, mode=None, bufsize=None):
        pass

    @abstractmethod
    def recv(self, bufsize, flags=None):
        pass

    @abstractmethod
    def send(self, string, flags=None):
        pass

    @abstractmethod
    def sendall(self, string, flags=None):
        pass

class AbstractSSLSocket(AbstractSocket):
    @abstractmethod
    def get_channel_binding(self, cb_type="tls-unique"):
        pass

    @abstractmethod
    def selected_alpn_protocol(self):
        pass

    @abstractmethod
    def selected_npn_protocol(self):
        pass

class AbstractSSLContext(metaclass=ABCMeta):
    @abstractmethod
    def wrap_socket(self, sock: AbstractSocketDescriptor,
                    server_side=False, *, server_hostname=None
                    ) -> AbstractSSLSocket:
        pass

I think these attributes and methods are about the bare minimum to define a useful socket, SSL connection and SSLContext. The ABCs deliberately omit all methods to create or configure a socket or a context. It only defines methods to read/write for TCP streaming connections over IPv4 or IPv4 as well as an API to wrap a socket into a SSL socket. Basically SSLContext object is demoted to a factory that turns minimal socket into a SSL-aware socket.

For most applications the socket is created and configured elsewhere, usually in a create_connection(). Maybe create_connection() could be a method of SSLContext, too. That would be useful for NSS, because NSPR has its own socket abstraction layer that is not fully compatible with OS sockets.

def create_connection(
        address,
        timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
        source_address=None,
        family=AF.AF_UNSPEC,
        addrflags=0, # getaddrinfo flags
        blocking=None, # setblocking()
        sockopts=(), # setsockopt()
        ) -> AbstractSocketDescriptor:
    pass

My idea doesn't contradict Alex's idea for TLSOptions, it just deals with a different level of abstraction. TLSOptions is a useful high level interface for requests. My proposal doesn't care how a SSLContext object for a specific implementation is created. TLSOptions could be used to create ssl.SSLContext or OpenSSL.SSL.Context. But internally requests and urllib3 should really use SSLContext.

Lukasa commented 9 years ago

We'll obviously use SSLContexts at the interface to urllib3, because that's what urllib3 will accept. However, in requests land, we may want to use an API that's far clearer.

aadamowski commented 8 years ago

Hi, not sure if this thread is the right place, but I just wanted to add my 2 cents from an API user's perspective.

Currently, the requests library is missing any obvious way to access the server certificate to implement additional custom validations.

AFAIU, when we gain a way to supply customized SSLContext instances (which seems to be the prevailing option right now), we'd be able to implement this with some obscure tricks:

  1. override the SSLContext.wrap_socket() method, delegating the call to the super wrap_socket(), then storing the results of getpeercert(True) in some custom attribute on the socket, e.g. sock.ssl_peer_cert_der, and results of getpeercert(False) in e.g. sock.ssl_peer_cert_limitedly_parsed
  2. reach inside the requests's response object, making our way down to the socket through lots of inappropriate intimacy (response.connection.sock I suppose?)
  3. read the desired certificate attribute from the socket, and do what's desired with it (e.g. crypto.load_certificate(crypto.FILETYPE_ASN1, socket.ssl_peer_cert_der)

However, this is obviously suboptimal and involves bad code smells.

If you get to implementing SSLContexts support, could you also expose an obvious way in the API (e.g. a callback function) to handle any additional custom validations with access to full certificate data?

Basically, a way to get called back with the result of SSLSocket.getpeercert(Boolean) and throw some exception if we don't like the cert.

Lukasa commented 8 years ago

@aadamowski You wouldn't need that. Generally speaking, if you can provide an SSLContext you should just use PyOpenSSL and provide Context with a custom verify callback. That would allow you to achieve your goal.

aadamowski commented 8 years ago

@Lukasa , is the plan for requests to handle both Python ssl's SSLContext and PyOpenSSL's Context objects equivalently?

Despite some similarities, these classes don't have compatible APIs and aren't related.

Lukasa commented 8 years ago

Correct, but that's why we would need to have both. Requests can use pyOpenSSL in some circumstances, which means we need to be able to use the pyOpenSSL Context object in some circumstances. This is why this issue has been around so long: it's not as easy as it seems at first glance. ;)

offero commented 8 years ago

I came across this issue today after trying to customize the cipher suite for the SSL connection in a Session. For some reason the TLS version and cipher spec were not being automatically negotiated appropriately. I'm still trying to figure out why this is the case. This is an issue that has come up for us multiple times now and we really do need a solution. I do like using the SSLContext object because it is the standard way to customize the ssl connection in Python. In any case, we just need a documented way to specify these parameters.

Currently, to force a given TLS version, we implement an HTTPAdapter and use session.mount() with it. But the pool manager only allows us to specify a specific ssl_version to use (packages/urllib3/poolmanager.py). It does not allow us to customize the ciphers or anything else.

To provide more context, we have encountered issues using client-side certificates while connecting to Java GlassFish servers that require TLS 1.1/1.2.

dsully commented 7 years ago

+1 - this would be great to have. It's been almost a year with no movement here.

Lukasa commented 7 years ago

No, it hasn't. You can pass SSLContext objects using TransportAdapters as of v2.12.0.

dsully commented 7 years ago

@Lukasa Are there docs on doing so? I'm not finding anything.

Lukasa commented 7 years ago

@dsully Not at this time, though I have written examples a few times.

dsully commented 7 years ago

@Lukasa Thanks, I'll start playing around with that. FWIW, the reason I need this is adding our internal CA to the trust store in addition to locking down the cipher suite.