python / cpython

The Python programming language
https://www.python.org

Make SSL suppress_ragged_eofs default more secure #72002

Open vadmium opened 8 years ago

vadmium commented 8 years ago
BPO 27815
Nosy @vstinner, @tiran, @alex, @njsmith, @jab, @vadmium, @csabella
PRs
  • python/cpython#5266
Files
  • truncating_proxy.py
  • ragged-eofs.patch
  • ragged-eofs.v2.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-security', 'expert-SSL', '3.7', 'library'] title = 'Make SSL suppress_ragged_eofs default more secure' updated_at = user = 'https://github.com/vadmium' ``` bugs.python.org fields: ```python activity = actor = 'jab' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)', 'SSL'] creation = creator = 'martin.panter' dependencies = [] files = ['44169', '44784', '45106'] hgrepos = [] issue_num = 27815 keywords = ['patch'] message_count = 17.0 messages = ['273208', '277212', '278737', '301711', '301751', '301752', '301996', '310389', '310390', '310403', '310407', '310410', '310413', '310415', '310418', '310419', '310420'] nosy_count = 7.0 nosy_names = ['vstinner', 'christian.heimes', 'alex', 'njs', 'jab', 'martin.panter', 'cheryl.sabella'] pr_nums = ['5266'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'security' url = 'https://bugs.python.org/issue27815' versions = ['Python 3.7'] ```

    vadmium commented 8 years ago

    In the SSL module, the wrap_socket() function (and corresponding SSLContext method) take a flag called “suppress_ragged_eofs”. It defaults to True, which makes me uncomfortable. The documentation says:

    ''' The parameter “suppress_ragged_eofs” specifies how the SSLSocket.recv() method should signal unexpected EOF from the other end of the connection. If specified as True (the default), it returns a normal EOF (an empty bytes object) in response to unexpected EOF errors raised from the underlying socket; if False, it will raise the exceptions back to the caller. '''

    I understand the “unexpected EOF error” happens when the underlying socket indicates EOF, but the connection was not shut down at the SSL protocol level. As well as EOF from the other end of the connection, it can happen due to a proxy, or anything else on the network that can affect the connection. I think it may be better to report this error by default, just like other unsecured network-level errors such as connection reset or timeout. Otherwise it is too easy to treat this insecure EOF condition as if it were a secure EOF signal from the remote peer.
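With suppress_ragged_eofs=False, a caller can tell the two kinds of EOF apart explicitly. A minimal sketch of what that looks like (the helper name recv_checked is made up for illustration; only ssl.SSLEOFError and the recv() behaviour are real API):

```python
import ssl

def recv_checked(tls_sock, bufsize=4096):
    """Read from a socket wrapped with suppress_ragged_eofs=False and
    report which kind of EOF was seen.

    Returns (data, ragged): with a proper TLS close_notify, recv()
    returns b'' and ragged is False (a clean, authenticated EOF); if
    the TCP stream is torn down without close_notify, recv() raises
    SSLEOFError, which we surface as a ragged EOF instead of silently
    treating it as the end of the data.
    """
    try:
        data = tls_sock.recv(bufsize)
    except ssl.SSLEOFError:
        return b"", True   # ragged EOF: stream cut without close_notify
    return data, False     # data, or b"" for a clean close_notify EOF
```

The point is that with the current default, the two cases above are indistinguishable to the caller: both simply return b''.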

    The flag was added as part of r64578, for bpo-1223, in Python 2.6. The reason given in that bug report was to help Python work with an HTTP server. However, my theory is that the server was closing the connection wrongly; if so, the server should have been fixed, rather than Python.

    There is plenty of precedent for using suppress_ragged_eofs=True with HTTPS servers. As well as bpo-1223, there are bpo-494762 and bpo-500311. And in my experiments, Curl and Firefox both seem to treat the error the same as a secure EOF. Maybe there should be a way to keep supporting HTTPS servers that trigger the error (though I would rather not do so by default).

    Attached is an HTTP proxy server that will let you break a connection after returning a set number of bytes (or at any time, by killing the server), which can trigger the error condition.

    Example output of the proxy server:

    $ python truncating_proxy.py --limit 12345
    Proxy server at http://127.0.0.1:46687
    Proxying connection to www.python.org:443
    Forwarded 12345 B to client

    Python 2’s httplib module does not treat a short HTTP response as an error, so the following request gets truncated without much indication of a problem:

    $ https_proxy=http://127.0.0.1:46687 python2 -c 'import urllib2; print(urllib2.urlopen("https://www.python.org/").read())'
    <!doctype html> [. . .] <!--[if lt IE 8]> <div id="oldie-warning" class

    vadmium commented 8 years ago

    I have been experimenting with a patch that changes the default to suppress_ragged_eofs=False.

    One disadvantage of this change is that it could make servers less robust. E.g. in the tests, I explicitly enabled suppress_ragged_eofs=True in a server, because otherwise I would have to add extra cleanup if an exception is raised on the server side. In another test server, based on socketserver, I added special code to silence logging of an SSLEOFError exception (a side effect of a particular test case).

    But ideally, real-world servers should already handle other exceptions that are triggered outside the server’s control, like ECONNRESET and TLSV1_ALERT_UNKNOWN_CA. Socketserver already logs all these kinds of errors. Plus, I think SSLEOFError has always been possible in the handshake phase.

    With HTTP I didn’t have to go further than Google to find a server that terminates the response with a non-SSL shutdown (although because truncated JSON is invalid, it is not a big deal). For the record, here is a stripped-down demo. The same problem also happens for a more complete request with valid parameters.

    >>> import socket, ssl
    >>> s = socket.create_connection(("accounts.google.com", 443))
    >>> ss = ssl.wrap_socket(s, suppress_ragged_eofs=False)
    >>> ss.sendall(b"POST /o/oauth2/token HTTP/1.1\r\n"
    ...     b"Host: accounts.google.com\r\n"
    ...     b"Content-Length: 0\r\n"
    ...     b"Connection: close\r\n"
    ...     b"\r\n")
    >>> print("\n".join(map(repr, ss.recv(3000).splitlines(keepends=True))))
    b'HTTP/1.1 400 Bad Request\r\n'
    b'Content-Type: application/json; charset=utf-8\r\n'
    b'Cache-Control: no-cache, no-store, max-age=0, must-revalidate\r\n'
    b'Pragma: no-cache\r\n'
    b'Expires: Mon, 01 Jan 1990 00:00:00 GMT\r\n'
    b'Date: Thu, 22 Sep 2016 03:10:20 GMT\r\n'
    b'X-Content-Type-Options: nosniff\r\n'
    b'X-Frame-Options: SAMEORIGIN\r\n'
    b'X-XSS-Protection: 1; mode=block\r\n'
    b'Server: GSE\r\n'
    b'Alt-Svc: quic=":443"; ma=2592000; v="36,35,34,33,32"\r\n'
    b'Accept-Ranges: none\r\n'
    b'Vary: Accept-Encoding\r\n'
    b'Connection: close\r\n'
    b'\r\n'
    b'{\n'
    b'  "error" : "invalid_request",\n'
    b'  "error_description" : "Required parameter is missing: grant_type"\n'
    b'}'
    >>> ss.recv(3000)  # HTTP-level client does not know that this is the EOF yet
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/proj/python/cpython/Lib/ssl.py", line 987, in recv
        return self.read(buflen)
      File "/home/proj/python/cpython/Lib/ssl.py", line 865, in read
        return self._sslobj.read(len, buffer)
      File "/home/proj/python/cpython/Lib/ssl.py", line 627, in read
        v = self._sslobj.read(len)
    ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2176)

    In this case, if the client does not send “Connection: close”, the server uses chunked encoding and there is no problem. So this is another instance of bpo-12849 (Python’s unusual request triggering a server bug).

    I wonder if a solution would be to use suppress_ragged_eofs=False by default, but add a way to let the user of the http.client module explicitly allow SSLEOFError to signal a proper EOF.

    vadmium commented 8 years ago

    Patch v2 also adds a new attribute to context objects. With this I can work around my Google server bug:

    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    context.suppress_ragged_eofs = True
    handler = urllib.request.HTTPSHandler(context=context)
    urlopen = urllib.request.build_opener(handler).open
    urlopen(Request(url="https://accounts.google.com/o/oauth2/token", ...))

    tiran commented 7 years ago

    I don't consider myself qualified enough to make a decision. Alex, Victor, what do you think?

    vstinner commented 7 years ago

    Martin: can you please create a pull request? It would be easier to review your change.

    alex commented 7 years ago

    Mmmm, my understanding is that ignoring TCP-FIN/RST-without-TLS-closenotify is pretty common for a lot of different clients.

    We should probably survey the landscape, and see what both browsers and non-browser clients (e.g. curl) do before making a decision.

    vadmium commented 7 years ago

    Even if some use cases depend on suppress_ragged_eofs=True, I think it is best to avoid that as the default. There could be a deprecation period if necessary.

    I tested some HTTP clients I had handy. In summary, most seemed to handle a truncation attack on the Set-Cookie field sensibly (but Python is vulnerable without my patch or other solution). On the other hand, all the clients I tried handled one case of an insecurely-truncated body the same as a normal EOF (while I propose to treat this as an error).

    The clients I tested: Firefox/46.0, curl/7.43.0, Wget/1.13.4, Links 2.12, ELinks/0.13.GIT, Python-urllib/3.5 (unpatched), and Python-urllib/3.7 with my patch. I tried three test cases:

    1. Truncate Set-Cookie field, with no terminating newline. The client should not accept the cookie, in case an attribute such as “Secure” was removed, like in <https://bugs.chromium.org/p/chromium/issues/detail?id=244260>.
    >>> c.sendall(b"Set-Cookie: COOKIE=meant-to-be-Secure")
    >>> c.shutdown(SHUT_RDWR)

    Python (unpatched) treats the Set-Cookie field as valid. It appears in the HTTPResponse object, with no clue that it was truncated. Wget was also vulnerable. Firefox and Curl did not record the cookie, but did not indicate any error either. Links does not support cookies, while Elinks tried 3 times and reported an error.

    2. Content-Length response with truncated text/html. IMO a client should inform the user that the response was cut off (with or without SSL), but sometimes the user may want to see the first half of the response anyway.
    >>> c.sendall(b"Content-Length: 100\r\n\r\n" b"Truncat")
    >>> c.shutdown(SHUT_RDWR)

    Curl, Wget, Links and Elinks all output the incomplete response and reported an error. Firefox displayed the truncated page without indicating any error. In most cases, Python raised an IncompleteRead exception, but it depended on the pattern of read calls, and some or all of the truncated data was hidden in the undocumented “IncompleteRead.partial” attribute.

    3. “Connection: close” response with truncated HTML:
    >>> c.sendall(b"Connection: close\r\n\r\n" b"Truncat")
    >>> c.shutdown(SHUT_RDWR)

    This is the case where all the clients (other than my patched Python) treated the truncation like a valid, complete response. But IMO in general it should be dealt with like the Content-Length case if the SSL level wasn’t shut down properly.
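The danger in case 1 reduces to accepting a header block whose terminating blank line never arrived. Whatever the SSL layer reports, a client-side guard for that is tiny (an illustrative helper, not part of http.client):

```python
def headers_complete(raw: bytes) -> bool:
    """True only once the header block's blank-line terminator has been
    received.  A Set-Cookie line cut off mid-field (case 1 above) fails
    this check, so the truncated cookie would never be parsed."""
    return b"\r\n\r\n" in raw
```

A parser that refuses to act on headers until this holds is immune to the Set-Cookie truncation regardless of how the EOF was signalled.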

    Victor: Sorry, I’m unlikely to make a GitHub pull request any time soon, but I don’t mind if someone else does.

    csabella commented 6 years ago

    I converted the patch to a PR.

    It wouldn't merge cleanly, so I applied it manually; please check it for errors. Some issues I ran into:

    1. The patch had a change to __slots__, but that line no longer existed and I didn't know if I needed to make a different change to accommodate it.
    2. I converted the 'What's New' to a blurb.

    njsmith commented 6 years ago

    The current default is hard to defend -- it's a clear violation of the TLS specs. But I suspect that changing it will be pretty disruptive, because suppress_ragged_eofs=True is the de facto standard for HTTP-over-TLS (generally justified on the grounds that HTTP has its own framing, so the TLS-level close-notify stuff is redundant), and that means that if we flip it, cases that used to work correctly will start raising errors. We could avoid some of these issues by making http.client explicitly use suppress_ragged_eofs=True, but then that won't fix Martin's problems...

    A discussion about changing it should also consider what to do on sslsock.close(), since right now Python always gives the other side a "ragged EOF" unless you call "sslsock.unwrap()", and I've never seen any code that does this. So flipping the suppress_ragged_eofs default by itself will make Python's ssl module fail at interoperating with itself.
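With the existing API, the only way to give the peer a clean EOF is the unwrap() call mentioned above. A sketch of a well-behaved close (unwrap() performs the slow bidirectional shutdown, so it blocks until the peer answers; real code would add timeouts):

```python
import ssl

def close_tls_cleanly(ssl_sock):
    """Close a TLS connection with a proper close_notify, so the peer
    sees a clean EOF instead of a ragged one.  In the current ssl
    module this is spelled unwrap(): it sends our close_notify, waits
    for the peer's, and returns the underlying plain socket."""
    try:
        plain = ssl_sock.unwrap()
    except (ssl.SSLError, OSError):
        ssl_sock.close()  # peer is already gone; fall back to a hard close
    else:
        plain.close()
```

Almost no real-world code does this today, which is exactly why flipping the default in isolation would make Python clients and Python servers raise errors at each other.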

    1. Truncate Set-Cookie field, with no terminating newline. [...] Python (unpatched) treats the Set-Cookie field as valid. It appears in the HTTPResponse object, with no clue that it was truncated.

    This sounds like a bug in http.client – the lack of terminating newline should cause a parse failure regardless of what the SSL layer does (and regardless of whether we're even using SSL).

    2. Content-Length response with truncated text/html. [...] In most cases, Python raised an IncompleteRead exception, but it depended on the pattern of read calls

    Also a bug in http.client.

    3. “Connection: close” response with truncated HTML:

    This one's a little more interesting... First, "Connection: close" doesn't affect the framing of the HTTP body, it just says that this connection can't be reused for another request afterwards. The way HTTP works, if you have a response with neither a Content-Length header *nor* a Transfer-Encoding: chunked header, then it's assumed that the framing is determined by the connection being closed -- I assume that this is what you generated here. Now, this is a hack for backwards compatibility with HTTP/1.0; in theory, no modern server should ever generate such a response. In practice, who knows, people do all kinds of nonsense.

    But... do we really think that servers who can't be bothered to implement HTTP/1.1, are also going to take the trouble to make sure their SSL/TLS handling is correct to the spec? If the server decides to use connection-close framing, that's their decision, and the client is kind of at their mercy.

    tiran commented 6 years ago

    Before we can disable ragged EOF handling, we first have to fix ssl.SSLSocket()'s shutdown behavior. It should support at least a proper unidirectional shutdown besides the slow bidirectional shutdown. It might even be a good idea to default to unidirectional shutdown. Curl does it, https://github.com/curl/curl/blob/9e4ad1e2af22f00eeca533b745b67956f57319cb/lib/vtls/openssl.c#L1155-L1168

    I wrote this in a mail to Cory about PEP-543 two weeks ago:

    Now to the nasty session and shutdown issue. I discovered it just a couple of weeks ago.

    We have to revisit TLS socket shutdown in order to make TLS session resumption work. OpenSSL discards and invalidates sessions when an SSL socket isn't properly shut down, e.g. when the I/O layer is hard-closed. Simply speaking, OpenSSL has four shutdown modes: bidirectional, unidirectional, quiet, and hard. Python's ssl module supports slow bidirectional mode and hard mode.

    https://wiki.openssl.org/index.php/Manual:SSL_CTX_set_quiet_shutdown(3) https://wiki.openssl.org/index.php/Manual:SSL_shutdown(3)

    We need an API to perform some sort of shutdown on __exit__(None, None, None) / close().

    njsmith commented 6 years ago

    @Christian: I'm surprised and somewhat dismayed to hear that OpenSSL invalidates sessions on hard close -- that hasn't been part of the spec since 2006 when TLS 1.1 came out. I'm not a cryptographer, but the spec explicitly allows keeping the session, and I can't think of any particular reason why a network closure should imply that the secret material associated with the session has been compromised.

    FWIW trio currently implements bidirectional mode (await ssl_stream.unwrap()), unidirectional mode (default for await ssl_stream.aclose()), and what-I-thought-was-quiet-but-apparently-is-hard (await ssl_stream.aclose() if the stream has the https_compatible=True flag set, or if a call to aclose() times out) [1]. I guess I should make that last one soft [2]. You actually can do all these things with the current ssl module, but it's extremely tricky and confusing. In particular, the way you do 'unidirectional' is to put the socket in non-blocking mode or use SSLObject and then call unwrap(), and when you get SSLWantReadError you're done, you can close the socket.
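The "unidirectional via non-blocking" trick described above can be sketched like this (assumes the socket is already non-blocking; a full version would also handle SSLWantWriteError when the send buffer is full):

```python
import ssl

def shutdown_unidirectional(ssl_sock):
    """One-way TLS shutdown on a non-blocking socket: unwrap() sends
    our close_notify and then raises SSLWantReadError while waiting
    for the peer's reply -- but at that point our half of the shutdown
    is already complete, so we can close the socket without waiting."""
    try:
        ssl_sock.unwrap()
    except ssl.SSLWantReadError:
        pass  # close_notify sent; deliberately don't wait for the peer's
    ssl_sock.close()
```

This exploits a side effect of the API rather than a documented feature, which is why a first-class shutdown-mode parameter keeps coming up in this thread.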

    You might also be amused by this comment I left Cory in June and only just managed to track down again: https://github.com/python-hyper/pep543/issues/2#issuecomment-308900931 (and the rest of the thread too, but that comment specifically gets into shutdown semantics)

    [1] https://github.com/python-trio/trio/blob/317020537ecefa9d6c6214c3caf4011ca4cfb564/trio/_ssl.py#L708-L791 [2] https://github.com/python-trio/trio/issues/415

    tiran commented 6 years ago

    You have to tell OpenSSL that a hard-close is expected and fine. If you just SSL_free() the SSL connection, then OpenSSL removes the session from the SSL_CTX's session cache. It assumes that something went wrong and defaults to secure defaults. "Wrong" doesn't necessarily mean that an attacker has compromised a connection.

    In order to flag a session as valid, you have to call SSL_set_quiet_shutdown(ssl, 1), SSL_shutdown(ssl), SSL_free(ssl). With quiet shutdown set, SSL_shutdown() neither sends nor waits for any data.

    One-way shutdown with non-blocking trick is evil. Cool hack :)

    SSLSocket and SSLObject should really support different shutdown modes, e.g. s.shutdown(mode=0) for quiet, mode=1 for unidirectional and mode=2 for bidirectional.
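No such shutdown-mode API exists today; as a purely hypothetical sketch of the shape proposed here (all names invented, mode numbers taken from the proposal above):

```python
import enum

class ShutdownMode(enum.IntEnum):
    """Hypothetical modes for a future SSLSocket.shutdown(mode=...),
    mirroring the OpenSSL behaviours described above.  Nothing here
    is real ssl-module API."""
    QUIET = 0           # SSL_set_quiet_shutdown(ssl, 1): no close_notify at all
    UNIDIRECTIONAL = 1  # send our close_notify, don't wait for the peer's
    BIDIRECTIONAL = 2   # send close_notify and wait for the peer's reply
```

An explicit mode argument would let callers choose the right trade-off per connection instead of relying on the non-blocking unwrap() trick.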

    njsmith commented 6 years ago

    It doesn't help that Python spells SSL_shutdown as "unwrap".

    I think in practice basically the only two things you want are unidirectional shutdown, or "soft" shutdown. Bidirectional shutdown has some extremely theoretical uses, and we have to provide it anyway because it's in the API already. I don't know of any reason to ever use "hard" shutdown, though I'd be interested to hear of them.

    vstinner commented 6 years ago

    I don't know why, but Cheryl's PR wasn't added to this issue: https://github.com/python/cpython/pull/5266

    tiran commented 6 years ago

    Perhaps a hard close is the right thing when SSLSocket.__exit__ encounters an exception?

    njsmith commented 6 years ago

    Why would an exception inside 'with ssl_sock' have anything to do with the validity of the session shared secret?

    I mean, maybe it does, but I *really* don't think we should be waving our hands and guessing about this stuff.

    tiran commented 6 years ago

    For example an invalid host name should invalidate the session until bpo-31399 is resolved. Any TLS protocol violation should also invalidate the session. If somebody messes with the connection or the TLS protocol encounters a problem during MAC validation, the connection must be considered as tainted.

    Some exceptions may be fine. IMO it's still safer to hard-close the connection on any exception.

    I agree with you. Let's not guess and ask some experts. I'm having meetings with security engineers from GnuTLS and NSS next week. I'll ask them.