psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0

Determining the IP address of server with bad TLS cert. #4939

Open tmontes opened 5 years ago

tmontes commented 5 years ago

Preliminary notes:

Scenario:

The issue:

Why?

The "give me an IP address" solutions I found all assume the HTTP connection has been established. They are mostly based on the idea of using streaming mode to get to the underlying socket and calling getpeername() from there.

Those do not work in this scenario, given that a requests.exceptions.SSLError exception is properly raised and there's no response object to work with from that point on. They could only work if the exception held a reference to the socket, but I couldn't find one there.

Questions:

Thanks in advance.

lifehackjim commented 5 years ago

I wrote a thing to handle this, sort of: https://github.com/lifehackjim/cert_human/

You can get a cert from a server (regardless of its validity), then perform whatever validation, reporting, or what have you on it. Ex:

>>> import cert_human
>>> store = cert_human.CertStore.from_request("https://cyborg")
>>> print(store.subject)
{'common_name': 'cyborg'}
>>> print(store)
CertStore:
    Issuer: Common Name: cyborg
    Subject: Common Name: cyborg
    Subject Alternate Names: cyborg
    Fingerprint SHA1: 67 FD F1 7A 02 26 C7 AB 77 AD CD CB 63 76 19 AD 83 0C BF B7
    Fingerprint SHA256: FA BF 9D EC CF 6C 3F 8A 08 89 29 04 5E 9E B5 A8 28 A9 F7 A8 E8 38 14 7F 32 CE 78 DC 26 B0 84 EA
    Expired: False, Not Valid Before: 2008-11-15 06:32:10+00:00, Not Valid After: 2028-11-15 02:56:10+00:00
    Self Signed: maybe, Self Issued: True

tmontes commented 5 years ago

Thanks for your input.

That doesn't seem to answer the question "what's the IP address of the server that just failed my TLS certificate validation": the important part here is "that just failed" (and, thus, resulted in a requests.exceptions.SSLError exception).

If, facing such failure, the code issues a subsequent request -- be it with requests or with your cert_human -- there's no guarantee that it will hit the same destination IP address.

PS: I do not want to validate TLS certificates in my code. I'd rather delegate that to requests' default behaviour. :)

lifehackjim commented 5 years ago

You can do that by having cert_human always include the cert attributes in the raw object of each response, but you'd have to make two requests per connection: one with verify=False first (either by using cert_human.get_response() or by using requests.get(verify=False)), then your actual connection. Ex:

>>> import requests
>>> import cert_human
>>> cert_human.enable_urllib3_patch()
>>> url = "https://cyborg"
>>>
>>> cert_response = requests.get(url, verify=False)
/Users/jim.olsen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
>>> store = cert_human.CertStore.from_response(cert_response)
>>>
>>> try:
...     r = requests.get("https://cyborg")
... except requests.exceptions.SSLError as exc:
...     m = "SSL Certificate at url: {url!r} failed, info: {store}"
...     print(m.format(url=url, store=store))
...
SSL Certificate at url: 'https://cyborg' failed, info: CertStore:
    Issuer: Common Name: cyborg
    Subject: Common Name: cyborg
    Subject Alternate Names: cyborg
    Fingerprint SHA1: 67 FD F1 7A 02 26 C7 AB 77 AD CD CB 63 76 19 AD 83 0C BF B7
    Fingerprint SHA256: FA BF 9D EC CF 6C 3F 8A 08 89 29 04 5E 9E B5 A8 28 A9 F7 A8 E8 38 14 7F 32 CE 78 DC 26 B0 84 EA
    Expired: False, Not Valid Before: 2008-11-15 06:32:10+00:00, Not Valid After: 2028-11-15 02:56:10+00:00
    Self Signed: maybe, Self Issued: True

tmontes commented 5 years ago

Thanks again Jim, for your prompt feedback.

AFAICT, your code does not address the issue at all. Let me try to restate it:

Minimal code example with a "fill in the blanks" approach:

import requests

try:
    # TCP connection to one of multiple IPs that DNS resolves `multiple.example.net` to.
    resp = requests.get('https://multiple.example.net')
except requests.exceptions.SSLError:
    # TLS certificate validation failed.
    ip_address = ???    # Which IP address gave us a non-valid TLS certificate?

PS: Not sure if the underlying connection pooling and any retrying that may be taking place (?) turn this into a more complex problem than it may appear to be at first sight.
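
For illustration, the blank above is answerable in principle because the TCP connection is established before the TLS handshake starts, so the peer address is already known when certificate validation fails. The sketch below shows that ordering using only the standard library's socket and ssl modules, outside of requests entirely. probe_tls is a hypothetical helper name, not part of requests or urllib3, and it uses none of requests' pooling, proxy, or CA bundle settings.

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=5.0):
    """Return (peer_address, error); error is None if the handshake verified.

    Hypothetical helper: connects with plain TCP first, notes the peer
    address, and only then attempts the TLS handshake.
    """
    context = ssl.create_default_context()  # verifies certificate and hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # The address tuple is available before any TLS bytes are exchanged.
        peer = sock.getpeername()
        try:
            with context.wrap_socket(sock, server_hostname=host):
                return peer, None
        except ssl.SSLError as exc:
            # Covers ssl.SSLCertVerificationError as well as other
            # handshake failures; the peer address was captured earlier.
            return peer, exc
```

Calling probe_tls('multiple.example.net') would return the address that this particular connection reached, plus the ssl.SSLError if the handshake failed. It opens its own connection, so it does not reveal which address an earlier, failed requests.get() call used.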

lifehackjim commented 5 years ago

Ah, I understand now. I didn't catch the part that you were making a request to a DNS name with multiple A records. Apologies.

I don't know that any layer exposes the actual IP address that the socket is connected to (or it's just buried too deep for my quick search). But if you can find that layer, it looks like you'd have to monkey patch and bubble it up (similar to what I do with cert_human).

sethmlarson commented 5 years ago

Could you patch urllib3.util.connection.create_connection() to print out / save the DNS records that socket.getaddrinfo() receives somewhere you can access? A little hacky but this is where you'd directly get DNS-to-IP information.
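
A minimal sketch of a variation on that idea: rather than patching create_connection() itself, it temporarily wraps socket.getaddrinfo and records what each lookup returned. It assumes urllib3 resolves names by looking up socket.getaddrinfo at call time, which is an implementation detail and may not hold for every version. record_getaddrinfo is a hypothetical helper name.

```python
import contextlib
import socket


@contextlib.contextmanager
def record_getaddrinfo():
    """Temporarily wrap socket.getaddrinfo and collect its results.

    Yields a list of (host, port, [ip, ...]) tuples, one per lookup made
    while the block is active. Hypothetical helper, process-wide effect.
    """
    records = []
    original = socket.getaddrinfo

    def wrapper(host, port, *args, **kwargs):
        results = original(host, port, *args, **kwargs)
        # Each result is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address. Drop duplicates, keep order.
        ips = list(dict.fromkeys(info[4][0] for info in results))
        records.append((host, port, ips))
        return results

    socket.getaddrinfo = wrapper
    try:
        yield records
    finally:
        socket.getaddrinfo = original  # always undo the patch
```

Wrapping a requests.get() call in "with record_getaddrinfo() as records:" would leave the candidate addresses in records even if the call raises. Note that this yields every address the lookup returned, not the one the socket ended up connected to.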

lifehackjim commented 5 years ago

I decided to play around with this, because curiosity always gets the best of me.

import requests
import urllib3
import ssl

_ssl_wrap_socket = urllib3.connection.ssl_wrap_socket

def ssl_wrap_socket(sock, keyfile=None, certfile=None, cert_reqs=None,
                    ca_certs=None, server_hostname=None,
                    ssl_version=None, ciphers=None, ssl_context=None,
                    ca_cert_dir=None):
    """Call urllib3's ssl_wrap_socket; on SSLError, attach local/remote addresses."""
    try:
        return _ssl_wrap_socket(
            sock=sock,
            keyfile=keyfile,
            certfile=certfile,
            cert_reqs=cert_reqs,
            ca_certs=ca_certs,
            server_hostname=server_hostname,
            ssl_version=ssl_version,
            ciphers=ciphers,
            ssl_context=ssl_context,
            ca_cert_dir=ca_cert_dir,
        )
    except ssl.SSLError as e:
        e.laddr = sock.getsockname()
        e.raddr = sock.getpeername()
        raise

urllib3.connection.ssl_wrap_socket = ssl_wrap_socket

url = "https://cyborg"

try:
    r = requests.get(url)
except requests.exceptions.SSLError as exc:
    print("Invalid cert at {!r}".format(url))
    print("Local ip {} port {}".format(*exc.args[0].reason.args[0].laddr))
    print("Remote ip {} port {}".format(*exc.args[0].reason.args[0].raddr))

This outputs:

python moo.py
Invalid cert at 'https://cyborg'
Local ip 192.168.1.174 port 53151
Remote ip 192.168.1.32 port 443

tmontes commented 5 years ago

Seth, Jim,

Thanks for your ideas. They both follow monkey-patching approaches, which I am explicitly trying to avoid: who's to say that a given patch will keep working against future requests / urllib3 versions?

I myself had created a monkey-patch based solution, patching socket.socket.connect:

import socket
import requests

_socket_connect_method = socket.socket.connect

def _socket_connect_tracker(self, address):
    _socket_connect_tracker.address = address
    return _socket_connect_method(self, address)

socket.socket.connect = _socket_connect_tracker

try:
    resp = requests.get('https://multiple.example.net')
except requests.exceptions.SSLError:
    print('TLS validation failed for', _socket_connect_tracker.address)
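
As a sketch of how the same socket.socket.connect technique could be scoped more tightly, the version below wraps the patch in a context manager so the original method is put back afterwards, and it keeps every attempted address instead of only the last one. track_connect is a hypothetical name. This is still a monkey patch with the same fragility against future versions, and it is process-wide, so connections made by other threads while the block is active are recorded too.

```python
import contextlib
import socket


@contextlib.contextmanager
def track_connect():
    """Temporarily record every address passed to socket.socket.connect.

    Yields a list that grows by one entry per connect() call made while
    the block is active. Hypothetical helper, not part of requests.
    """
    attempts = []
    original = socket.socket.connect

    def connect(self, address):
        attempts.append(address)  # record before connecting, so failures count too
        return original(self, address)

    socket.socket.connect = connect
    try:
        yield attempts
    finally:
        socket.socket.connect = original  # put the original method back
```

Used as "with track_connect() as attempts:" around the requests.get() call, the except requests.exceptions.SSLError handler could read attempts[-1] as the most recent address tried, assuming no other thread connected in the meantime.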

Pros:

Cons:

What I was wondering and asking about in this issue was: