tlocke / pg8000

A Pure-Python PostgreSQL Driver
BSD 3-Clause "New" or "Revised" License
515 stars 46 forks source link

feat: allow connection with pre-configured socket #136

Closed jackwotherspoon closed 1 year ago

jackwotherspoon commented 1 year ago

Most database libraries across different programming languages allow for connections to be created from a configured SSL/TLS socket.

This feature PR allows this to be done for pg8000 and mirrors the approach used by pymysql of setting sock argument.

jackwotherspoon commented 1 year ago

@tlocke let me know what you think of this and the best approach for adding tests 😄

tlocke commented 1 year ago

Hi @jackwotherspoon, just thinking about this now. My first inclination is that it's a good idea and we should do it, just pondering on how all the different options for creating a connection can work coherently together, request_ssl etc.

tlocke commented 1 year ago

Looking more closely, there may be a problem with providing a pre-configured SSL socket, because according to the PostgreSQL protocol we first need to request SSL over an unencrypted socket, and then create an SSL socket. See the protocol docs:

https://www.postgresql.org/docs/current/protocol-flow.html#id-1.10.6.7.12

Can you see a way around that?

jackwotherspoon commented 1 year ago

@tlocke I believe if a user passes in their own pre-configured SSL socket it is expected that we can skip the startTLS protocol that you are referring to (similar to what the request_ssl attribute does).

A popular Go Postgres driver pgx does this where it skips startTLS if a pre-configured Dialer (which uses an already existing SSL/TLS socket) is passed in and only performs startTLS when a TLS config (equivalent to ssl_context is passed in. https://github.com/jackc/pgx/blob/ef9e26a5d5a6ae7f6f8dd8570e181c08f114de9c/pgconn/pgconn.go#L286

If a user wants startTLS (probably the most common path) they can continue passing in ssl_context as an argument as the documentation outlines.

For full TLS/SSL control I would expect a pre-configured SSL socket to be allowed to be passed, where the user take cares of the SSL handling upon socket creation, outside of the driver code.

I can dig up a Java driver that does this sort of connection path as well if need be.

Let me know what you think of this 😄

tlocke commented 1 year ago

So I thought I'd write some standalone scripts to see how creating the pre-configured SSL socket would work. This first one uses the StartSSL message, which we know works:

import socket
import ssl
import struct

with socket.create_connection(("localhost", 5432)) as sock:
    sock.sendall(struct.pack("!ii", 8, 80877103))
    sock.recv(1)

    ssl_context = ssl.create_default_context()
    ssl_context.check_hostname = False
    ssl_context.verify_mode = ssl.CERT_NONE

    with ssl_context.wrap_socket(sock) as ssock:
        print(ssock.version())

which prints out SSL version as expected. This second one doesn't do the StartSSL step:

import socket
import ssl

with socket.create_connection(("localhost", 5432)) as sock:

    ssl_context = ssl.create_default_context()
    ssl_context.check_hostname = False
    ssl_context.verify_mode = ssl.CERT_NONE

    with ssl_context.wrap_socket(sock) as ssock:
        print(ssock.version())

which gives the error:

Traceback (most recent call last):
  File ".../test.py", line 16, in <module>
    with ssl_context.wrap_socket(sock) as ssock:
  File "/usr/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.10/ssl.py", line 1071, in _create
    self.do_handshake()
  File "/usr/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:997)

So it seems that if we use an SSL socket without sending the StartSSL message beforehand, the server doesn't like it?

jackwotherspoon commented 1 year ago

@tlocke

I misframed this feature slightly -- I'd like to add support for creating a postgres connection over an existing socket like object. In this approach startTLS can or cannot be used depending on the caller. This lines up similarly with what we have in Java (socketFactory connection param, driver code), Go (pgx DialFunc) (these two are slightly different as they allow specifying a creator/generator func that creates the socket, while here we are just passing in the socket directly but the extent of the feat is the same), and other Python libraries (pymysql)

The equivalent pymysql PR that introduced this change explains the feature really well https://github.com/PyMySQL/PyMySQL/pull/355

The benefit of this feature is that it allows the user to specify their own secure tunnel to connect over (such as ssh).

tlocke commented 1 year ago

Aha, I think I understand now! So, I've added some docs and tests and changed your patch slightly and I've put it on a test branch https://github.com/tlocke/pg8000/tree/test . Let me know if you think it's okay and I'll push it to main.

jackwotherspoon commented 1 year ago

@tlocke Tested the branch for our use-case and it works great!

Thanks so much 😄

tlocke commented 1 year ago

Great, that's been released now, in version 1.3.0.

jackwotherspoon commented 1 year ago

@tlocke Thanks so much! Just curious when will version 1.3.0 be available in pypi?

Thanks again - Jack

tlocke commented 1 year ago

It should be there now as version 1.30.1.

The problem was that the upload of 1.30.0 failed because the README.rst wasn't a valid RST file. I've added in a check for that in the automated tests so we should catch that in future. Anyway, I fixed it and retried with the same version number, and it seemed to work, but in fact it hadn't uploaded properly. So I did a new release with a new version number and that seems to have worked.