jackc / pgx

PostgreSQL driver and toolkit for Go
MIT License
10.88k stars 847 forks source link

Remove explicit simple protocol support #1146

Closed jackc closed 2 years ago

jackc commented 2 years ago

This is a breaking change that would only be done if there is a new major release.

TLDR - It complicates the code, increases maintenance burden, I don't see any use cases where it is absolutely necessary, and I don't personally want to maintain it.

The simple protocol has a number of disadvantages.

Query arguments must be interpolated into the SQL string client. While the pgx client side escaping code has no known bugs it would be nice to remove that security surface entirely.

The simple protocol exclusively uses text encoding. This can mean poorer performance, but it also means that pgx must also support the text format for types even if that would be otherwise unnecessary.

The pgx type system works best when it knows the OID (data type) of query parameters and of results. It can get that with the extended protocol. It can't with the simple protocol. This can lead to differing behaviors for the same query run with the simple and extended protocol. e.g. []int32{1,2,3} will be encoded differently if the type it is being encoded into is int[] or json. But with the simple protocol it doesn't know what it is being encoded into.

This introduces complexity throughout the pgx encoding system as it has to try to support encoding when it doesn't know what it is encoding to. To mitigate this there is a separate type mapping system (ConnInfo.RegisterDefaultPgType()) that runs when type information is unknown.

The advantage of the simple protocol is it is guaranteed to only have a single network round trip and it does it without using prepared statements. This can be important when using an external connection pool that doesn't work with prepared statements.

At the very least I would like to tighten up the type system such that it always has an OID for query parameters. It may still come from something like ConnInfo.RegisterDefaultPgType() but this will guarantee that the rest of the system doesn't have to deal with unknown OIDs.

However, the extended protocol can do single network round trip execution without prepared statements if the query parameter types are known (the protocol doesn't require the types to be known but pgx does to encode them properly). If OIDs are always available from the change described in the previous paragraph, then I don't see any advantage to preserving explicit simple protocol support.

The simple protocol would continue to be used internally by Exec when there are no arguments. This is convenient for executing arbitrary blobs of SQL. pgconn would also retain its lower level support of the simple protocol. This change would just remove the QuerySimpleProtocol and PreferSimpleProtocol options.

If anyone has reasons they need to use the simple protocol, please share them.

brunoluiz commented 2 years ago

Last week I started reading a bit about RDS Proxy, which we have running against some of our databases. The interesting bit is that it doesn't multiplex connections in the extended protocol. It ends up pinning connections, which is not exactly great, as quite quickly all your connections reach this state.

RDS Proxy doesn't multiplex connections when your client application drivers use the PostgreSQL extended query protocol. https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy.html

There is a guide on "How to Avoid Pinning" and it clearly states that the usage of extended protocol or prepared statements will cause pinning: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-proxy-managing.html#rds-proxy-pinning

We ended up discussing the usage of pgx (some of our code use lib/pq), as it currently supports Simple Query Protocol. In our early tests, it solves our connection pinning problem, albeit we are still trying to understand the trade-offs.

So, I am not sure how much of a strong use case it is to keep Simple Query Protocol because of RDS Proxy, but certainly it is one.

jackc commented 2 years ago

Interesting. I know it is possible to use the extended protocol with PGBouncer as long as you don't use prepared statements. I wonder what RDS proxy is doing differently.

brunoluiz commented 2 years ago

Yeah... This product is fairly new (Jun 2020) and Postgres support is still a bit behind (only up to v12 is supported), so perhaps they are still catching up in some other fields. I wonder if extended query protocol is in their 2022 roadmap, but I couldn't find anything about their roadmap.

jackc commented 2 years ago

I've decided to keep simple protocol support. I added a single round trip extended protocol mode. The logic for figuring out how to encode a Go value into a PostgreSQL value is basically the same as for the simple protocol so the extra maintenance burden is minimal. I expect that with this new mode there will be very few cases where the simple protocol is preferable, but it will be available when needed.

brunoluiz commented 2 years ago

I thought another issue with keeping the simple protocol support would be doing lots of client-side parameter sanitisation, which on extended protocol happens on the server. This would mean development type around keeping this working correctly, just for the simple protocol.

jackc commented 2 years ago

Yes, client side validation is an additional cost. However, the code to parse SQL queries and sanitize string literals is already written and tested and it should never need to change.

Don't get me wrong, I still think the legitimate use cases for the simple protocol are very rare. But with the way the code is structured now I don't anticipate any significant maintenance burden so I'm willing to leave it in for those who may need it.