Closed · sebastian-correa closed this 7 months ago
Hi @sebastian-correa, glad you're enjoying pg8000! With the PostgreSQL protocol you can either set the types of parameters explicitly or let the server infer them. To set them explicitly you use the setinputsizes method. If you set them explicitly, the PostgreSQL protocol has a limit on the maximum number of parameters, which is what you're running into. SQLAlchemy always uses setinputsizes, so you're always going to hit this limit with SQLAlchemy.
So one option would be to use pg8000 directly and not call setinputsizes. Another option is to rewrite the SQLAlchemy dialect for pg8000 so that it doesn't call setinputsizes. I think SQLAlchemy currently does it this way for historical reasons; altering the dialect should be doable without much of a backward-compatibility problem, and it might even fix some existing bugs in the dialect!
Hi @tlocke. So if I understand this correctly, there's nothing that can be done from the pg8000 side? Do you know how psycopg manages this?
To your dialect-rewrite suggestion: is this something I could easily do myself and patch into SQLAlchemy? If so, it might be worth a shot and, maybe, the implementation could become part of pg8000 so that users who need it can patch it in manually (?). I don't know the internals of either library, but I'm happy to look into it if you give me some pointers :).
Actually, ignore what I said about avoiding setinputsizes; even if you did avoid it there would still be a problem, apologies! So I think the only way around it would be to send the query as a simple string, rather than sending a string with placeholders and parameters separately. I think psycopg just sends a simple string. Here is an example that reproduces the problem:
import pg8000.dbapi

conn = pg8000.dbapi.connect(user="postgres", password="cpsnow")
cursor = conn.cursor()
SIZE = 100000
cursor.execute(
    f"SELECT 1 WHERE 1 IN ({','.join(['%s'] * SIZE)})",
    [1] * SIZE,
)
conn.close()
It'll give the error:
Traceback (most recent call last):
File "/home/tlocke/pg8000/example.py", line 6, in <module>
cursor.execute(
File "/home/tlocke/pg8000/pg8000/dbapi.py", line 471, in execute
self._context = self._c.execute_unnamed(
File "/home/tlocke/pg8000/pg8000/core.py", line 701, in execute_unnamed
self.send_BIND(NULL_BYTE, params)
File "/home/tlocke/pg8000/pg8000/core.py", line 765, in send_BIND
NULL_BYTE + statement_name_bin + h_pack(0) + h_pack(len(params))
struct.error: 'h' format requires -32768 <= number <= 32767
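For context on that final frame: `h` is Python's struct format code for a signed 16-bit integer, which is why the parameter count in the Bind message tops out at 32767. A minimal stdlib sketch of that boundary (the `fits_in_bind_count` helper is illustrative, not part of pg8000):

```python
import struct

def fits_in_bind_count(n: int) -> bool:
    """Check whether a parameter count can be packed into a 16-bit
    signed field (struct format 'h'), as in pg8000's Bind message."""
    try:
        struct.pack("!h", n)
        return True
    except struct.error:
        return False

print(fits_in_bind_count(32767))   # True: the largest representable count
print(fits_in_bind_count(100000))  # False: the SIZE used in the failing example
```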
But if you create a simple string it will work:
import pg8000.dbapi
from pg8000.native import literal
conn = pg8000.dbapi.connect(user="postgres", password="cpsnow")
cursor = conn.cursor()
SIZE = 100000
cursor.execute(f"SELECT 1 WHERE 1 IN ({','.join([literal(1)] * SIZE)})")
conn.close()
While it's true this would work and could be done from a "centralized" place (we have a query method where we could compile the SELECT query with the params into a string), it'd entail extra effort to correctly stringify "weird types" like UUIDs and such.
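To illustrate the "extra effort" involved, a hand-rolled literal-quoting helper has to special-case each type. This is only an illustrative sketch (the `quote_literal` name is made up here; real drivers must cover far more types and server-specific escaping rules):

```python
import datetime
import uuid

def quote_literal(value) -> str:
    """Illustrative only: render a Python value as a SQL literal string."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # must come before int: bool is a subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, uuid.UUID):
        return f"'{value}'"  # UUIDs stringify to a plain hex-and-dashes form
    if isinstance(value, (datetime.date, datetime.datetime)):
        return f"'{value.isoformat()}'"
    if isinstance(value, str):
        # Escape embedded single quotes by doubling them.
        escaped = value.replace("'", "''")
        return f"'{escaped}'"
    raise TypeError(f"don't know how to stringify {type(value)!r}")

print(quote_literal(uuid.UUID("12345678-1234-5678-1234-567812345678")))
# → '12345678-1234-5678-1234-567812345678'
```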
SQLAlchemy also seems to recommend against this [ref]:
The reason SQLAlchemy does not support full stringification of all datatypes is threefold:
1. This is a functionality that is already supported by the DBAPI in use when the DBAPI is used normally. (...)
2. Stringifying with bound parameters inlined for specific databases suggests a usage that is actually passing these fully stringified statements onto the database for execution. This is unnecessary and insecure, and SQLAlchemy does not want to encourage this use in any way.
3. The area of rendering literal values is the most likely area for security issues to be reported. SQLAlchemy tries to keep the area of safe parameter stringification an issue for the DBAPI drivers as much as possible where the specifics for each DBAPI can be handled appropriately and securely.
I'm not sure whether using Select.compile(engine) would solve these security concerns or not, though.
As an extra comment, we asked about this in SQLAlchemy a while back as well, but they didn't have any particularly good solution either [ref].
What are your thoughts? Should we stringify everything anyway, or batch the query somehow, or something else?
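For the "batch the query somehow" option, one workaround that lives entirely in application code is to split the value list into chunks that stay under the limit and run one statement per chunk, combining the results afterwards. A rough stdlib sketch, assuming a cap of 32767 parameters per statement (the signed 16-bit count from the traceback); the helper names are made up:

```python
from typing import Iterable, Sequence, Tuple

# Assumed cap: the Bind message's signed 16-bit parameter count.
MAX_PARAMS = 32767

def chunked(values: Sequence, size: int) -> Iterable[Sequence]:
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

def in_clause_batches(
    values: Sequence, reserved: int = 0
) -> Iterable[Tuple[str, list]]:
    """Yield (placeholders, params) pairs, each safe to bind in a single
    statement. `reserved` counts parameters used elsewhere in the query."""
    limit = MAX_PARAMS - reserved
    for chunk in chunked(values, limit):
        placeholders = ",".join(["%s"] * len(chunk))
        yield placeholders, list(chunk)

# Each batch is then executed separately and the rows combined:
batches = list(in_clause_batches(list(range(50_000))))
print(len(batches))        # 2 batches for 50k values under a 32767 cap
print(len(batches[0][1]))  # 32767
```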
I've just found a bug in pg8000, and fixing it means that the number of parameters you can send is now doubled. It's in version 1.30.5; maybe that solves the problem for now?
@tlocke, thanks for the work! In practice, this helps a lot as we won't hit the limit as much (almost never during production usage with the new limit). However, this is a critical query and we can't risk having our application stop/fail due to a parameter limit issue, so we still have to do something about this on our end.
I think we'll end up migrating our solution away from cloud-sql-python-connector to running the proxy manually and calling it a day. This should help in our use case, though it doesn't technically handle the limit with pg8000 :(.
I'd rather not do this, as the Python connector + pg8000 solution is much easier to use, but given that we can't solve this in a sure-fire way with that combo, we have to look into the proxy :(.
One way of doing it is to use a single parameter that's an array, and then use the unnest() function:
import pg8000.dbapi

conn = pg8000.dbapi.connect(user="postgres", password="cpsnow")
cursor = conn.cursor()
cursor.execute(
    "SELECT 1 WHERE 1 IN (SELECT unnest(CAST(%s AS int[])))",
    [[1] * 100000],
)
conn.close()
I've added an example to the documentation: https://github.com/tlocke/pg8000/tree/main?tab=readme-ov-file#parameter-limit
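The WHERE fragment for this pattern can be built generically in application code. A small sketch (the `unnest_in_clause` helper is hypothetical, not part of pg8000):

```python
def unnest_in_clause(column: str, array_type: str = "int[]") -> str:
    """Build a WHERE fragment matching `column` against a single array
    parameter via unnest(), sidestepping the per-parameter limit."""
    return f"{column} IN (SELECT unnest(CAST(%s AS {array_type})))"

# The whole value list is then passed as ONE parameter (a list),
# regardless of how many items it contains:
sql = f"SELECT * FROM orders WHERE {unnest_in_clause('customer_id')}"
print(sql)
```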
That's very clever. I'll give it a shot!
Worked! Thanks!
Hi! First off, thanks for PG8000. We've been using it in production to connect to CloudSQL instances and it's simplified our setup significantly.
We have many queries that do SELECTs from tables, with a WHERE column_a IN (...) filter where the list of valid values can be very long (the max is very rare but is ~50k items). I know it's not the best, so we'd be open to redesigning the queries altogether, but we don't see any viable alternatives at the moment.
I know the error is actually because Postgres has a limit on the number of params in each query, but this is handled transparently by psycopg + SQLAlchemy (which we use for local development) and not by pg8000 + SQLAlchemy. What's the best way to handle this query with pg8000?
These are some alternatives that we'd rather avoid because they complicate our code quite a lot:
- COPY FROM.
Sorry if this is really an issue with how SQLAlchemy behaves when using pg8000! Please let me know if that's the case so we can open a similar question with them.
Versions: