tlocke / pg8000

A Pure-Python PostgreSQL Driver
BSD 3-Clause "New" or "Revised" License
515 stars 46 forks source link

Bulk Inserts failing due to OID list length limit #84

Closed vp-regular closed 3 years ago

vp-regular commented 3 years ago

Hi there,

Attempting to run a bulk insert (5000~ or so rows if that matters) in one transaction using pg8000 can lead to a exception due to the following snippet.

https://github.com/tlocke/pg8000/blob/7da25ff4ba68e20992501c13dea638031f32c7ca/pg8000/core.py#L597-L603

In my case, len(oids) was roughly 70000~ per bulk insert easily bypassing the limit of a signed short. Running multiple smaller transactions fixes the issue.

Is this by design? Is there any reason this length can't be an uint or long, or is running smaller transactions generally best practice anyway?

Thanks!

tlocke commented 3 years ago

Hi @vp-cshow, looking at https://www.postgresql.org/docs/current/protocol-message-formats.html and looking at the message Parse:

Parse (F)

Byte1('P') Identifies the message as a Parse command.

Int32 Length of message contents in bytes, including self.

String The name of the destination prepared statement (an empty string selects the unnamed prepared statement).

String The query string to be parsed.

Int16 The number of parameter data types specified (can be zero). Note that this is not an indication of the number of parameters that might appear in the query string, only the number that the frontend wants to prespecify types for.

Then, for each parameter, there is the following:

Int32

Specifies the object ID of the parameter data type. Placing a zero here is equivalent to leaving the type unspecified.

In the code, the line val.extend(h_pack(len(oids))) corresponds to the int16 above so that's the reason it fails.

So, what to do about it. I think we're stuck with PostgreSQL protocol, so there's not much we can do at that level. Are you able to post an example of your code? Then it would be easier to make suggestions.

vp-regular commented 3 years ago

Ah, if it's part of the PSQL protocol I suppose nothing can be done. I'll do some more digging on my own and try to create a more minimal example where this issue can happen.

I will close the issue in the meantime, and reopen if I find anything fruitful. Thank you for your response!

tlocke commented 3 years ago

Actually I've been turning this over in my mind. The Postgres protocol is very flexible, and it turns out that you can omit the oids in the Parse step. If you do that, Postgres will determine the types of the parameters from the SQL query. It also means that we won't have that limit on the length of the oid list. Of course, you can still supply the oid list if you want to. I liked that idea so I've gone ahead and released a new version. Hope people like it :-)