nicois opened 4 months ago
If this proposal succeeds, a follow-up proposal would provide a simpler way for users to do this with pgxpool, instead of having to write their own code to manage the collection of these types, store them, and then apply them each time a new connection is made.
An additional possible minor extension to this proposal is to also define LoadAllTypes, which does exactly what its name says. This would be useful when there are only a relatively small number of types (as it won't bitrot as the types used by the database evolve), or where the overhead of loading all types is outweighed by the benefits. Indeed, if only a single SQL call is needed regardless of how many types are loaded, the overhead is reduced significantly.
I definitely like optimizing LoadType to use only a single query.
I'm tentatively in favor of a LoadTypes that also loads dependencies.
I'm tentatively in favor of LoadAllTypes. A long time ago, back in pgx v2, all types were automatically loaded. That led to problems for people who had hundreds of thousands of types (https://github.com/jackc/pgx/issues/140). But if it is opt-in, that should resolve the issue.
LoadTypes and LoadAllTypes would need to be smart enough to return types in dependency order so they could be registered successfully.
Also, some sort of caching like you suggested in https://github.com/jackc/pgtype/issues/216 could be really beneficial when there are very large numbers of types, but I am nervous about it, as there are some nasty edge cases, especially in an HA setup with logical replication: the OIDs may differ from one server to the next. I'd rather make loading fast enough that caching is unnecessary.
Ok, thanks for the confirmation that I'm headed in the right direction.
I'll put together some PRs in the coming days to address these.
I am using pgx with a database which defines a very large number of custom types. Virtually every stored procedure returns a custom type, and any given application needs at least 50-100 types, often more. The current LoadType connection method is helpful, but is limited:
- LoadType generates two queries to the database when one would be sufficient.
- Since LoadType calls cannot be made concurrently, loading this many types adds significant overhead to an application's startup, before the database connection/pool can be used at all.

I would like to resolve these issues, either in separate PRs or together, if I get the green light. Are these seen as valid concerns, and would improvements in these areas be accepted?
To briefly describe what I am doing locally, which would form the basis of my PR(s):
- Combine the two QueryRow calls made in LoadType into a single query (currently the OID returned by the first query is used in the second query).
- Add LoadTypes, which takes a []string of type names. Modify the (now) single SQL query to use = ANY($1) to find all OIDs matching the provided type names, and return []*pgtype.Type.
- Add GetTypeDependencies, which, given a connection and a list of type names, recursively identifies the sub-types that must implicitly also be registered. This can either be called internally by LoadType and the new LoadTypes or, if that is considered too "magical", kept separate for the user to invoke, e.g. conn.LoadTypes(ctx, conn.GetTypeDependencies(ctx, "my_type1", "my_type_2", ...))
The SQL I am using to assist in the above: