jackc / pgx

PostgreSQL driver and toolkit for Go
MIT License
10.41k stars 825 forks source link

Provide improved support for loading types #2030

Open nicois opened 4 months ago

nicois commented 4 months ago

I am using pgx with a database which defines a very large number of custom types. Virtually every stored procedure returns a custom type, and any given application needs at least 50-100 types, often more. The current LoadType connection method is helpful, but is limited:

I would like to create a PR to resolve these issues, either in separate PRs or together, if I get the green light. Are these seen as valid concerns, and would improvements in these areas be accepted?

To briefly describe what I am doing locally, and which would form the basis of my PR(s):

The SQL I am using to assist in the above:

WITH RECURSIVE typeinfo AS (
    SELECT c.relname AS parent, regexp_replace(pg_catalog.format_type(a.atttypid, a.atttypmod), '\[\]$', '') as child, 'f'::bool AS has_array
    FROM pg_catalog.pg_class c
        LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
        JOIN pg_catalog.pg_attribute a ON (a.attrelid = c.oid)
        JOIN pg_type ON (pg_type.typname = regexp_replace(pg_catalog.format_type(a.atttypid, a.atttypmod), '\[\]$', ''))
    WHERE pg_catalog.pg_table_is_visible(c.oid)
      AND a.attnum > 0 AND NOT a.attisdropped
    UNION ALL
    SELECT typname AS parent, regexp_replace(pg_catalog.format_type(typbasetype, typtypmod) , '\[\]$', '') AS child, (pg_type.typarray > 0)::bool AS has_array
    FROM pg_type
),
relationships(parent, child, has_array, depth) AS (
    SELECT typeinfo.parent, typeinfo.child, typeinfo.has_array, 0
    FROM typeinfo
    WHERE parent = ANY($1)
UNION ALL
    SELECT typeinfo.parent, typeinfo.child, typeinfo.has_array, relationships.depth + 1
    FROM typeinfo
    JOIN relationships ON (typeinfo.parent = relationships.child)
)
SELECT child, has_array
    FROM relationships
    INNER JOIN pg_type ON (pg_type.typname = relationships.child)
    WHERE typtype != 'b'
    GROUP BY child, has_array
    ORDER BY max(depth) DESC, child;
nicois commented 4 months ago

If this proposal succeeds, a follow-up proposal would involve providing a simpler means for users to do this with pgxpool, instead of having to write their own code to manage the collection of these types, storing them, and then applying them each time a new connection was made.

nicois commented 4 months ago

An additional possibile minor extension to this proposal is to also define LoadAllTypes which does exactly that. This would be useful when there are only a relatively small number of types (as it won't bitrot as the types used by the database evolve), or where the overheads of loading in all types is outweighed by the benefits. Indeed, if only a single SQL call is needed regardless of how many types are loaded, the overheads are reduced significantly.

jackc commented 4 months ago

I definitely like optimizing LoadType to only be a single query.

I'm tentatively in favor of a LoadTypes that also loads dependencies.

I'm tentatively in favor of LoadAllTypes. A long time ago, back in pgx v2 all types were automatically loaded. That led to problems for people who had hundreds of thousands of types (https://github.com/jackc/pgx/issues/140). But if it is opt-in that should resolve the issue.

LoadTypes and LoadAllTypes would need to be smart enough to return types in dependency order so they could be registered successfully.


Also, some sort of caching like you suggested in https://github.com/jackc/pgtype/issues/216 could be really beneficial when there is very large numbers of types, but I am nervous about as there are some nasty edge cases, especially using a HA setup with logical replication -- the OIDs may be different from one server to the next. I'd rather make it so mast that caching is unnecessary.

nicois commented 4 months ago

Ok, thanks for the confirmation that I'm headed in the right direction.

I'll put together some PRs in the coming days to address these.