jschaf / pggen

Generate type-safe Go for any Postgres query. If Postgres can run the query, pggen can generate code for it.
MIT License
282 stars 26 forks source link

proposal: generate "queries" by stored pg function #25

Closed aight8 closed 2 years ago

aight8 commented 3 years ago

Motivation

Generally allow any complex nested queries. Currently it's unclear what is possible and when pggen cannot handle it (as a result this issue is in contract to the promising project description)

In contrast the limitation caused on pg side ("optional arguments" because no reliable argument & result type reflection is possible) are clear.

example schema

create type list_item as (
   name text,
   color text
);

create type list_stats as (
   val1 hstore,
   val2 int[]
);

create or replace function fn1(
    param1 bigint,
    page int = 1,
    item_count int = 10,
    out res_items list_item[],
    out res_stats list_stats
) language plpgsql as $$begin
    -- single or many query which fills res_items and res_stats 
    -- ex: with x as (...subquery...) select array_agg(...), array_agg(...) into res_items, res_stats
end$$;

keep in mind the same return values could be also created by a single statement used directly as pggen query. this example just demonstrate the way trough stored functions since this was my initial situation I was confronted with this idea.

example query

-- name: Fn1 :one
select fn1(
    pggen.arg('x')
);

My intention is to create plpgsql stored functions which takes all needed arguments once and has at least two different datasets as return values. There could be an arbitrary number of return values. They can be arrays, single values and of course nested.

advantages

disadvantages

challenges

questions

workaround

unclear

jschaf commented 3 years ago

Thank you for the outstanding report!

I think what you're asking is:

Given a function with complex OUT types, pggen should generate a strongly typed struct for the output.

The problem is that fn1 has an implicit return type of record with an OID of 2249. I think what's happening is that Postgres uses the dynamic record type for any functions that have OUT parameters. A record is a dynamic type not known at runtime so there's nothing pggen can do here. If Postgres doesn't know the type then pggen won't know the type. See https://stackoverflow.com/questions/30072787.

Here's the full information from pg_type for the record OID 2249. This info is hard-coded in the Postgres catalog tables.

OID for record type

[
  {
    "oid": 2249,
    "typname": "record",
    "typnamespace": 11,
    "typowner": 10,
    "typlen": -1,
    "typbyval": false,
    "typtype": "p",
    "typcategory": "P",
    "typispreferred": false,
    "typisdefined": true,
    "typdelim": ",",
    "typrelid": 0,
    "typelem": 0,
    "typarray": 2287,
    "typinput": "record_in",
    "typoutput": "record_out",
    "typreceive": "record_recv",
    "typsend": "record_send",
    "typmodin": "-",
    "typmodout": "-",
    "typanalyze": "-",
    "typalign": "d",
    "typstorage": "x",
    "typnotnull": false,
    "typbasetype": 0,
    "typtypmod": -1,
    "typndims": 0,
    "typcollation": 0,
    "typdefaultbin": null,
    "typdefault": null,
    "typacl": null
  }
]

pg_proc details for func - Note the return type of 2249, the record type.

[
  {
    "oid": 16519,
    "proname": "composite_func_2",
    "pronamespace": 16385,
    "proowner": 10,
    "prolang": 13385,
    "procost": 100,
    "prorows": 0,
    "provariadic": 0,
    "prosupport": "-",
    "prokind": "f",
    "prosecdef": false,
    "proleakproof": false,
    "proisstrict": false,
    "proretset": false,
    "provolatile": "i",
    "proparallel": "s",
    "pronargs": 3,
    "pronargdefaults": 2,
    "prorettype": 2249,
    "proargtypes": "20 23 23",
    "proallargtypes": [20, 23, 23, 16514, 16518],
    "proargmodes": ["i", "i", "i", "o", "o"],
    "proargnames": ["param1", "page", "item_count", "res_items", "res_stats"],
    "proargdefaults": "({CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 238 :constvalue 4 [ 1 0 0 0 0 0 0 0 ]} {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 263 :constvalue 4 [ 10 0 0 0 0 0 0 0 ]})",
    "protrftypes": null,
    "prosrc": "\nBEGIN\n  res_items := ARRAY [('foo', 'bar')::list_item];\n  res_stats := ('a=>1,a=>2'::hstore, ARRAY [param1, page, item_count])::list_stats;\nEND\n",
    "probin": null,
    "proconfig": null,
    "proacl": null
  }
]

Attempt 1: Use returns table

To get type information, maybe we can add a RETURNS TABLE(items list_item[], stats list_stats). That doesn't work because you can't mix table functions with IN/OUT parameters: https://stackoverflow.com/questions/47790940.

I tried removing the out params and modifying the query to use RETURNS TABLE so we can try to get a better return type than record.

CREATE FUNCTION composite_func_3(
  param1        bigint,
  page          int = 1,
  item_count    int = 10
)
RETURNS TABLE(items list_item[], stats list_stats)
  LANGUAGE plpgsql AS $$
BEGIN
  SELECT ARRAY [('foo', 'bar')::list_item], ('a=>1,a=>2'::hstore, ARRAY [param1, page, item_count])::list_stats;
END
$$ IMMUTABLE PARALLEL SAFE;

-- Query file

-- name: CompositeFunc3 :one
SELECT composite_func_3(pggen.arg('arg1'));

This didn't work, the OID return type was 2249, the record type, so same problem as above.


Attempt 2: Use returns setof new_type

I got the proper type by declaring a type that matched the return value and using setof NEW_TYPE, like so:

CREATE TYPE CF3_type AS (
  items list_item[],
  stats list_stats
);

CREATE FUNCTION composite_func_3(
  param1        bigint,
  page          int = 1,
  item_count    int = 10
)
  RETURNS setof cf3_type
  LANGUAGE plpgsql AS $$
BEGIN
  SELECT ARRAY [('foo', 'bar')::list_item], ('a=>1,a=>2'::hstore, ARRAY [param1, page, item_count])::list_stats;
END
$$ IMMUTABLE PARALLEL SAFE;

However, that errors with:

infer typed named query CompositeFunc3: infer output types for query: fetch oid types: find composite types: find type for composite column items oid=16547

Action items

  1. Fix the attempt 2 query.

  2. I don't think pggen can do anything for queries with OUT params that have a return type of record. To get better type information, we'd have to parse the query, figure out the method that was called and then use the postgres catalog tables to figure out the types that correspond to OUT parameters. Overall, it seems a bit silly that Postgres doesn't know the types.

aight8 commented 3 years ago

An idea were if it's not possible to reflect the return types in certain cases the following:

define the function names as pggen argument. after the db is setup with the given schema, the defined functions are reflected and a go query is created for it. polymorphic functions (when a given function name is defined multiple times with different signature) must be defined explicitly with it's signature. SQL query statements which only calls a function were anyway just stub queries.

pggen gen go \
   --schema-glob "db/schema/**/*.sql" \
   --pg-func "fn1" \
   --pg-func "fn1_name_in_go=fn1" \ # or Fn1NameInGo ... explicit naming should be always possible
   --pg-func "poly_fn2_x=polyfn2(int, text)" \ # or PolyFn2X ... but is required for ambiguous function names
   --pg-func "poly_fn2_y=polyfn2(int, int)" # or PolyFn2Y

based on function info from pgcli and then \df+ command to list function details. it uses sql for the listed function information so I think this is a good starting point:

Schema could be defined by passing function with the schema "public.fn1", default is "public" Name is the function name itself without any args. It creates the fundament of the naming of the call function in go. Result data type contains the signature like:

Argument data types contains the signature like:

Type should be "normal" // other were among others "trigger", "agg" Language should be "sql" or "plpgsql" // other were "c" (contains arg types like 'internal' etc. - so early error with explanation) Source code contains the source code of the function (could be used as comment, or not) Description could be used in the go function comment

What is interesting here is:

The generated go source code, calls the postgres function, and passes optional args only if they are defined. Otherwise the arg should be omitted so that pg can use the default value (it only use it if the arg not present in the call at all). Just for completeness: arguments with a default functions must be always after all other mandatory args in pg.

Also nice: If the function changes in the future my go code fails on build time, which is similar to static type checking - or just calling a function which not exists. When the full function signature is defined in "pg-func", of course not - then it fails even earlier by pggen while source code generation.


I try to investigate more about the "record" type issue, or when this happens. My described example however shows, that a creation of go code could be possible by directly intepreting defined functions instead of based on prepared statements. It's also not using extended parsing for this scenario. (side note: the most advanced pg parser is https://github.com/pganalyze/pg_query_go - it received now an update to pg13, it requires pg headers)

Note: What also could be a solution to create a result type composit type for a complex result.

jschaf commented 3 years ago

Question 1: For this query, what's the signature for generated Go function?

-- name: Fn1 :one
select fn1(
    pggen.arg('x')
);

I agree that this proposal is possible, at least for simple cases, but I have 2 concerns:

  1. It's a step towards parsing the query which I've avoided so far. Postgres maintains information about the function out parameters but pggen would still have to figure out where in the query the function was called.

  2. I think you can rewrite the function to have better type information so that pggen works out of the box. Something like below:

CREATE TYPE my_custom_type ( res_items list_item[], res_stats lists_stats );

CREATE FUNCTION fn1(
    param1 bigint,
    page int = 1,
    item_count int = 10
) RETURNS my_custom_type AS '';

I haven't tried, so if it's not possible, let me know.

aight8 commented 3 years ago

My idea is to skip the whole query definition part for a defined function. The defined function itself is the query, and a very accurate one. (regarding optional arguments, multiple out arguments)

It not solves the reflection problem. Your last example is exactly how I use it today.

The downsides are:

But that's totally fine, because pggen try it's best to figure out the types. Querying functions or a queries which returns record (this includes functions with multi OUT) is not possible, and if the an error message explain this to the user I think this is totally arguable.


I try to create a full example of my proposal:

schema

CREATE FUNCTION fn1(
    param1 bigint,
    page int = 1,
    item_count int = 10,
    OUT res_items list_items[],
    OUT res_stats lists_stats
) AS '';

query

// nothing

generation

pggen --pg-func=fn1

go source

...
type Fn1OptParams struct {
   Page *int
   ItemCount *int
}

func (Querier) Fn1(param1 int, Fn1OptParams) (resItems ListItem[], resStats ListsStats, err error) {
   // ...
}
...

how the function parameters in return value is built up is not defined. Ideas were:

effectively called sql

select fn1(param) -- when you call in go -> querier.Fn1(999)
select fn1(param, page => 1) -- when you call in go -> querier.Fn1(999, Fn1OptParams{Page: &1})
select fn1(param, page => 1, item_count => 10) -- when you call in go -> querier.Fn1(999, Fn1OptParams{Page: &1, ItemCount: &10})
select fn1(param, item_count => 10) -- when you call in go -> querier.Fn1(999, Fn1OptParams{ItemCount: &10})

Because internally the query is built up by a defined function, pggen can find out the exact result signature by check for the OUT arguments, and then just join them together (because in the result section it only appears as record or setof record). If no OUT arguments are present check for the result type. This logic don't use prepared statement to analyze anything. The rest of the generation logic is similar as for all other pggen queries.

aight8 commented 3 years ago
create or replace function format_types(oid[])
    returns text[] as $$
select array(select format_type(unnest($1), null))
$$ language sql immutable;

select
     l.lanname lang,
     n.nspname "schema",
     p.oid,
     p.prokind kind,         -- ! char // f = normal fn, p = procedure, a = aggregate fn, or w = window fn
     p.proname "name",       -- ! name // name of the function (ex: get_user_info)
   format_types(p.proargtypes[:array_length(p.proargtypes,1)-p.pronargdefaults-1]) in_required_args,  -- ! oidvector (references pg_type.oid) // An array of the data types of the function arguments. This includes only input arguments (including INOUT and VARIADIC arguments), and thus represents the call signature of the function.
     format_types(p.proargtypes[:p.pronargdefaults-1]) as in_optional_args,
     p.pronargdefaults default_arg_count, -- ! int2 // Number of arguments that have defaults
     p.proisstrict no_null_args,  -- ! bool // Function returns null if any call argument is null. In that case the function won't actually be called at all. Functions that are not “strict” must be prepared to handle null inputs.
     format_type(p.provariadic, null) variadic_arg_type, -- ! oid (references pg_type.oid) // Data type of the variadic array parameter's elements, or zero if the function does not have a variadic parameter
     format_type(p.prorettype, null) "return_type", -- ! oid (references pg_type.oid) // Data type of the return value
     p.proretset returns_many, -- ! bool // Function returns a set (i.e., multiple values of the specified data type)
     format_types(p.proallargtypes[array_length(p.proargtypes,1)+1:]) out_arg_types,
   p.proargmodes arg_types,    -- ! array of: 'i' for IN, 'o' for OUT, 'b' for INOUT, 'v' for VARIADIC, 't' for TABLE -- if ALL arguments are 'i' this field will be NULL
     p.proargnames arg_names     -- ! text[] // An array of the names of the function arguments. Arguments without a name are set to empty strings in the array. If none of the arguments have a name, this field will be null. Note that subscripts correspond to positions of proallargtypes not proargtypes.
from pg_catalog.pg_proc p
   left join pg_catalog.pg_namespace n on n.oid = p.pronamespace
     left join pg_catalog.pg_language l ON l.oid = p.prolang
where p.proname like 'my_%';
-- where p.oid='myfn()'::regprocedure::oid;

Here's another sql snippet to create many kind of functions:

drop type if exists user_data cascade;
create type user_data as (
    firstname text,
    age int
);
drop type if exists user_stat cascade;
create type user_stat as (
    stat1 int,
    stat2 int
);

-- returns: MANY(user_data[], user_stat)
create or replace function my_fn_1(
    page int = 1,
    item_count int = 10,
    out user_info_list user_data[],
    out user_stat user_stat
) returns setof record language plpgsql as $$begin

end$$;

-- 2x OUT, polymorph function #1
-- returns: ONE(user_data[], user_stat)
create or replace function my_fn_2(
    page int = 1,
    item_count int = 10,
    out user_info_list user_data[],
    out user_stat user_stat
) returns record language plpgsql as $$begin

end$$;

-- 2x OUT, STRICT, polymorph function #2
-- returns: ONE(user_data[], user_stat)
create or replace function my_fn_2(
    page text,
    out user_info_list user_data[],
    out user_stat user_stat
) returns setof record strict language plpgsql as $$begin

end$$;

-- VARIADIC arg
create or replace function my_fn_4(
    user_id int,
    digits variadic int[]
) returns void language plpgsql as $$begin

end$$;
aight8 commented 3 years ago

I came up with another idea I want to mention (should be self-described) The query entry has no body content.

advantages:

disadvantages:

query

-- name: WhateverFn1 :pg-func fn_name_in_pg()
-- name: WhateverFn2 :pg-func fn_name_in_pg(text)
-- name: WhateverFn3 :pg-func another_pg_fn()

the function is created in the schema file. the function signature in the query file must in the format how postgres can uniquely identify it. (I think pg_get_function_identity_arguments)

jschaf commented 3 years ago

My preferred approach so far is to use a normal query file

-- name: WhateverFn
SELECT fn_name_in_pg();

If the function return type is a record type and the function has output parameters we can infer what the results should be. I think we can get this from the explain output without too much trouble. For parsing, I think we limit it to top-level columns that match a regex like [a-z0-9]+\(.

It's not clear how it should interact with other parameters:

SELECT fn_name_in_pg(), 2 as two

I think I'd want all the parameters at the top level instead of nesting the function output parameters.

aight8 commented 3 years ago

I'm afrait a little when the parsing is done manually, it can end in a patchwork quilt of supported statements what pggen can intrepet correctly - and what ends in hard to understand parsing errors. This in turn leads to great code complexity and many following issues. The scope is unclear.

It would be hoped that the regex does not generate false positives. In this case it would be important to define the scope - e.g. a sufficiently large list of statements that define the scope as a whole. Then possible statements that just do not work, but still recognizes - the error action is already little meaningful, the parsing aggravates the situation again.

Open questions:

But wouldn't the inline function anaysing build on my proposal anyway? So this proposal could be the first step (query Function informations), and the inlining mechanism in a second step (parsing statements for function calls and apply it).

However this proposal still have a following big advantages:

  1. If I have e.g. 50 pg functions that I want to use in go, I have to create and maintain 50 pggen statments which only task is to call the function, including the function definition itself of course
  2. I can use optional input arguments with default values defined in the create function statement
jschaf commented 2 years ago

Stumbled upon a much simpler solution. The out columns of fn will appear like normal postgres columns.

SELECT * FROM fn1;