Closed aight8 closed 2 years ago
Thank you for the outstanding report!
I think what you're asking is:
Given a function with complex
OUT
types, pggen should generate a strongly typed struct for the output.
The problem is that fn1
has an implicit return type of record
with an OID of 2249. I think what's happening is that Postgres uses the dynamic record
type for any functions that have OUT parameters. A record is a dynamic type not known at runtime so there's nothing pggen can do here. If Postgres doesn't know the type then pggen won't know the type. See https://stackoverflow.com/questions/30072787.
Here's the full information from pg_type
for the record
OID 2249. This info is hard-coded in the Postgres catalog tables.
OID for record type
[
{
"oid": 2249,
"typname": "record",
"typnamespace": 11,
"typowner": 10,
"typlen": -1,
"typbyval": false,
"typtype": "p",
"typcategory": "P",
"typispreferred": false,
"typisdefined": true,
"typdelim": ",",
"typrelid": 0,
"typelem": 0,
"typarray": 2287,
"typinput": "record_in",
"typoutput": "record_out",
"typreceive": "record_recv",
"typsend": "record_send",
"typmodin": "-",
"typmodout": "-",
"typanalyze": "-",
"typalign": "d",
"typstorage": "x",
"typnotnull": false,
"typbasetype": 0,
"typtypmod": -1,
"typndims": 0,
"typcollation": 0,
"typdefaultbin": null,
"typdefault": null,
"typacl": null
}
]
pg_proc details for func - Note the return type of 2249, the record type.
[
{
"oid": 16519,
"proname": "composite_func_2",
"pronamespace": 16385,
"proowner": 10,
"prolang": 13385,
"procost": 100,
"prorows": 0,
"provariadic": 0,
"prosupport": "-",
"prokind": "f",
"prosecdef": false,
"proleakproof": false,
"proisstrict": false,
"proretset": false,
"provolatile": "i",
"proparallel": "s",
"pronargs": 3,
"pronargdefaults": 2,
"prorettype": 2249,
"proargtypes": "20 23 23",
"proallargtypes": [20, 23, 23, 16514, 16518],
"proargmodes": ["i", "i", "i", "o", "o"],
"proargnames": ["param1", "page", "item_count", "res_items", "res_stats"],
"proargdefaults": "({CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 238 :constvalue 4 [ 1 0 0 0 0 0 0 0 ]} {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 263 :constvalue 4 [ 10 0 0 0 0 0 0 0 ]})",
"protrftypes": null,
"prosrc": "\nBEGIN\n res_items := ARRAY [('foo', 'bar')::list_item];\n res_stats := ('a=>1,a=>2'::hstore, ARRAY [param1, page, item_count])::list_stats;\nEND\n",
"probin": null,
"proconfig": null,
"proacl": null
}
]
returns table
To get type information, maybe we can add a RETURNS TABLE(items list_item[], stats list_stats)
. That doesn't work because you can't mix table functions with IN/OUT parameters: https://stackoverflow.com/questions/47790940.
I tried removing the out params and modifying the query to use RETURNS TABLE
so we can try to get a better return type than record
.
CREATE FUNCTION composite_func_3(
param1 bigint,
page int = 1,
item_count int = 10
)
RETURNS TABLE(items list_item[], stats list_stats)
LANGUAGE plpgsql AS $$
BEGIN
SELECT ARRAY [('foo', 'bar')::list_item], ('a=>1,a=>2'::hstore, ARRAY [param1, page, item_count])::list_stats;
END
$$ IMMUTABLE PARALLEL SAFE;
-- Query file
-- name: CompositeFunc3 :one
SELECT composite_func_3(pggen.arg('arg1'));
This didn't work, the OID return type was 2249, the record
type, so same problem as above.
returns setof new_type
I got the proper type by declaring a type that matched the return value and using setof NEW_TYPE
, like so:
CREATE TYPE CF3_type AS (
items list_item[],
stats list_stats
);
CREATE FUNCTION composite_func_3(
param1 bigint,
page int = 1,
item_count int = 10
)
RETURNS setof cf3_type
LANGUAGE plpgsql AS $$
BEGIN
SELECT ARRAY [('foo', 'bar')::list_item], ('a=>1,a=>2'::hstore, ARRAY [param1, page, item_count])::list_stats;
END
$$ IMMUTABLE PARALLEL SAFE;
However, that errors with:
infer typed named query CompositeFunc3: infer output types for query: fetch oid types: find composite types: find type for composite column items oid=16547
Fix the attempt 2 query.
I don't think pggen can do anything for queries with OUT params that have a return type of record
. To get better type information, we'd have to parse the query, figure out the method that was called and then use the postgres catalog tables to figure out the types that correspond to OUT parameters. Overall, it seems a bit silly that Postgres doesn't know the types.
An idea were if it's not possible to reflect the return types in certain cases the following:
define the function names as pggen argument. after the db is setup with the given schema, the defined functions are reflected and a go query is created for it. polymorphic functions (when a given function name is defined multiple times with different signature) must be defined explicitly with it's signature. SQL query statements which only calls a function were anyway just stub queries.
pggen gen go \
--schema-glob "db/schema/**/*.sql" \
--pg-func "fn1" \
--pg-func "fn1_name_in_go=fn1" \ # or Fn1NameInGo ... explicit naming should be always possible
--pg-func "poly_fn2_x=polyfn2(int, text)" \ # or PolyFn2X ... but is required for ambiguous function names
--pg-func "poly_fn2_y=polyfn2(int, int)" # or PolyFn2Y
based on function info from pgcli
and then \df+
command to list function details. it uses sql for the listed function information so I think this is a good starting point:
Schema could be defined by passing function with the schema "public.fn1", default is "public" Name is the function name itself without any args. It creates the fundament of the naming of the call function in go. Result data type contains the signature like:
text
TABLE(type text, oid oid, schema text, name text, params text)
integer[]
void
SETOF text
record
or SETOF record
Argument data types
still contains the OUT args
record
is the result because multiple OUT args
results in a unnamed result type. there is literally an unknown type which is an union of multiple types - so this results in record
. although the composition of the result are very well known.returns
and out/inout
in pgArgument data types contains the signature like:
hstore, text[]
arg1 text
arg1 integer, arg2 text, arg3 integer, arg4 integer DEFAULT NULL::integer, OUT arg5 integer, OUT arg6 integer
Type should be "normal" // other were among others "trigger", "agg" Language should be "sql" or "plpgsql" // other were "c" (contains arg types like 'internal' etc. - so early error with explanation) Source code contains the source code of the function (could be used as comment, or not) Description could be used in the go function comment
What is interesting here is:
Argument data types
contains the full signature
pggen.arg('NAME')
- if not provided the user don't care about the name so the go function can receive nameless args no matter how much it were (arg1, arg2, arg3... or a, b, c, ...)The generated go source code, calls the postgres function, and passes optional args only if they are defined. Otherwise the arg should be omitted so that pg can use the default value (it only use it if the arg not present in the call at all). Just for completeness: arguments with a default functions must be always after all other mandatory args in pg.
Also nice: If the function changes in the future my go code fails on build time, which is similar to static type checking - or just calling a function which not exists. When the full function signature is defined in "pg-func", of course not - then it fails even earlier by pggen while source code generation.
I try to investigate more about the "record" type issue, or when this happens. My described example however shows, that a creation of go code could be possible by directly intepreting defined functions instead of based on prepared statements. It's also not using extended parsing for this scenario. (side note: the most advanced pg parser is https://github.com/pganalyze/pg_query_go - it received now an update to pg13, it requires pg headers)
Note: What also could be a solution to create a result type composit type for a complex result.
Question 1: For this query, what's the signature for generated Go function?
-- name: Fn1 :one
select fn1(
pggen.arg('x')
);
I agree that this proposal is possible, at least for simple cases, but I have 2 concerns:
It's a step towards parsing the query which I've avoided so far. Postgres maintains information about the function out parameters but pggen would still have to figure out where in the query the function was called.
I think you can rewrite the function to have better type information so that pggen works out of the box. Something like below:
CREATE TYPE my_custom_type ( res_items list_item[], res_stats lists_stats );
CREATE FUNCTION fn1(
param1 bigint,
page int = 1,
item_count int = 10
) RETURNS my_custom_type AS '';
I haven't tried, so if it's not possible, let me know.
My idea is to skip the whole query definition part for a defined function. The defined function itself is the query, and a very accurate one. (regarding optional arguments, multiple out arguments)
It not solves the reflection problem. Your last example is exactly how I use it today.
The downsides are:
OUT
arguments are not supported (they always ends in a record
return type by prepared statement analysis)But that's totally fine, because pggen try it's best to figure out the types. Querying functions or a queries which returns record
(this includes functions with multi OUT) is not possible, and if the an error message explain this to the user I think this is totally arguable.
I try to create a full example of my proposal:
schema
CREATE FUNCTION fn1(
param1 bigint,
page int = 1,
item_count int = 10,
OUT res_items list_items[],
OUT res_stats lists_stats
) AS '';
query
// nothing
generation
pggen --pg-func=fn1
go source
...
type Fn1OptParams struct {
Page *int
ItemCount *int
}
func (Querier) Fn1(param1 int, Fn1OptParams) (resItems ListItem[], resStats ListsStats, err error) {
// ...
}
...
how the function parameters in return value is built up is not defined. Ideas were:
effectively called sql
select fn1(param) -- when you call in go -> querier.Fn1(999)
select fn1(param, page => 1) -- when you call in go -> querier.Fn1(999, Fn1OptParams{Page: &1})
select fn1(param, page => 1, item_count => 10) -- when you call in go -> querier.Fn1(999, Fn1OptParams{Page: &1, ItemCount: &10})
select fn1(param, item_count => 10) -- when you call in go -> querier.Fn1(999, Fn1OptParams{ItemCount: &10})
Because internally the query is built up by a defined function, pggen can find out the exact result signature by check for the OUT
arguments, and then just join them together (because in the result section it only appears as record
or setof record
).
If no OUT
arguments are present check for the result type
.
This logic don't use prepared statement to analyze anything.
The rest of the generation logic is similar as for all other pggen queries.
create or replace function format_types(oid[])
returns text[] as $$
select array(select format_type(unnest($1), null))
$$ language sql immutable;
select
l.lanname lang,
n.nspname "schema",
p.oid,
p.prokind kind, -- ! char // f = normal fn, p = procedure, a = aggregate fn, or w = window fn
p.proname "name", -- ! name // name of the function (ex: get_user_info)
format_types(p.proargtypes[:array_length(p.proargtypes,1)-p.pronargdefaults-1]) in_required_args, -- ! oidvector (references pg_type.oid) // An array of the data types of the function arguments. This includes only input arguments (including INOUT and VARIADIC arguments), and thus represents the call signature of the function.
format_types(p.proargtypes[:p.pronargdefaults-1]) as in_optional_args,
p.pronargdefaults default_arg_count, -- ! int2 // Number of arguments that have defaults
p.proisstrict no_null_args, -- ! bool // Function returns null if any call argument is null. In that case the function won't actually be called at all. Functions that are not “strict” must be prepared to handle null inputs.
format_type(p.provariadic, null) variadic_arg_type, -- ! oid (references pg_type.oid) // Data type of the variadic array parameter's elements, or zero if the function does not have a variadic parameter
format_type(p.prorettype, null) "return_type", -- ! oid (references pg_type.oid) // Data type of the return value
p.proretset returns_many, -- ! bool // Function returns a set (i.e., multiple values of the specified data type)
format_types(p.proallargtypes[array_length(p.proargtypes,1)+1:]) out_arg_types,
p.proargmodes arg_types, -- ! array of: 'i' for IN, 'o' for OUT, 'b' for INOUT, 'v' for VARIADIC, 't' for TABLE -- if ALL arguments are 'i' this field will be NULL
p.proargnames arg_names -- ! text[] // An array of the names of the function arguments. Arguments without a name are set to empty strings in the array. If none of the arguments have a name, this field will be null. Note that subscripts correspond to positions of proallargtypes not proargtypes.
from pg_catalog.pg_proc p
left join pg_catalog.pg_namespace n on n.oid = p.pronamespace
left join pg_catalog.pg_language l ON l.oid = p.prolang
where p.proname like 'my_%';
-- where p.oid='myfn()'::regprocedure::oid;
Here's another sql snippet to create many kind of functions:
drop type if exists user_data cascade;
create type user_data as (
firstname text,
age int
);
drop type if exists user_stat cascade;
create type user_stat as (
stat1 int,
stat2 int
);
-- returns: MANY(user_data[], user_stat)
create or replace function my_fn_1(
page int = 1,
item_count int = 10,
out user_info_list user_data[],
out user_stat user_stat
) returns setof record language plpgsql as $$begin
end$$;
-- 2x OUT, polymorph function #1
-- returns: ONE(user_data[], user_stat)
create or replace function my_fn_2(
page int = 1,
item_count int = 10,
out user_info_list user_data[],
out user_stat user_stat
) returns record language plpgsql as $$begin
end$$;
-- 2x OUT, STRICT, polymorph function #2
-- returns: ONE(user_data[], user_stat)
create or replace function my_fn_2(
page text,
out user_info_list user_data[],
out user_stat user_stat
) returns setof record strict language plpgsql as $$begin
end$$;
-- VARIADIC arg
create or replace function my_fn_4(
user_id int,
digits variadic int[]
) returns void language plpgsql as $$begin
end$$;
I came up with another idea I want to mention (should be self-described) The query entry has no body content.
advantages:
disadvantages:
-- name: WhateverFn1 :pg-func fn_name_in_pg()
-- name: WhateverFn2 :pg-func fn_name_in_pg(text)
-- name: WhateverFn3 :pg-func another_pg_fn()
the function is created in the schema file. the function signature in the query file must in the format how postgres can uniquely identify it. (I think pg_get_function_identity_arguments)
My preferred approach so far is to use a normal query file
-- name: WhateverFn
SELECT fn_name_in_pg();
If the function return type is a record type and the function has output parameters we can infer what the results should be. I think we can get this from the explain output without too much trouble. For parsing, I think we limit it to top-level columns that match a regex like [a-z0-9]+\(
.
It's not clear how it should interact with other parameters:
SELECT fn_name_in_pg(), 2 as two
I think I'd want all the parameters at the top level instead of nesting the function output parameters.
I'm afrait a little when the parsing is done manually, it can end in a patchwork quilt of supported statements what pggen can intrepet correctly - and what ends in hard to understand parsing errors. This in turn leads to great code complexity and many following issues. The scope is unclear.
It would be hoped that the regex does not generate false positives. In this case it would be important to define the scope - e.g. a sufficiently large list of statements that define the scope as a whole. Then possible statements that just do not work, but still recognizes - the error action is already little meaningful, the parsing aggravates the situation again.
Open questions:
But wouldn't the inline function anaysing build on my proposal anyway? So this proposal could be the first step (query Function informations), and the inlining mechanism in a second step (parsing statements for function calls and apply it).
However this proposal still have a following big advantages:
create function
statementStumbled upon a much simpler solution. The out columns of fn
will appear like normal postgres columns.
SELECT * FROM fn1;
Motivation
Generally allow any complex nested queries. Currently it's unclear what is possible and when pggen cannot handle it (as a result this issue is in contract to the promising project description)
In contrast the limitation caused on pg side ("optional arguments" because no reliable argument & result type reflection is possible) are clear.
example schema
keep in mind the same return values could be also created by a single statement used directly as pggen query. this example just demonstrate the way trough stored functions since this was my initial situation I was confronted with this idea.
example query
My intention is to create plpgsql stored functions which takes all needed arguments once and has at least two different datasets as return values. There could be an arbitrary number of return values. They can be arrays, single values and of course nested.
advantages
disadvantages
challenges
questions
FindDescendantOIDs
. When some OID's must be preprocessed in pgx, maybe a preparation function (likePrepareAll
in #26 ) could be a solution (since the pg binary msg format don't support more info on query time, a preparation of pgx is anyway indispensable)workaround
unclear
call plpgsql function
vsrun batched prepared statements
in general