possible to support a two-argument pggen.arg eg `pggen.arg(param text, default_value anyelement)`?

breathe commented 1 year ago

I came across pggen today and I'm enjoying it so far - I like static queries and the vision here.

One pain point though is that I would like to be able to directly execute my sql queries for ease of testing and prototyping -- and I need some way of supplying appropriate values and value_types to ppgen.arg().

I initially did something like this

CREATE SCHEMA pggen;
CREATE OR REPLACE FUNCTION pggen.arg(param text)
  RETURNS text
  AS $$
  SELECT
    CASE WHEN param = 'rental_id' THEN
      '1'
    WHEN param = 'Limit' THEN
      '3'
    WHEN param = 'Offset' THEN
      '0'
...
    WHEN param = 'Sort' THEN
      'price'
    WHEN param = 'SortDirection' THEN
      'ASC'
    END
$$
LANGUAGE sql;

The idea there was to define a fake pggen.arg() method which returns different string values based on the provided param. This gives a workflow for interactively prototyping a query -- and when want to test different query arguments I just change the value in the fake definition of pggen.arg() and re-execute the query.

There are two problems with that:

its pretty annoying to re-define the stub method whenever I want to change the value of an interactively defined parameter
more importantly -- its only possible to use this to define string valued arguments. There are obviously arguments that I want to define which have other postgres types

I'm wondering if we could add support to pggen for a two argument version of pggen.arg -- like below:

CREATE OR REPLACE FUNCTION pggen.arg(param text, default_value anyelement)
  RETURNS anyelement
  AS $$
  SELECT
    default_value
$$
LANGUAGE sql;

This would allow for writing queries that are consumable by pggen but which cann still be executed interactively for ease of development and feedback speed.

-- name: SearchFoo :many
select * from foo where id = any(pggen.arg('matching_ids', '{1,4,9}'::integer[])`

For the above, when the query is run interactively pggen.arg(text, integer[]) will be invoked and the integer[] value will be returned, meanwhile the compiled query will take the value of the argument from the corresponding prepared statement placeholder as normal.

jschaf commented 1 year ago

One pain point though is that I would like to be able to directly execute my sql queries for ease of testing and prototyping -- and I need some way of supplying appropriate values and value_types to ppgen.arg().

The alternative syntax for params suggested in https://github.com/jschaf/pggen/issues/42 is slowly growing on me.

As for what I personally use: some editors allow you to define a regexp for a customizable parameter syntax. Here's how to do it for IntelliJ-based editors:

Run pggen queries directly

Instead of copying queries, you can run pggen queries directly against any IntelliJ datasource by defining a regexp to match pggen.arg('val'). IntelliJ calls these User Parameters. Typically, these look like ? or :name but IntelliJ is flexible enough to match the pggen syntax.

First, create a new parameter pattern with the regexp.

pggen.arg\('([^']+)'\)

Does that approach work for you?

breathe commented 1 year ago

Does that approach work for you?

I appreciate the workflow info! The jetbrains rewrite rule seems like an 'ok' workaround but that approach preserves a number of pretty annoying papercuts ...

Suitable values for potential arguments live outside the codebase (inside some ide settings). This means you can't just quickly and easily document/preserve/share/re-enter any arbitrary executable context for any arbitrary sql query in your codebase ... I want to be able to open up a random sql file, leave a little turd inside the file which describes some appropriate values for all the arguments, and immediately re-execute the sql to inspect the results when I come back to it later
errors with column characters become useless because the sql executed doesn't match what was written (I believe this applies to both the ide rewrite workflow and the proposed :arg syntax)
for :arg syntax, writing not valid sql will produce experience degradation in some editors or in terms of consumption of the sql from some tools (formatters/beautifiers?). Maybe that syntax extension is common enough that tools tend to support it, but in my opinion, anything that leads to the sql being not just valid sql is strictly worse than alternative approaches where the author always writes valid sql ...

As I consider your suggestion and think about the two arg approach again, I'm even more leaning towards the two arg approach ...

Consider a workflow like this:

CREATE TYPE rental_user AS (
  id integer,
  first_name text,
  last_name text
);

CREATE OR REPLACE FUNCTION pggen.arg(param text, default_type_or_value ANYELEMENT)
  RETURNS anyelement
  LANGUAGE plpgsql
  AS $func$
BEGIN
  IF param = 'name' THEN
    RETURN 'bob';
  ELSIF param = 'age' THEN
    RETURN 12;
  ELSIF param = 'user' THEN
    RETURN ROW(1,
      'bob',
      'lasty')::rental_user;
  ELSIF param = 'ids' THEN
    RETURN '{1,2}'::integer[];
  ELSIF param = 'sort' THEN
    RETURN 'schemaname';
  ELSIF param = 'sort_direction' THEN
    RETURN 'ASC';
  ELSIF param = 'limit' THEN
    RETURN 6;
  ELSE
    RETURN default_type_or_value;
  END IF;
END
$func$;

SELECT
  pggen.arg('name', NULL::text) AS name,
  pggen.arg('age', NULL::integer) AS age,
  pggen.arg('user', NULL::rental_user) AS user,
  pggen.arg('ids', NULL::integer[]) AS ids,
  pggen.arg('sort_direction', NULL::text) AS sort_direction
FROM
  pg_tables
ORDER BY
  CASE WHEN pggen.arg('sort', NULL::text) = 'schemaname' THEN
    schemaname
  ELSE
    tablename
  END
LIMIT pggen.arg('limit', NULL::bigint);

valid sql so the universe of sql analysis tools should tend toward behaving and things like column numbers in error messages should match up
easy to define a parameter value set next to the query definitions themselves that can be preserved as part of the codebase
shouldn't modify existing behavior or philosophy of tool
flexible -- for example developer could define helper functions for convenience of their own interactive use ... CREATE OR REPLACE FUNCTION pggen.default_user() returns rental_user ...

This approach could maybe also provide a solution for -- https://github.com/jschaf/pggen/issues/85 ... I think it could make sense for the type inference for the two argument ppgen.arg() to always map the argument to a go nullable type -- and if nil is provided to generate a query that resolves to the default value supplied in second argument ...

eg -- select pggen.arg('foo', 1::integer) maps foo to the go type *int and compiles to the the query select coalesce(:1, 1) ... This would give a solution to nullable types and also provide a solution for defining default values for query arguments (which is also something I want to be able to do in order to construct the most desirable go api for each query I write).

If this approach was embraced, maybe there could even future features added building on this -- maybe a special comment directive for the declaration of pggen.arg -- not 100% sure what it would do but can think of various ideas -- maybe it would generate a go func that can be used inside a parameterless (on go side) test, or a go type that produces an explain query ... (just spitballing but I know I'd like a feature to generate an 'explain statement' so that I can write tests which assert expected/valid query plans ...)

jschaf commented 1 year ago

Funnily enough, I initially prototyped support for pggen.arg with a default value but never exposed it: https://github.com/jschaf/pggen/commit/363cf19563c6f8fcc9cdc59b20b7d27fc81e2f85

The reasons I haven't implemented pggen.arg with a default:

Nil-everything: For the two-arg approach of pggen.arg('limit', NULL::bigint), we'd have to make every input arg a non-pointer. Otherwise, for data types like int, there's no way to distinguish between the unset and a zero value. I took inspiration from protobuf where primitive values don't have a nil value, just a zero value.
Parsing: To do it properly would require a better parser. Simple things like pggen.arg('foo', 2) are easy enough. But it's not clear where to draw the line. Should pggen.arg support any scalar value? pggen.arg('foo', exists(SELECT 1 FROM ...)). Currently, pggen's "parser" only recognizes strings, semicolons, and comments--which is the minimum syntax needed to identify a complete query.
It turned out to be easy enough to work around in SQL.

SELECT
  nullif(pggen.arg('foo'), ''),                          -- to use a null default value
  coalesce(nullif(pggen.arg('foo'), ''), 'default_value) -- to use a default value

Suitable values for potential arguments live outside the codebase

I think using the nullif trick embeds the default value in the query.

At the design level, I'm wary of expanding Pggen's syntax to support interactive use-case. I'm sympathetic since I've done regexp surgery to convert pggen queries to support interactive queries.

Aside

I don't think the following helps the interactive use-case but it gives an idea for pggen's future direction:

Long term, I think the way forward is more user-configurable type-mappers at three levels. At present, type overrides are only supported at the invocation level.

invocation level
query level
column level

Then a user can use different type mappers to support null or default values. I should start a pggen "v2" tracking ticket for these ideas.

breathe commented 1 year ago

Nil-everything: For the two-arg approach of pggen.arg('limit', NULL::bigint), we'd have to make every input arg a non-pointer. Otherwise, for data types like int, there's no way to distinguish between the unset and a zero value. I took inspiration from protobuf where primitive values don't have a nil value, just a zero value.

Don't you ever run into cases where you have an argument with a zero-value which is inappropriate for the query ...? The text example you gave seems like one of the few cases where the zero-value is likely to be usable with the intended semantics ... It doesn't seem like there's a way to represent a query with an optional integer parameter at all since 0 would be a very frequently meaningful query argument ... For parameter types like integer[] -- the inability to distinguish the empty array from null also seems generally annoying/error prone to me ... I think I'd rather make all arguments have pointer type and define my queries to either blowup if provided nil or define the query to supply an appropriate default semantic for any optional parameters when given nil ...

Parsing: To do it properly would require a better parser. Simple things like pggen.arg('foo', 2) are easy enough. But it's not clear where to draw the line. Should pggen.arg support any scalar value? pggen.arg('foo', exists(SELECT 1 FROM ...)). Currently, pggen's "parser" only recognizes strings, semicolons, and comments--which is the minimum syntax needed to identify a complete query.

Maybe I'm wrong here ... but could a simple approach work here? Keep count of the non-string-non-comment-enclosed open/close parenthesis after the ',' until finding a closing paren which represents the end of the ppgen.arg() expression ...? (and also disallow use of ppgen.arg( within the default value expression)

breathe commented 1 year ago

Maybe I'm wrong here ... but wouldn't it be correct to just keep count of the non-string-enclosed open/close parenthesis after the ',' until we find the closing paren that represents the end of the ppgen.arg() expression ...? (and also disallow use of ppgen.arg( within the default value expression)

I made a sketch of what I was thinking -- https://github.com/breathe/pggen/commit/fece1febc4ade79bc8333b8e2eb5432d1a1190c6

That implementation puts logic into the tokenizer to scan ppgen.arg(...) into a new Directive token type and then modifies the parser to extract the name and default value expression from the Directive token ...

Maybe an approach along these lines could work without a huge amount of additional complexity ...?

(Note the existing scanner and parser tests pass but I didn't exercise the two-argument version at all ... am just wondering what you think of this approach / whether you would consider accepting a change like this ...)

jschaf commented 1 year ago

Thanks for experimenting with the code! Adding a Directive is a good idea as I'd like to eventually support dynamic predicates: pggen.predicate.

It seems like the main problem is there's no way to pass null as an input with pggen.arg.

If pggen could pass null:

We get default values with plain SQL like coalesce(pggen.arg('foo'), 'some_text').
Your example of defining pggen.arg with a case statement would work since it could emit null, and the default value is handled by the coalesce, which lives in the query.

Let me know if that matches your understanding. If it does, I'd prefer to figure out how to pass null values via pggen.arg since it's a more general mechanism and avoids expanding the API surface. A few other feature requests also boil down to supporting more flexible marshaling for pggen.arg and unmarshaling for the returned column types, including:

breathe commented 1 year ago

2. Your example of defining pggen.arg with a case statement would work since it could emit null, and the default value is handled by the coalesce, which lives in the query.

But the case statement example only supports string typed arguments. afaict the two arg version is needed for the interactive use case as it's the only way to be able to define a pggen.arg() stub that can return values for arguments with arbitrary types associated with the arg name (?)

I still see value in the two arg version of the directive ... even with the addition of more type customization for argument types -- it would still be nice to have a version of the directive that always infers an optional type ... the two arg version could behave that way ...

pggen.arg('foo') -- keeps current implementation

pggen.arg('foo', null::int) -- infers optional type for 'foo', compiles to 'coalesce(:1, null::int)'

This could be added without breaking backward compatibility I think ... And I don't think this enhancement would add any major surprises to the implementation of any of the proposed new type inference customization features ...?

(even with the ability to customize the inferred go argument types, I'd prefer not writing those in general -- especially if the only reason I have to write them is to get an optional type ...)

breathe commented 1 year ago

I went ahead and wrote a PR to implement ... Let me know what you think if you have a chance?

Thanks!

breathe commented 1 year ago

Closing as I gather there is not interest in supporting the interactive sql use case

jschaf / pggen

possible to support a two-argument pggen.arg eg `pggen.arg(param text, default_value anyelement)`? #86

Run pggen queries directly

Aside