jschaf / pggen

Generate type-safe Go for any Postgres query. If Postgres can run the query, pggen can generate code for it.
MIT License
281 stars 26 forks source link

Support for In conditions #81

Closed Oliver-Fish closed 1 year ago

Oliver-Fish commented 1 year ago

Given the following query

SELECT
    user_profile.id,
    user_profile.email,
FROM
    user_profile
WHERE
    user_profile.email in (pggen.arg ('emails'));

I get the following function signature out.

GetUsersByEmail(ctx context.Context, emails string)

However, it would be great if I got

GetUsersByEmail(ctx context.Context, emails []string)

Is this currently possible? if not, are there any blockers to supporting this?

jschaf commented 1 year ago

Postgres doesn't allow prepared statements for in expressions. Since pggen compiles to prepared statements, there's no first-class support for in expressions.

A supported alternative I've used is, col = ANY (pggen.arg('val')::some_type[]), described by https://stackoverflow.com/questions/36929116/how-to-use-in-clause-with-preparedstatement-in-postgresql. To apply it to your example:

SELECT
    user_profile.id,
    user_profile.email,
FROM
    user_profile
WHERE
    user_profile.email = any (pggen.arg ('emails')::text[]);

I'm not entirely opposed to adding support in the future, but since there's a reasonable workaround, that feature is a low priority. The reason to support in is that it sometimes triggers different query plans: https://stackoverflow.com/questions/34627026/in-vs-any-operator-in-postgresql

Oliver-Fish commented 1 year ago

So I devised an alternative workaround, but I suspect yours is better as unnest is happening at runtime; however, in most cases where the array is small, I suspect the performance cost is tiny. It might solve the query planning issue you mentioned above, though.

SELECT
    id,
    email
FROM
    user_profile
WHERE
    email in (select unnest(pggen.arg('email')::TEXT[]));

I reviewed the code around type deferring, and it looks like for first-class In support, you would need some query parsing to get context-aware knowledge of what type is better than the OID returned from the prepared statement parameter_types. This feels quite heavy for a small use-case like this that can be worked around; I suspect it would be useful first to collect a list of edge cases that would gain from static analysis of the queries and see if this is a worthwhile endeavor as I am not sure IN alone is enough for such an increase in complexity.

Additionally, the query running on the database is very true to the query you write in your SQL files. However, this would break down that behavior slightly as the generated Go code would need to do some query building to build the correct number of $X values into the query ran against the database depending on the input size of the slice.

Not sure if you want to close this or leave it open; however, I am happy with the workarounds. Thanks for the quick response, by the way.

jschaf commented 1 year ago

you would need some query parsing to get context-aware knowledge of what type is better than the OID returned from the prepared statement parameter_types. This feels quite heavy for a small use-case like this that can be worked around

Yes, I've stuck solely with prepared values because it's much easier to support.

I suspect it would be useful first to collect a list of edge cases that would gain from static analysis of the queries and see if this is a worthwhile endeavor as I am not sure IN alone is enough for such an increase in complexity.

Good thought; I agree. I'll close. If folks have a reasonable use case, please comment with details.