feat: go type mapping per query argument

0xjac commented 2 years ago

Extends the query annotation to specify custom go types using the same qualified go type format as for the general custom type mapping.

Closes #46

0xjac commented 2 years ago

@jschaf This is something I need for one of my use case (see #46 for details) so I'm working on an implementation. It's missing the mapping of return columns, tests, and docs, but you can have a look at the mapping of query arguments (inputs) already if you have the time.

If you think this is something you are definitely not interested in merging, I'd appreciate if you would let me know so I don't waste too much time on it :wink:

0xjac commented 2 years ago

Thank you for diving into the codebase. I'm in favor of the change so happy to keep working with you on this PR.

The main thing I'm focused on is how to present the options for the user to keep the interface as simple as possible. I've had a number of requests which essentially boil down to more configurable type mappings (#65). I think the answer is to support an intermediate unmarshal type that's responsible for converting the Postgres type into the destination Go type, so timestamptz to time.Time in the below example.
--go-type timestamptz=github.com/jschaf.TimeUnmarshaller=time.Time
I'd like the syntax to be pretty similar between the query files and command line so I'm interested in what the SQL file would look like (ignoring the intermediate idea since that's out of scope for this work).
-- name: FooBar go-type: timestamptz=time.Time :many

Interesting. I can definitely adapt the parsing to be more like the command line arguments. The current format matches what is specified in #46, which in turn is inspired by the current (name: ...) annotation. Taking into account your feedback, I would propose something like:

-- name: GetDailyFlightsFromAircraft :many
-- arg: day=github.com/0xjac/custom-project/types.Day
-- return: departure=time.Time
-- return: eta=*time.Time
SELECT f.flight_number, f.departure, f.arrival, f.eta
FROM flights f
WHERE pggen.arg("day") <= f.departure AND f.departure < pggen.arg("day") + 1
ORDER BY f.departure DESC;

Few things to notice:

The left side of the equal refers to an arg name (for arg:) or a column name (for return:) not a pg type.
The name:, arg:, return: annotations should be in this order. I.e. no arg: after a return:. However the order amongst multiple arg: and return: is free.
Each type mapping should be on a separate line (for readability)
The arg/return type mapping takes precedence, but is not mandatory. If omitted the global type mapping (--go-type via CLI) is considered and then the type inference (as is already the case). Here the f.arrival column will have the go type from which ever globally mapped or inferred type there is for a TIMESTAMP.
The user can specify a go type which is incompatible with the underlying pg type. The type conversion will fail at runtime. (Similar to a bad type map provided globally).

The intermediate unmarshal type idea is interesting but indeed out of scope. Here is a few thoughts that might help tho: Using a combination of query specific type map on the return return: (once I've implemented it) and ROW() to return a "single column" and map it to a single custom made struct which implements something like the BinaryDecoder and TextDecoder interface and contains all the unmarshalling logic, it should be possible to unmarshall into pretty much any thing with little to no work on the pggen side. And it makes the config easier: No need to provide an intermediary type. The drawback is that the user here might have to write some structs for the result manually and do the whole decoding logic by hand.

jschaf commented 2 years ago

Nice, detailed writeup!

Agreed that we probably need multi-line with this change. Couple tweaks:

For ease of parsing, I'm in favor of using a pggen prefix to identify metadata lines. Otherwise, we'd need to do something like find the last instance of --name and require all lines after it begin with pggen-approved prefix.
I think columnType reads better than return. Then for consistency, arg becomes argType.

-- pggen: name: GetDailyFlightsFromAircraft :many
-- pggen: argType: day=github.com/0xjac/custom-project/types.Day
-- pggen: columnType: departure=time.Time

I think you nailed the precedence. Enforcing order of argType and columnType is a good thing to enforce early.

I approve of the design with the naming changes (arg -> argType and return -> columnType) and pggen prefix on comment lines. Happy to debate if you feel strongly otherwise.

Sounds like you have a reasonable handle on how to approach the implementation but let me know if a high-level breakdown would help. From my end, I'll be looking closely at:

not breaking backwards compatibility
testing coverage

0xjac commented 2 years ago

@jschaf Overall it all seems good. There are just a few things to note:

For ease of parsing, I'm in favor of using a pggen prefix to identify metadata lines. Otherwise, we'd need to do something like find the last instance of --name and require all lines after it begin with pggen-approved prefix.

No problem with using a prefix, then the algorithm would be going through each line with a pggen prefix and parse them from first to last. The annotations should still be one by line and in the order: name, argType, columnType.
Just one question about it: Do we allow "comment" lines without the pggen: prefix in-between or after the lines with the prefix?
In your example you wrote:
```
-- pggen: name: GetDailyFlightsFromAircraft :many
```
But this breaks compatibility because of the new prefix. Therefore, I propose we support a name: line both with or without the pggen: prefix. This would work as follow:
1. Find the first line with the pggen: prefix.
2. If that line is a name: line, then parse it and move on. (Any name: line above without prefix is ignored, as is currently the case.)
3. If that line is not a name: prefix, parse the previous line expecting a name : without the pggen: prefix. Then move on to parsing the current line as an argType: or columnType:
Unlike the current annotation where name: lines can be repeated and the last one is taken into account, for lines with the pggen: name: prefix, it can only appear once as the first line with the pggen: prefix.

Regarding tests, I will obviously provide tests with this PR. I'm still in the playground/specs phase now but once that is done, I'll start adding the required tests.

jschaf commented 2 years ago

Do we allow "comment" lines without the pggen: prefix in-between or after the lines with the prefix?

I think not. Easier to start strict and allow flexibility later.

But this breaks compatibility because of the new prefix. Therefore, I propose we support a name: line both with or without the pggen: prefix.

Agreed, we're aligned there.

Regarding tests, I will obviously provide tests with this PR. I'm still in the playground/specs phase now but once that is done, I'll start adding the required tests.

Sweet, sounds like a plan.

jschaf / pggen

feat: go type mapping per query argument #73