jschaf / pggen

Generate type-safe Go for any Postgres query. If Postgres can run the query, pggen can generate code for it.
MIT License
282 stars 26 forks source link

Mapping SQL types to slices and/or pointers of imported types is broken #43

Closed 0xjac closed 2 years ago

0xjac commented 2 years ago

Mapping an SQL type to a pointer and/or a slice of a go type which must be imported results in invalid generated go code as pggen is confused by the slice and pointer markers ([] and *).

Minimally reproducible example

pggen Version

$ pggen version
pggen version 2021-09-07, commit 7f87888

Gen Command

pggen gen go -query-glob "queries/*.sql" \
    --postgres-connection "postgres://postgres@localhost/tit?sslmode=disable" \
    --go-type "timestamp=*time.Time" \
    --go-type "_timestamptz=[]time.Time" \
    --go-type "_timestamp=[]*time.Time" \
    --go-type "_interval=[]util/custom/times.Interval" \
    --go-type "_date=util/custom/[]times.Date" 

Queries

-- name: GetNilTime :one
SELECT pggen.arg('data')::TIMESTAMP;

-- name: GetTimes :many
SELECT * FROM UNNEST(pggen.arg('data')::TIMESTAMP WITH TIME ZONE[]);

-- name: GetNilTimes :many
SELECT * FROM UNNEST(pggen.arg('data')::TIMESTAMP[]);

-- name: GetCustomIntervals :many
SELECT * FROM UNNEST(pggen.arg('data')::INTERVAL[]);

-- name: GetCustomDates :many
SELECT * FROM UNNEST(pggen.arg('data')::DATE[]);

Generated File

// Code generated by pggen. DO NOT EDIT.

package queries

import (
    // --snip--

    "time"
    "util/custom/[]times"
    "util/custom/times"
)

// --snip--
type Querier interface {
    GetNilTime(ctx context.Context, data time.*Time) (time.*Time, error)
    // --snip--

    GetTimes(ctx context.Context, data time.[]Time) ([]pgtype.Timestamptz, error)
    // --snip--

    GetNilTimes(ctx context.Context, data time.[]*Time) ([]time.*Time, error)
    // --snip--

    GetCustomIntervals(ctx context.Context, data times.[]Interval) ([]pgtype.Interval, error)
    // --snip--

    GetCustomDates(ctx context.Context, data []times.Date) ([]pgtype.Date, error)
    // --snip--
}

Expected

// Code generated by pggen. DO NOT EDIT.

package queries

import (
    // --snip--

    "time"
    "util/custom/times"
)

// --snip--
type Querier interface {
    GetNilTime(ctx context.Context, data *time.Time) (*time.Time, error)
    // --snip--

    GetTimes(ctx context.Context, data []time.Time) ([]time.Time, error)
    // --snip--

    GetNilTimes(ctx context.Context, data []*time.Time) ([]*time.Time, error)
    // --snip--

    GetCustomIntervals(ctx context.Context, data []times.Interval) ([]times.Interval, error)
    // --snip--

    GetCustomDates(ctx context.Context, data []times.Date) ([]times.Date, error)
    // --snip--
}

The two main issues are the miss-handling of the [] and * markers and the failure to correctly map the return values for:

For custom types, I placed the slice marker ([]) at different position, each result in the correct placement either as the argument type or the import and fails at the other location.


As a side note, (tho this might be worth a separate issue if you think it's worth it) I used different SQL types for each queries as example since I cannot map a single SQL type to multiple Go types. It would be nice to be able to further specify a type mapping per query argument and for the return value. I.e. I can map overall timestamp=time.Time but for a query I could add a comment to the query to map a specific argument to a different type, like:

-- name: GetNilTime :one *time.Time
-- param: data *time.Time
SELECT pggen.arg('data')::TIMESTAMP;
jschaf commented 2 years ago

Great write up! Imports have been one of the more difficult things to get right. That and composite types.

I can repro. I’ll take a look in the next couple days.

jschaf commented 2 years ago

As a side note, (tho this might be worth a separate issue if you think it's worth it)

Yep, please open another issue so it’s easy for folks to vote on. If you can, add some context on the use case where it’s helpful to have a query specific types. I place a high bar on additional options.

0xjac commented 2 years ago

@jschaf thanks for the reply. I started writing a tool similar to pggen a long time ago but never got around to have a working mvp. In any case imports are tricky and there is one more issue with them which I failed to showcase in my example and it is that of packages whose package name is not matching the end of the module import path. This happens mostly in two cases:

  1. Packages with a version above one like pgx which is imported as github.com/jackc/pgx/v4 but the package name is still pgx, not v4.
  2. Packages which just have a different name than their import path, because of a go-prefix or a vanity URL, like github.com/pelletier/go-toml which has as package name toml or gopkg.in/yaml.v2 which is named yaml

I am not sure how to resolve the package name from the file, there might be some way finding the install path of the package on the local machine, finding the go files and parsing the package line on the source files. But that's complicated...

Here is the alternative I came up with at the time, which is to specify both the import path and package names as separate values. Adapted to the way arguments are mapped in pggen, it could look something like this:

package main

import (
    "fmt"
    "regexp"
)

const (
    // Group 0 is the entire match (i.e. the entire string)
    groupSQLType = 1 // Name of the SQL type as detected by pggen.
    // Group 2 is the import group with the trailing comma to separate it from the package and type.
    groupImport = 3 // Import line required by the mapped Go type.
    groupGoType = 4 // Go type with, including the slice/array/pointer marker and package name.
    // Group 5 is the slice/array/pointer marker.
    groupPkgName = 6 // Go package name with a trailing dot.
    // Group 7 is the type name without the package.
)

const (
    // sqlTypeRE matches an SQL identifier as describe in
    // https://www.postgresql.org/docs/8.1/sql-syntax.html#SQL-SYNTAX-IDENTIFIERS
    sqlTypeRE = `^(?P<sqlType>[A-Za-z_][\w\$]*)`
    // importRE matches an import spec which can be any unicode char
    // See: https://golang.org/ref/spec#Import_declarations
    importRE = `(?P<import>.+)`
    // markerRE matches the "marker" of either:
    //  - a slice ([])
    //  - an array with a size as number literal ([<number>])
    //  - a pointer (*)
    //  - a slice of pointers ([]*)
    //  - an array with a size as number literal of pointers ([<number>]*).
    markerRE = `(\[\d*\]\*|\[\d*\]|\*)?`
    // pkgRE matches a Go pkg name as defined in https://golang.org/ref/spec#Package_clause.
    pkgRE = `([\pL_][\pL\pN_]*\.)?`
    // gType matches an optional slice and/or pointer marker, an optional pkg name, and a
    goTypeRE = `(?P<gotype>` + markerRE + pkgRE + `([\pL_][\pL\pN_]*))`
)

var argRE = regexp.MustCompile(sqlTypeRE + `=(` + importRE + `,)?` + goTypeRE)

func main() {
    args := []string{
        // base type
        // (backwards compatible with --go-type in pggen version 2021-09-07, commit 7f87888)
        "text=string",       // plain type
        "text=*string",      // pointer
        "_text=[]string",    // slice
        "_text=[42]string",  // array
        "_text=[]*string",   // slice of b pointer
        "_text=[42]*string", // array of pointer

        // type from stdlib
        "timestamp=time,time.Time",       // plain type
        "timestamp=time,*time.Time",      // pointer
        "_timestamp=time,[]time.Time",    // slice
        "_timestamp=time,[42]time.Time",  // array
        "_timestamp=time,[]*time.Time",   // slice of b pointer
        "_timestamp=time,[42]*time.Time", // array of pointer

        // local package type
        "myclass=local/util/pkg,pkg.Class",       // plain type
        "myclass=local/util/pkg,*pkg.Class",      // pointer
        "_myclass=local/util/pkg,[]pkg.Class",    // slice
        "_myclass=local/util/pkg,[42]pkg.Class",  // array
        "_myclass=local/util/pkg,[]*pkg.Class",   // slice of b pointer
        "_myclass=local/util/pkg,[42]*pkg.Class", // array of pointer

        // module type
        "text=github.com/jschaf/pggen,pggen.Lang",       // plain type
        "text=github.com/jschaf/pggen,*pggen.Lang",      // pointer
        "_text=github.com/jschaf/pggen,[]pggen.Lang",    // slice
        "_text=github.com/jschaf/pggen,[42]pggen.Lang",  // array
        "_text=github.com/jschaf/pggen,[]*pggen.Lang",   // slice of b pointer
        "_text=github.com/jschaf/pggen,[42]*pggen.Lang", // array of pointer

        // module type with version above 1
        "row=github.com/jackc/pgx/v4,pgx.Row",       // plain type
        "row=github.com/jackc/pgx/v4,*pgx.Row",      // pointer
        "_row=github.com/jackc/pgx/v4,[]pgx.Row",    // slice
        "_row=github.com/jackc/pgx/v4,[42]pgx.Row",  // array
        "_row=github.com/jackc/pgx/v4,[]*pgx.Row",   // slice of b pointer
        "_row=github.com/jackc/pgx/v4,[42]*pgx.Row", // array of pointer

        // module/pkg names mismatch
        "tomlpos=github.com/pelletier/go-toml,toml.Position",       // plain type
        "tomlpos=github.com/pelletier/go-toml,*toml.Position",      // pointer
        "_tomlpos=github.com/pelletier/go-toml,[]toml.Position",    // slice
        "_tomlpos=github.com/pelletier/go-toml,[42]toml.Position",  // array
        "_tomlpos=github.com/pelletier/go-toml,[]*toml.Position",   // slice of b pointer
        "_tomlpos=github.com/pelletier/go-toml,[42]*toml.Position", // array of pointer

        // type from stdlib
        "timestamp=time,time.Time",       // plain type
        "timestamp=time,*time.Time",      // pointer
        "_timestamp=time,[]time.Time",    // slice
        "_timestamp=time,[42]time.Time",  // array
        "_timestamp=time,[]*time.Time",   // slice of b pointer
        "_timestamp=time,[42]*time.Time", // array of pointer

        // invalid import for base type
        // (backwards compatible with --go-type in pggen version 2021-09-07, commit 7f87888)
        "text=strings,string",       // plain type
        "text=strings,*string",      // pointer
        "_text=strings,[]string",    // slice
        "_text=strings,[42]string",  // array
        "_text=strings,[]*string",   // slice of b pointer
        "_text=strings,[42]*string", // array of pointer

        // missing import

        // local package type
        "myclass=pkg.Class",       // plain type
        "myclass=*pkg.Class",      // pointer
        "_myclass=[]pkg.Class",    // slice
        "_myclass=[42]pkg.Class",  // array
        "_myclass=[]*pkg.Class",   // slice of b pointer
        "_myclass=[42]*pkg.Class", // array of pointer

        // module type
        "text=pggen.Lang",       // plain type
        "text=*pggen.Lang",      // pointer
        "_text=[]pggen.Lang",    // slice
        "_text=[42]pggen.Lang",  // array
        "_text=[]*pggen.Lang",   // slice of b pointer
        "_text=[42]*pggen.Lang", // array of pointer

        // module type with version above 1
        "row=pgx.Row",       // plain type
        "row=*pgx.Row",      // pointer
        "_row=[]pgx.Row",    // slice
        "_row=[42]pgx.Row",  // array
        "_row=[]*pgx.Row",   // slice of b pointer
        "_row=[42]*pgx.Row", // array of pointer

        // module/pkg names mismatch
        "tomlpos=toml.Position",       // plain type
        "tomlpos=*toml.Position",      // pointer
        "_tomlpos=[]toml.Position",    // slice
        "_tomlpos=[42]toml.Position",  // array
        "_tomlpos=[]*toml.Position",   // slice of b pointer
        "_tomlpos=[42]*toml.Position", // array of pointer

    }

    var (
        matches                              []string
        sqlType, importLine, pkgName, goType string
    )
    for _, arg := range args {
        matches = argRE.FindStringSubmatch(arg)

        if len(matches) != 8 {
            fmt.Printf("invalid go type mapping: %s\n", arg)
            continue
        }

        sqlType = matches[groupSQLType]
        importLine = matches[groupImport]
        pkgName = matches[groupPkgName]
        goType = matches[groupGoType]

        if pkgName == "" && importLine != "" {
            fmt.Printf("invalid import: %s for type %s (%s)\n", importLine, goType, arg)
            continue
        }

        // Detect missing import statements for a given package.
        if pkgName != "" && importLine == "" {
            // Print [:len(pkgName)-1] of the package name since the last character is always the
            // period (.) between the package name and value (if the package is present).
            fmt.Printf("missing import for package: %s (%s)\n", pkgName[:len(pkgName)-1], arg)
            continue

            // Note, one alternative is to build a map of package name to import line.
            // For any package name as the key, set the import line as the value.
            // - If the package name is not in the mapping, add it, even if the import line is
            //   missing (empty string).
            // - If the package name is already present with a missing import line,
            //   update the import line
            // - If the package name is already present, but the import lines differ, return an
            //   error since there are two identical package names. (The only way to handle this
            //   would be to support specifying aliases for package names, potentially by modifying
            //   the regexp).
        }

        fmt.Printf("%s -> %s", sqlType, goType)

        if importLine != "" {
            fmt.Printf(" (import %s)", importLine)
        }

        fmt.Println()
    }
}

playground

This is a bit long because of the test cases, but the interesting bits are the regexp at the top and how it is used at the bottom. The regexp builds on the existing format of pggen, namely: <sql-type>=<go-type> to support an optional import line and markers for the slice/array/pointers. It looks like <sql-type>=[<import>,]<go-type> The go-type is further split into: [<marker>][<package>.]<declared-go-type>. I wrote it this way because I did not managed to have optional groups without actually having groups, and to allow to extract the package and check that if an import is specified, the type comes from a package and vice-versa.

Some drawbacks of this approach:

  1. For multiple types of the same package, the import line must be repeated. I actually like this because it forces the user to be explicit about imports which matches the Go philosophy. This can be avoided by keeping track of packages and imports in a map as explained in the snippet above. This remove the repetition but if the type mapping including the import is removed, the generation will/should fail.
  2. There are no checks that the import (module) is the right one for the package name. If the wrong import or package name is given, invalid go code will be generated. I don't think something can be done about this. A fix for this would remove the need to specify both the module and the package. I think it is also acceptable since this will should be caught at compile time and thus should not blow up in production. (Unless there are two modules with the exact same package name and API, but different behaviors). And it is reasonable to expect devs to provide the right modules and package names. Similar (and safer!) to them providing a valid mapping from SQL to Go types. (If the wrong mapping is given, like text to int, the query might actually fail at runtime!).

The other solution I thought of at the time was to only specify the go type with the package and then provide a separate list of required packages to import. However I decided against it as I felt it was an inferior approach for the following reasons:

  1. Since we don't know which import is for which package, we must add all the imports to all files even if a file has a query which does not use a type and its related import. This results in unused imports!
  2. While it avoids repeating the modules to import (see drawback 1 above), it requires maintaining both the type mapping and a separate list of imports. It could be easy for those to get out of sync and having to go back and forth to fix them until the generated go code becomes valid with the right imports.

A bit of a mouthful, but since I actually spent some time on this, I hope it can help save you some time :wink:

I'll open a separate issue for the type mapping per query. And the suggest approach and regex is valid and can be reused for the type mapping per query parameter by reusing "(" + importRE + ",)?" + goTypeRE :wink:

0xjac commented 2 years ago

@jschaf another issue, specific to pggen, which I just noticed is that the file containing the Querier interface must import all modules even those of queries in other files since the Querier interface contains the signature of all queries.

Hopefully it should be as simple as going over the entire map and adding all the imports for that file, but it's something to keep in mind :wink:

0xjac commented 2 years ago

@jschaf Thank you very much for the fix and for releasing it in 2022-05-22. I'm currently updating the [AUR package]. However I noticed there are no assets (except the source code) in the the latest release.

Could I trouble you to release the binaries with the fix? Thanks a lot.

jschaf commented 2 years ago

The action releaser is all screwed up but I released manually and it worked.

0xjac commented 2 years ago

I saw that, thanks I'll test it tonight.