jschaf / pggen

Generate type-safe Go for any Postgres query. If Postgres can run the query, pggen can generate code for it.
MIT License

Support for custom tags in struct #24

Closed. imraan-go closed this issue 3 years ago.

imraan-go commented 3 years ago

I've been using sqlc for quite some time, but I want to migrate to pggen since the pgx library is faster than libpq. Is it possible to implement something similar to this: https://github.com/kyleconroy/sqlc/issues/534

I want to add custom tags like "json" or "validate" to the generated structs. That way, we can reuse the structs generated by pggen when binding JSON to a struct and marshaling a struct to JSON. Below is an example use case:

CREATE TABLE "user" (
  id   BIGSERIAL PRIMARY KEY,

  -- tags:`validate:"min=10,max=0"`
  username text NOT NULL,

  -- tags:`json:"-"`
  password text NOT NULL,

  -- tags:`validate:"gt=18"`
  age  integer NOT NULL
);

So in the above example, we can use the struct generated by pggen to bind JSON directly, use a Validate module to validate it, and also use the same struct to send a JSON response to the user without the password field, since we are using custom tags for JSON. Unlike sqlc, pggen uses an actual database server to get information about tables. Is it possible to implement this feature, for example via the --schema-glob file or some custom config files?

jschaf commented 3 years ago

I'm unlikely to add support for a few reasons:

  • The most important constraint is that tags would require a "real" SQL parser to extract comments in CREATE TABLE statements. The current parser is only capable of parsing queries ending with a semicolon; pggen uses a really limited grammar of strings, query fragments, and comments. Adding a real SQL parser would be neat, but it's a lot of complexity.
  • It's not always possible to figure out how a query column maps to an underlying table. As a pathological case, consider:
    SELECT
      CASE
        WHEN random() > 0.5 THEN (SELECT foo FROM table_x LIMIT 1)
        ELSE (SELECT foo FROM table_y LIMIT 1)
      END AS foo;
  • I'm hesitant to add anything conditional to generated code, to avoid expanding the testing surface. Long term, I think the only realistic approach is to allow custom configuration via an AST hook that lets you arbitrarily modify the AST of pggen-generated code.

I don't understand the use case of validation on the underlying table. I would usually use a CHECK constraint instead, like so:

CREATE TABLE "user" (
  id   BIGSERIAL PRIMARY KEY,

  -- tags:`validate:"min=10,max=0"`
  username text NOT NULL CHECK (length(username) <= 10),

  -- tags:`json:"-"`
  password text NOT NULL,

  -- tags:`validate:"gt=18"`
  age  integer NOT NULL CHECK (age >= 18)
);

I understand validation on the input parameters, so you don't waste a network trip on a query that's going to fail. That's much easier to implement with something like the example below; then you could add validation before running the query against the database.

SELECT foo
FROM table_x
WHERE pggen.arg('predicate', validate => 'gt=18');

Could you clarify where and when you want the validation to occur?

imraan-go commented 3 years ago

I'm unlikely to add support for a few reasons:

  • The most important constraint is that tags would require a "real" SQL parser to extract comments in CREATE TABLE statements. The current parser is only capable of parsing queries ending with a semicolon; pggen uses a really limited grammar of strings, query fragments, and comments. Adding a real SQL parser would be neat, but it's a lot of complexity.
  • It's not always possible to figure out how a query column maps to an underlying table. As a pathological case, consider:
    SELECT
      CASE
        WHEN random() > 0.5 THEN (SELECT foo FROM table_x LIMIT 1)
        ELSE (SELECT foo FROM table_y LIMIT 1)
      END AS foo;
  • I'm hesitant to add anything conditional to generated code, to avoid expanding the testing surface. Long term, I think the only realistic approach is to allow custom configuration via an AST hook that lets you arbitrarily modify the AST of pggen-generated code.

I don't understand the use case of validation on the underlying table. I would usually use a CHECK constraint instead like so:

CREATE TABLE "user" (
  id   BIGSERIAL PRIMARY KEY,

  -- tags:`validate:"min=10,max=0"`
  username text NOT NULL CHECK (length(username) <= 10),

  -- tags:`json:"-"`
  password text NOT NULL,

  -- tags:`validate:"gt=18"`
  age  integer NOT NULL CHECK (age >= 18)
);

I understand validation on the input parameters, so you don't waste a network trip on a query that's going to fail. That's much easier to implement with something like the example below; then you could add validation before running the query against the database.

SELECT foo
FROM table_x
WHERE pggen.arg('predicate', validate => 'gt=18');

Could you clarify where and when you want the validation to occur?

Relying on the SQL query to validate user input is not a realistic approach. Consider the following scenario, where a user POSTs a JSON HTTP request to our API:

u := &schema.CreateUserParams{}
err := c.Bind(u) // Bind incoming JSON data to our struct
if err != nil {
    return c.JSON(http.StatusBadRequest, err.Error())
}
if err = c.Validate(u); err != nil {  // Here we will validate the user input before sending it to database
    return c.JSON(http.StatusBadRequest, "Username should contain more than 10 characters and age should be more than 18")
}
ctx := context.Background()
contactid, err := database.CreateUser(ctx, *u)
if err != nil {
...

With this approach, since the incoming JSON fields and the CreateUser fields are the same, we can directly bind the incoming JSON data to the struct generated by pggen/sqlc. But since we are unable to use tags in the generated struct, we have to either manually validate the data field by field with if/else, which is time consuming, or copy the data from this struct to a custom struct and call c.Validate(). There are lots of validators available for Go; the above example uses https://github.com/go-playground/validator .
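To make the trade-off concrete, here is a stdlib-only sketch of the field-by-field validation that the tags would automate. The struct and rules are illustrative, not actual pggen output:

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative params struct; the real pggen output depends on the query.
type CreateUserParams struct {
	Username string
	Age      int32
}

// validate is the hand-written, field-by-field check that tags like
// `validate:"min=10"` and `validate:"gt=18"` would replace.
func validate(p CreateUserParams) error {
	if len(p.Username) < 10 {
		return errors.New("username must be at least 10 characters")
	}
	if p.Age <= 18 {
		return errors.New("age must be greater than 18")
	}
	return nil
}

func main() {
	// "admin" is shorter than 10 characters, so this reports an error.
	if err := validate(CreateUserParams{Username: "admin", Age: 30}); err != nil {
		fmt.Println(err)
	}
}
```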

As for custom JSON tags, let's say we have a query like this:

SELECT * FROM users WHERE username = pggen.arg('username');

This will generate a struct like this:

type User struct {
    ID       int64  `json:"id"`
    Username string `json:"username"`
    Password string `json:"password"`
    Age      int32  `json:"age"`
}

And we write code:

ctx := context.Background()
userdetails, err := database.GetUser(ctx, "admin")
if err != nil {
... 
}

if userdetails.Password == somecustomlogic("userpass") {
    return c.JSON(http.StatusOK, userdetails)
}
return c.JSON(http.StatusOK, "Invalid Password")

We need the password field from the database, but we don't want to send it in the JSON response. By using the json:"-" tag, we can omit this field from the JSON. These are very basic examples, but the use cases are pretty common in real-world applications.

jschaf commented 3 years ago

Relying on the SQL query to validate user input is not a realistic approach. Consider the following scenario, where a user POSTs a JSON HTTP request to our API:

Sure, that's what I was trying to clarify. As I understand it: you'd like to validate input parameters, and it'd be nice if you could write constraints in the CREATE TABLE statement and have pggen figure out how to map them to the input structs that pggen generates. There are still a number of technical challenges:

  1. pggen would have to parse the schema files to get annotations. It currently doesn't.
  2. pggen would need a way for queries to reuse the return struct. That would enable the use case of using a single User struct for all user queries.
  3. pggen would have to figure out if a query maps to an underlying table (with json or validation annotations) and map the query columns to the table columns. This is likely easy-ish in most cases and impossible for the long tail of queries.

I'm not going to do 1 because I don't want to deal with a full SQL parse tree. I could pull in the Postgres parser like sqlc does. The downside is that there's a lot of complexity in extracting useful information; see https://github.com/kyleconroy/sqlc/blob/master/internal/engine/postgresql/parse.go.

I'll probably do 2 relatively soon. I need it because I'm also sick of copying structs.

Since I'm not doing 1, 3 is not possible since we won't have annotations. Even if we did have the annotations, pggen would need control-flow analysis to map output columns to the source column. I think that's the bulk of complexity in sqlc.

For your specific use case, after I implement 2, you'll only have to write struct copying code once. In the meantime, I'd probably hack up the generated file with a sed script to add the annotations manually.

I'm not likely to ever add this feature into the core code. I'd say this belongs in an extension or custom generator on top of pggen. For now, you'll have to settle for sed as your custom generator. Long term, I'd like to add a way to extend pggen programmatically to support custom use-cases without adding more code into the core.

imraan-go commented 3 years ago
  2. pggen would need a way for queries to reuse the return struct. That would enable the use case of using a single User struct for all user queries.

I think instead of using a single User struct for all queries, it would be nice to reuse the return struct whenever the return columns are the same. For example:

CREATE TABLE authors (
  id   BIGSERIAL PRIMARY KEY,
  name text      NOT NULL,
  bio  text
);

-- name: GetAuthor :one
SELECT id,name FROM authors
WHERE id = $1 LIMIT 1;

-- name: ListAuthors :many
SELECT id,name FROM authors
ORDER BY name;

That would generate the following return structs:

type GetAuthorRow struct {
    ID   int64
    Name string
}

type ListAuthorsRow struct {
    ID   int64
    Name string
}

I've taken this code from sqlc, but I believe pggen would generate the same, since it's derived from sqlc. Here we can just reuse the first struct. Are you suggesting reusing the full table struct for all queries? Wouldn't that be resource hungry? If we only need one column from a big table, initializing the full table struct will consume extra memory, because Go populates struct fields with their zero values.

  3. pggen would have to figure out if a query maps to an underlying table (with json or validation annotations) and map the query columns to the table columns. This is likely easy-ish in most cases and impossible for the long tail of queries.

I think I can write a plugin that will take column tags from a text/JSON file like so:

[
  {
    "column": "user.password",
    "tag": "json:\"-\""
  },
  {
    "column": "user.age",
    "tag": "validate:\"gt=18\""
  }
]

It will attach these tags whenever it generates a struct that uses these columns.
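A minimal sketch of how such a plugin could load that config, assuming the file uses valid JSON with escaped quotes; the tagRule and loadTags names are invented for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tagRule mirrors one entry of the proposed config file.
type tagRule struct {
	Column string `json:"column"`
	Tag    string `json:"tag"`
}

// loadTags parses the config and indexes rules by "table.column",
// so a generator plugin can look up extra tags per column.
func loadTags(data []byte) (map[string]string, error) {
	var rules []tagRule
	if err := json.Unmarshal(data, &rules); err != nil {
		return nil, err
	}
	tags := make(map[string]string, len(rules))
	for _, r := range rules {
		tags[r.Column] = r.Tag
	}
	return tags, nil
}

func main() {
	cfg := []byte(`[
	  {"column": "user.password", "tag": "json:\"-\""},
	  {"column": "user.age", "tag": "validate:\"gt=18\""}
	]`)
	tags, err := loadTags(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println(tags["user.password"]) // json:"-"
}
```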

jschaf commented 3 years ago

Are you suggesting to reuse the full table struct for all queries?

Nope, I agree that's a bad idea, and it's not even possible for some queries, like computed columns or joined tables. I'd probably add another keyword, go-type=AuthorRow, to the query file. In your example, that would become:

-- name: GetAuthor :one go-type=AuthorRow
SELECT id,name FROM authors
WHERE id = $1 LIMIT 1;

-- name: ListAuthors :many go-type=AuthorRow
SELECT id,name FROM authors
ORDER BY name;

Both the queries would use the same AuthorRow struct. pggen would check that those queries have the exact same columns, matching the type and name.

Initially, I think pggen will always generate the AuthorRow struct. Eventually, I'd like to support the ability to use existing structs for output rows. For that case, I think a different keyword would be useful, like go-reuse-type=SomeExistingRow. Since pggen can't see the existing type, it would generate the code without any checks, so the code could fail at runtime. I'm not sure there's a way to fix that.
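Under that proposal, the shared output might look like the following; the struct and the commented signatures are hypothetical, not current pggen output:

```go
package main

import "fmt"

// Hypothetical shared output struct: with go-type=AuthorRow on both
// queries, GetAuthor and ListAuthors would return the same type
// instead of separate GetAuthorRow/ListAuthorsRow structs.
type AuthorRow struct {
	ID   int64
	Name string
}

// Sketch of the generated signatures (not real pggen output):
//   GetAuthor(ctx context.Context, id int64) (AuthorRow, error)
//   ListAuthors(ctx context.Context) ([]AuthorRow, error)

func main() {
	// Both a single row and a slice of rows use the one type.
	rows := []AuthorRow{{ID: 1, Name: "Ursula"}, {ID: 2, Name: "Ted"}}
	fmt.Println(len(rows), rows[0].Name)
}
```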

imraan-go commented 3 years ago
-- name: GetAuthor :one go-type=AuthorRow
SELECT id,name FROM authors
WHERE id = $1 LIMIT 1;

-- name: ListAuthors :many go-type=AuthorRow
SELECT id,name FROM authors
ORDER BY name;

Both the queries would use the same AuthorRow struct. pggen would check that those queries have the exact same columns, matching the type and name.

I think manual overriding like the above should do the trick for now. In the future, some automatic implementation would be nice, but for the time being, I think it's totally fine. Also, why not use a separate config file like sqlc does, instead of appending everything to the command line? That would make sharing a project with a custom pggen config easier, instead of creating a separate shell script containing the pggen args.

jschaf commented 3 years ago

why not use a separate config file like sqlc instead of appending everything to command line?

I think persistent configuration is better solved with a Makefile or a similar build system like Bazel. I have a similar preference to esbuild's. A config file is just another interface to the command-line flags.