Closed hebime closed 2 years ago
> would you know if there's a difference on the DB side performance-wise?
It's almost certainly faster to insert multiple rows in a single insert since you avoid all the transaction overhead but I have no idea how much faster.
Using a variable number of VALUES expressions requires dynamic SQL, which is something I want but haven't figured out how to go about yet. However, there are a few options that work, as detailed below:
Schema
CREATE TABLE users (
username text,
password text,
created_at timestamptz
);
This is the approach I've been using. Seems like the most straight-forward way to me.
query.sql
-- name: InsertMany :many
WITH
input AS (
SELECT username, password, created_at
FROM unnest(
pggen.arg('usernames')::text[],
pggen.arg('passwords')::text[],
pggen.arg('created_ats')::timestamptz[]
) t(username, password, created_at)
)
INSERT INTO users (username, password, created_at)
SELECT username, password, created_at
FROM input
RETURNING *;
This generates:
type InsertManyParams struct {
Usernames []string
Passwords []string
CreatedAts pgtype.TimestamptzArray
}
type InsertManyRow struct {
Username string `json:"username"`
Password string `json:"password"`
CreatedAt time.Time `json:"created_at"`
}
// InsertMany implements Querier.InsertMany.
func (q *DBQuerier) InsertMany(ctx context.Context, params InsertManyParams) ([]InsertManyRow, error) {}
If you're inserting all the columns of a table you can take advantage of the fact that each table is also a composite type. So for your user table, there's also a user composite type. I've renamed it to users since user is a reserved keyword and IntelliJ doesn't like it.
NOTE: this approach fails if you have any generated columns.
query.sql
-- name: InsertManyValues :many
INSERT INTO users
SELECT *
FROM unnest(pggen.arg('all_users')::users[])
RETURNING *;
Which generates:
// Users represents the Postgres composite type "users".
type Users struct {
Username string `json:"username"`
Password string `json:"password"`
CreatedAt time.Time `json:"created_at"`
}
type InsertManyValuesRow struct {
Username string `json:"username"`
Password string `json:"password"`
CreatedAt time.Time `json:"created_at"`
}
// InsertManyValues implements Querier.InsertManyValues.
func (q *DBQuerier) InsertManyValues(ctx context.Context, allUsers []Users) ([]InsertManyValuesRow, error) {}
Let's say created_at has a default value that we're happy with but we want to use a composite type instead of an array for each column. Our schema changes to:
CREATE TABLE users (
username text,
password text,
created_at timestamptz DEFAULT now()
);
First, we need a composite type of the row we want to insert:
CREATE TYPE partial_user AS (
username text,
password text
);
query.sql
-- name: InsertPartial :many
INSERT INTO users (username, password)
SELECT *
FROM unnest(pggen.arg('partial_users')::partial_user[])
RETURNING *;
Which generates:
// PartialUser represents the Postgres composite type "partial_user".
type PartialUser struct {
Username string `json:"username"`
Password string `json:"password"`
}
type InsertPartialRow struct {
Username string `json:"username"`
Password string `json:"password"`
CreatedAt time.Time `json:"created_at"`
}
// InsertPartial implements Querier.InsertPartial.
func (q *DBQuerier) InsertPartial(ctx context.Context, partialUsers []PartialUser) ([]InsertPartialRow, error) {}
Out of curiosity, what are you using pggen for? It's fun to see what other folks use it for.
If you run into any problems let me know. I know option 1 will work but I haven't personally tried option 2 or 3 and I'm not sure if test coverage extends to passing in arrays of composite types as input.
Thanks for the detailed response. I think the options you laid out would work for my use case. I didn't even know about the table composite types so that's pretty cool.
> It's almost certainly faster to insert multiple rows in a single insert since you avoid all the transaction overhead but I have no idea how much faster.
After I wrote the question, I remembered that the batch automatically starts a transaction so yes, there would definitely be overhead there. Still nice that the batch avoids all the network round trips after the first though. I could theoretically use a batch here since my query is in a transaction anyways, but this is good to know for cases where I don't need a transaction.
> Out of curiosity, what are you using pggen for? It's fun to see what other folks use it for.
We've actually switched to using pggen to generate the boilerplate for our DAL in our code base. We used to write everything manually by hand but we found that your tool works very well. We don't like using ORMs so we appreciate that we can still write our queries and have all the tedious (and error prone) type scanning and query execution be handled by the generated code.
As for the dynamically generated SQL, I've written some functions in the past that have solved this specific use case when it comes to inserting multiple values at once. If you like, I can try to open a PR with a POC. Off the top of my head, here is how it could work:
For syntax maybe something like this:
-- name: InsertMany :exec
INSERT INTO users (username, password, created_at)
VALUES [](pggen.arg('username'), pggen.arg('password'), pggen.arg('created_at'));
-- or
VALUES pggen.array(pggen.arg('username'), pggen.arg('password'), pggen.arg('created_at'));
-- or
VALUES ...(pggen.arg('username'), pggen.arg('password'), pggen.arg('created_at'));
This would generate the following:
const numInsertManyParamFields = 3
const insertManySQL = `INSERT INTO users (username, password, created_at) VALUES %s;`
type InsertManyParams struct {
Username string
Password string
CreatedAt pgtype.Timestamptz
}
func (p InsertManyParams) flattenValues() []interface{} {
values := make([]interface{}, 0, numInsertManyParamFields)
return append(values, p.Username, p.Password, p.CreatedAt)
}
func (q *DBQuerier) InsertMany(ctx context.Context, params []InsertManyParams) (pgconn.CommandTag, error) {
// generate values SQL
valueSQL := make([]string, 0, len(params))
for row := 0; row < len(params); row++ {
value := strings.Builder{}
value.WriteString("(")
for col := row*numInsertManyParamFields + 1; col <= (row+1)*numInsertManyParamFields; col++ {
value.WriteString(fmt.Sprintf("$%d", col))
if col != (row+1)*numInsertManyParamFields {
value.WriteString(",")
}
}
value.WriteString(")")
valueSQL = append(valueSQL, value.String())
}
SQL := fmt.Sprintf(insertManySQL, strings.Join(valueSQL, ","))
// generate values []interface{}
values := make([]interface{}, 0, numInsertManyParamFields*len(params))
for _, param := range params {
values = append(values, param.flattenValues()...)
}
ctx = context.WithValue(ctx, "pggen_query_name", "InsertMany")
cmdTag, err := q.conn.Exec(ctx, SQL, values...)
if err != nil {
return cmdTag, fmt.Errorf("exec query InsertMany: %w", err)
}
return cmdTag, err
}
Here is the implementation in a Go playground so you can see how it works. The exec func just prints out the generated SQL, and the pgtypes have all been replaced with base types for simplicity: https://play.golang.org/p/CcezMTKZQmh. Keep in mind the valueSQL generation is a bit messy, but I'm sure it can be cleaned up. The second loop that builds the interface values could also be combined with the first, but this version is more readable for demonstration purposes. This is just an example to show how it could potentially work.
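For reference, the placeholder generation above can be written more compactly. A sketch of the same logic (function name is illustrative), producing `($1,$2,$3),($4,$5,$6),...` for a given row and column count:

```go
package main

import (
	"fmt"
	"strings"
)

// buildValuesSQL returns the VALUES placeholder list for rows x cols
// parameters, e.g. "($1,$2,$3),($4,$5,$6)" for rows=2, cols=3.
func buildValuesSQL(rows, cols int) string {
	var b strings.Builder
	for r := 0; r < rows; r++ {
		if r > 0 {
			b.WriteByte(',')
		}
		b.WriteByte('(')
		for c := 0; c < cols; c++ {
			if c > 0 {
				b.WriteByte(',')
			}
			fmt.Fprintf(&b, "$%d", r*cols+c+1)
		}
		b.WriteByte(')')
	}
	return b.String()
}

func main() {
	fmt.Println(buildValuesSQL(2, 3)) // ($1,$2,$3),($4,$5,$6)
}
```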
One thing of note with this approach is that Postgres has a maximum number of parameters that can be bound in a prepared statement. I believe the max is 65535 because the protocol uses an Int16 to pass the number of parameters (see Bind: https://www.postgresql.org/docs/current/protocol-message-formats.html). If necessary, a check can be added though, since we know how many rows and fields are being passed in.
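That check is easy to add up front since the row and field counts are known before any SQL is built. A minimal sketch (function name is illustrative; the 65535 limit comes from the Int16 parameter count in the Bind message):

```go
package main

import "fmt"

// maxBindParams is the most parameters a single Bind message can carry,
// since the protocol sends the parameter count as a 16-bit integer.
const maxBindParams = 65535

// checkParamCount errors before we build a query Postgres would reject.
func checkParamCount(rows, fieldsPerRow int) error {
	if n := rows * fieldsPerRow; n > maxBindParams {
		return fmt.Errorf("insert needs %d bound parameters, exceeding the protocol limit of %d", n, maxBindParams)
	}
	return nil
}

func main() {
	fmt.Println(checkParamCount(100, 3))   // nil: 300 parameters is fine
	fmt.Println(checkParamCount(30000, 3)) // error: 90000 exceeds the limit
}
```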
> I think the options you laid out would work for my use case. I didn't even know about the table composite types so that's pretty cool.
One of the fringe benefits of writing a Go-Postgres mapping is you learn the Postgres type system in gory detail.
> We've actually switched to using pggen to generate the boilerplate for our DAL in our code base. We used to write everything manually by hand but we found that your tool works very well.
Glad to hear it! That's pretty much why I built it. I like SQL but didn't like doing the mapping to Go.
> As for the dynamically generated SQL, I've written some functions in the past that have solved this specific use case when it comes to inserting multiple values at once. If you like, I can try to open a PR with a POC. Off the top of my head, here is how it could work:
One thing I've tried to avoid is extending Postgres syntax for pggen. In our codebase, we declare pggen.arg as a function that returns any so that it highlights correctly in IntelliJ (and presumably other editors).
-- This schema file exists solely so IntelliJ doesn't underline every
-- pggen.arg() expression in squiggly red.
CREATE SCHEMA pggen;
-- pggen.arg defines a named parameter that's eventually compiled into a
-- placeholder for a prepared query: $1, $2, etc.
CREATE FUNCTION pggen.arg(param text) RETURNS any AS '';
In general, I'm okay with requiring clients to use more cumbersome hacks like the options above for things that can be done in Postgres instead of pggen. SQL has a bunch of awkward constructs that look like values but can't be used in prepared queries, like the VALUES constructor and the x IN (1, 2, 3) expression, but there are workarounds for both. The rationale is to keep the number of codepaths low and keep pggen relatively easy to hack on.
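For reference, the usual workaround for the IN-list case is = ANY with an array parameter, which works fine in a prepared query. A sketch in the same query.sql style as above (query and parameter names are illustrative):

```sql
-- name: FindUsers :many
SELECT username, created_at
FROM users
WHERE username = ANY(pggen.arg('usernames')::text[]);
```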
For dynamic queries that you can't do in Postgres, I'm interested in adding support to pggen. I'll probably add something like pggen.predicate first and follow up with pggen.ident for dynamic columns and group-bys (group-bys are a can of worms though, since changing the group-by can make the query invalid). If you're interested, I'll let you know when I do the design doc for either one.
I started a tracking bug for dynamic predicates: https://github.com/jschaf/pggen/issues/41
I'll close this since there's a reasonable workaround that doesn't require any extra syntax and focus on the dynamic predicates.
Hello,
Is it possible to generate an insert func that can insert multiple rows at once? Ex:
I understand that there's a batch API available, but that still executes N queries on the DB vs. just one with multiple value rows. At that point, would you know if there's a difference on the DB side performance-wise?
Thanks!