mrdulin opened this issue 4 years ago
Your SQL string should have placeholders like $1 and $2. You shouldn't be sanitizing values directly.
I've built a number of helpers for this type of thing. For a few examples, see https://github.com/jackc/pgxutil and https://github.com/jackc/pgsql.
In particular, since your example is a generic insert builder, check out: https://github.com/jackc/pgxutil/blob/c7a206b68e6233635504c92af4667c7ee8d43126/pgxutil.go#L585
You're welcome to import and use those libraries directly, but neither has had an official release and either may be abandoned or changed without notice -- so you may prefer to copy whatever functionality you want, or use them as a guide to build your own helpers.
Hey Jack,
Is there ever a chance of bulk-insert functioning making its way into pgx officially? It feels like bulk-insertion is probably a really common use-case.
I myself started to build a SQL string manually like mrdulin did in the OP after reading #82 (which implied that this kind of thing was impossible), so I am glad to stumble upon this thread! I took a look at pgxutil.Insert(), but it is a little unclear. Can you provide a quick example of using pgxutil.Insert() for bulk inserting, say, 3 things? Thank you.
pgxutil.Insert() only inserts one row at a time, but pgsql.InsertStatement and pgsql.ValuesStatement can be used to build a multi-row insert statement. As I mentioned above, though, neither of these packages has an officially supported release. At this point they are basically my personal toolkit that I update or change as my own projects dictate, so it is better to think of them as example code that you can copy from rather than as libraries to rely on.
But as far as pgx itself, if you need to efficiently insert multiple records you have two options. The fastest is to use CopyFrom(). The simplest is to use SendBatch() to send multiple queries at once.
And of course it is possible -- albeit a little awkward -- to build multi-row insert SQL and arguments manually. See https://github.com/jackc/tpr/blob/66a04f276a8478a6e5ba9747790b834b6b965efe/backend/data/item.go#L205 for an example of doing it the hard way.
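For readers who don't want to follow the link, the "hard way" can be sketched roughly like this. This is a self-contained illustration rather than the code from the linked file, and the table and column names in the usage example are made up:

```go
package main

import (
	"fmt"
	"strings"
)

// buildMultiInsert builds a multi-row INSERT statement with numbered
// placeholders ($1, $2, ...) and the matching flat argument slice.
func buildMultiInsert(table string, columns []string, rows [][]interface{}) (string, []interface{}) {
	var b strings.Builder
	args := make([]interface{}, 0, len(rows)*len(columns))
	b.WriteString("INSERT INTO " + table + " (" + strings.Join(columns, ", ") + ") VALUES ")
	for i, row := range rows {
		if i > 0 {
			b.WriteString(", ")
		}
		b.WriteString("(")
		for j, v := range row {
			if j > 0 {
				b.WriteString(", ")
			}
			// Placeholder numbers are 1-based and run across all rows.
			fmt.Fprintf(&b, "$%d", len(args)+1)
			args = append(args, v)
		}
		b.WriteString(")")
	}
	return b.String(), args
}

func main() {
	sql, args := buildMultiInsert("items", []string{"name", "qty"}, [][]interface{}{
		{"apple", 3},
		{"pear", 5},
	})
	fmt.Println(sql) // INSERT INTO items (name, qty) VALUES ($1, $2), ($3, $4)
	fmt.Println(len(args))
}
```

You would then pass the returned SQL and args to a single Exec call. As the comment says, it works, but it is more awkward than CopyFrom or a batch.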
This is what I came up with, maybe it will be useful to future readers of this thread:
// getBulkInsertSQL is a helper function to prepare a SQL query for a bulk insert.
//
// For example:
//
//   SQLString   = "INSERT INTO notes (thing_a, thing_b) VALUES %s"
//   rowValueSQL = "?, ?"
//   numRows     = 3
//
// Would be transformed into:
//
//   INSERT INTO notes (thing_a, thing_b)
//   VALUES
//       ($1, $2),
//       ($3, $4),
//       ($5, $6)
//
// Also see:
// https://stackoverflow.com/questions/12486436/how-do-i-batch-sql-statements-with-package-database-sql
func getBulkInsertSQL(SQLString string, rowValueSQL string, numRows int) string {
	// Combine the base SQL string and N value strings
	valueStrings := make([]string, 0, numRows)
	for i := 0; i < numRows; i++ {
		valueStrings = append(valueStrings, "("+rowValueSQL+")")
	}
	allValuesString := strings.Join(valueStrings, ",")
	SQLString = fmt.Sprintf(SQLString, allValuesString)

	// Convert all of the "?" to "$1", "$2", "$3", etc.
	// (which is the way that pgx expects query variables to be)
	numArgs := strings.Count(SQLString, "?")
	SQLString = strings.ReplaceAll(SQLString, "?", "$%v")
	numbers := make([]interface{}, 0, numArgs)
	for i := 1; i <= numArgs; i++ {
		numbers = append(numbers, strconv.Itoa(i))
	}
	return fmt.Sprintf(SQLString, numbers...)
}
// getBulkInsertSQLSimple is a helper function to prepare a SQL query for a bulk insert.
// It is used instead of getBulkInsertSQL when all of the values are plain question marks
// (e.g. a 1-for-1 value insertion). The example given for getBulkInsertSQL is such a query.
func getBulkInsertSQLSimple(SQLString string, numArgsPerRow int, numRows int) string {
	questionMarks := make([]string, 0, numArgsPerRow)
	for i := 0; i < numArgsPerRow; i++ {
		questionMarks = append(questionMarks, "?")
	}
	rowValueSQL := strings.Join(questionMarks, ", ")
	return getBulkInsertSQL(SQLString, rowValueSQL, numRows)
}
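As a quick sanity check, the example from the doc comment expands as expected. The two helpers are reproduced here (with the doc comments trimmed) so the snippet runs standalone:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// getBulkInsertSQL: see the full doc comment in the thread above.
func getBulkInsertSQL(SQLString string, rowValueSQL string, numRows int) string {
	valueStrings := make([]string, 0, numRows)
	for i := 0; i < numRows; i++ {
		valueStrings = append(valueStrings, "("+rowValueSQL+")")
	}
	SQLString = fmt.Sprintf(SQLString, strings.Join(valueStrings, ","))

	// Convert all of the "?" to "$1", "$2", "$3", etc.
	numArgs := strings.Count(SQLString, "?")
	SQLString = strings.ReplaceAll(SQLString, "?", "$%v")
	numbers := make([]interface{}, 0, numArgs)
	for i := 1; i <= numArgs; i++ {
		numbers = append(numbers, strconv.Itoa(i))
	}
	return fmt.Sprintf(SQLString, numbers...)
}

// getBulkInsertSQLSimple builds a row of plain "?" placeholders.
func getBulkInsertSQLSimple(SQLString string, numArgsPerRow int, numRows int) string {
	questionMarks := make([]string, 0, numArgsPerRow)
	for i := 0; i < numArgsPerRow; i++ {
		questionMarks = append(questionMarks, "?")
	}
	return getBulkInsertSQL(SQLString, strings.Join(questionMarks, ", "), numRows)
}

func main() {
	sql := getBulkInsertSQLSimple("INSERT INTO notes (thing_a, thing_b) VALUES %s", 2, 3)
	fmt.Println(sql)
	// INSERT INTO notes (thing_a, thing_b) VALUES ($1, $2),($3, $4),($5, $6)
}
```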
Here's an example of using the simple function:
func (*ChatLog) BulkInsert(chatLogRows []*ChatLogRow) error {
	SQLString := `
		INSERT INTO chat_log (user_id, message, room)
		VALUES %s
	`
	numArgsPerRow := 3
	valueArgs := make([]interface{}, 0, numArgsPerRow*len(chatLogRows))
	for _, chatLogRow := range chatLogRows {
		valueArgs = append(valueArgs, chatLogRow.UserID, chatLogRow.Message, chatLogRow.Room)
	}
	SQLString = getBulkInsertSQLSimple(SQLString, numArgsPerRow, len(chatLogRows))
	_, err := db.Exec(context.Background(), SQLString, valueArgs...)
	return err
}
And here's an example of using the non-simple function:
func (*GameParticipantNotes) BulkInsert(gameParticipantNotesRows []*GameParticipantNotesRow) error {
	SQLString := `
		INSERT INTO game_participant_notes (
			game_participant_id,
			card_order,
			note
		)
		VALUES %s
	`
	numArgsPerRow := 4 // game_id, user_id, card_order, note
	valueArgs := make([]interface{}, 0, numArgsPerRow*len(gameParticipantNotesRows))
	for _, gameParticipantNotesRow := range gameParticipantNotesRows {
		valueArgs = append(
			valueArgs,
			gameParticipantNotesRow.GameID,
			gameParticipantNotesRow.UserID,
			gameParticipantNotesRow.CardOrder,
			gameParticipantNotesRow.Note,
		)
	}
	valueSQL := `
		(SELECT id FROM game_participants WHERE game_id = ? AND user_id = ?),
		?,
		?
	`
	SQLString = getBulkInsertSQL(SQLString, valueSQL, len(gameParticipantNotesRows))
	_, err := db.Exec(context.Background(), SQLString, valueArgs...)
	return err
}
@Zamiell amazing, your bulk insert code worked with only minor adjustments. Thanks!
5x faster for generating the bulk SQL statement:
func getBulkInsertSQL(table string, columns []string, rowCount int) string {
	var b strings.Builder
	var cnt int
	columnCount := len(columns)
	b.Grow(40000) // Rough preallocation to avoid repeated buffer growth
	b.WriteString("INSERT INTO " + table + "(" + strings.Join(columns, ", ") + ") VALUES ")
	for i := 0; i < rowCount; i++ {
		b.WriteString("(")
		for j := 0; j < columnCount; j++ {
			cnt++
			b.WriteString("$")
			b.WriteString(strconv.Itoa(cnt))
			if j != columnCount-1 {
				b.WriteString(", ")
			}
		}
		b.WriteString(")")
		if i != rowCount-1 {
			b.WriteString(",")
		}
	}
	return b.String()
}
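As a quick check of what that builder emits, here it is reproduced in a runnable snippet with a small example (the table and column names are made up):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// getBulkInsertSQL is reproduced from the comment above: it emits one
// "($n, $n+1, ...)" group per row with globally numbered placeholders.
func getBulkInsertSQL(table string, columns []string, rowCount int) string {
	var b strings.Builder
	var cnt int
	columnCount := len(columns)
	b.Grow(40000) // Rough preallocation to avoid repeated buffer growth
	b.WriteString("INSERT INTO " + table + "(" + strings.Join(columns, ", ") + ") VALUES ")
	for i := 0; i < rowCount; i++ {
		b.WriteString("(")
		for j := 0; j < columnCount; j++ {
			cnt++
			b.WriteString("$")
			b.WriteString(strconv.Itoa(cnt))
			if j != columnCount-1 {
				b.WriteString(", ")
			}
		}
		b.WriteString(")")
		if i != rowCount-1 {
			b.WriteString(",")
		}
	}
	return b.String()
}

func main() {
	fmt.Println(getBulkInsertSQL("chat_log", []string{"user_id", "message"}, 2))
	// INSERT INTO chat_log(user_id, message) VALUES ($1, $2),($3, $4)
}
```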
@mirecl is b.Reset() necessary? Isn't the GC supposed to clean up b after function execution?
You're absolutely right. Fixed the code.
Faster 5x
This seems like a pretty misleading thing to say without showing actual benchmarks. The vast majority of the execution time seems likely to be on the actual database insertion, not the string building.
I agree with you; I meant 5x faster only for generating the SQL string, not for the insert as a whole.
I've read that CopyFrom won't return auto-generated IDs. If I use a batch of insert statements instead of one insert statement, how will that affect performance? Could it even hinder or slow down query optimization?
Is the provided snippet from @Zamiell the recommended way to do a bulk insert today?
My recommendation is to use CopyFrom whenever possible for bulk inserts. IDs will be auto-generated, but there is no way to return them to the client application.
If that's not possible then I recommend using a batch with separate insert statements for each row. This is usually easier to work with than a single statement that inserts multiple rows. A single statement can be a bit faster, but IMO it's not worth the hassle.
Here's a sample from the pgx benchmark suite:
BenchmarkWrite100RowsViaInsert-12 476 2528141 ns/op 19689 B/op 613 allocs/op
BenchmarkWrite100RowsViaMultiInsert-12 1888 617472 ns/op 285941 B/op 1573 allocs/op
BenchmarkWrite100RowsViaBatchInsert-12 1728 703070 ns/op 102517 B/op 530 allocs/op
BenchmarkWrite100RowsViaCopy-12 3253 372422 ns/op 58755 B/op 355 allocs/op
I wrote some functions for building SQL strings. Should I call the pgx.Identifier.Sanitize method explicitly to sanitize the identifiers? Or can I just call strings.Join(cols, ",") and pgx will sanitize the column identifiers internally? Need clarification, thanks. E.g. sql.go: Unit tests: sql_test.go:
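For what it's worth: placeholders like $1 only cover values, not identifiers, so strings.Join(cols, ",") on its own performs no quoting. If the column names come from untrusted input, you call pgx.Identifier{...}.Sanitize() yourself. A minimal sketch of equivalent quoting logic (my own re-implementation for illustration, not pgx's code verbatim):

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeIdentifier quotes each part of a (possibly schema-qualified)
// identifier: wrap it in double quotes and double any embedded quotes,
// then join the parts with dots.
func sanitizeIdentifier(parts ...string) string {
	quoted := make([]string, len(parts))
	for i, p := range parts {
		quoted[i] = `"` + strings.ReplaceAll(p, `"`, `""`) + `"`
	}
	return strings.Join(quoted, ".")
}

func main() {
	fmt.Println(sanitizeIdentifier("public", `weird"name`))
	// "public"."weird""name"
}
```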