One other note: though the Godoc comments suggest you should call .Close on an appender rather than .Flush, the performance appears roughly the same to me either way.
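For context, here is a minimal sketch of the two patterns (table name and row count are made up; it assumes the go-duckdb Appender API used in the reproduction below, where Close performs a final flush per its Godoc):

package main

import (
	"context"
	"database/sql"
	"log"

	"github.com/marcboeker/go-duckdb"
)

func main() {
	connector, err := duckdb.NewConnector("", nil)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := sql.OpenDB(connector).Exec(`CREATE TABLE t (c1 BIGINT)`); err != nil {
		log.Fatal(err)
	}
	conn, err := connector.Connect(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	appender, err := duckdb.NewAppenderFromConn(conn, "", "t")
	if err != nil {
		log.Fatal(err)
	}
	for i := 0; i < 1000; i++ {
		if err := appender.AppendRow(int64(i)); err != nil {
			log.Fatal(err)
		}
	}
	// Flush writes out the rows appended so far; the appender stays usable.
	if err := appender.Flush(); err != nil {
		log.Fatal(err)
	}
	// Close performs the final flush and releases the appender.
	if err := appender.Close(); err != nil {
		log.Fatal(err)
	}
}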
Hi @dxh9845,
Thanks for the bug report. I could not reproduce this with random data (see below). Please provide a complete, self-contained example, including the table schema and the row data, or any self-contained reproduction with dummy data.
This is my attempt at reproducing your issue.
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"
	"math/rand/v2"
	"time"

	"github.com/marcboeker/go-duckdb"
)

func main() {
	if err := AddLargeAmountOfRows(context.Background()); err != nil {
		log.Fatal(err)
	}
}

func AddLargeAmountOfRows(ctx context.Context) error {
	start := time.Now()
	connector, err := duckdb.NewConnector("", nil)
	if err != nil {
		return err
	}
	if _, err = sql.OpenDB(connector).Exec(`CREATE TABLE row_pairs (c1 BIGINT, c2 BIGINT, c3 BIGINT)`); err != nil {
		return err
	}
	driverConn, err := connector.Connect(ctx)
	if err != nil {
		return err
	}
	defer driverConn.Close()
	appender, err := duckdb.NewAppenderFromConn(driverConn, "", "row_pairs")
	if err != nil {
		return err
	}
	defer appender.Close()
	// Append random rows to the table.
	for i := 0; i < 996000; i++ {
		err = appender.AppendRow(int64(rand.IntN(996000)), int64(rand.IntN(996000)), int64(rand.IntN(996000)))
		if err != nil {
			fmt.Println(err)
		}
	}
	// Explicitly flush the appender before closing.
	// THIS is the part that hangs.
	if err = appender.Flush(); err != nil {
		fmt.Println(err)
	}
	elapsed := time.Since(start)
	log.Printf("finished, took %s", elapsed)
	return nil
}
Output:
2024/07/14 17:03:40 finished, took 352.199916ms
Thanks for the response. I figured it out - my existing table had ART indices that were slowing performance significantly. My solution was to drop the indices and rebuild them after the bulk insert was finished.
Great to hear. That's usually our recommendation for indexes, as row-based index key insertions are slow. Any insertions happening after index creation are currently implemented as row-based index key insertions, including when using the appender. For FKs, we're improving their performance (see here). However, the fastest way to build indexes will always be bulk index creation after the data has been loaded.
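For anyone landing here with the same symptom, a minimal sketch of that drop-and-rebuild workflow, using placeholder table and index names (row_pairs, idx_c1) and assuming the same go-duckdb API as in the reproduction above:

package main

import (
	"context"
	"database/sql"
	"log"

	"github.com/marcboeker/go-duckdb"
)

func main() {
	ctx := context.Background()
	connector, err := duckdb.NewConnector("", nil)
	if err != nil {
		log.Fatal(err)
	}
	db := sql.OpenDB(connector)
	mustExec(db, `CREATE TABLE row_pairs (c1 BIGINT, c2 BIGINT, c3 BIGINT)`)
	mustExec(db, `CREATE INDEX idx_c1 ON row_pairs (c1)`)

	// Drop the index so the appender does not pay per-row index maintenance.
	mustExec(db, `DROP INDEX idx_c1`)

	driverConn, err := connector.Connect(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer driverConn.Close()
	appender, err := duckdb.NewAppenderFromConn(driverConn, "", "row_pairs")
	if err != nil {
		log.Fatal(err)
	}
	for i := 0; i < 100000; i++ {
		if err := appender.AppendRow(int64(i), int64(i), int64(i)); err != nil {
			log.Fatal(err)
		}
	}
	// Close flushes the remaining buffered rows.
	if err := appender.Close(); err != nil {
		log.Fatal(err)
	}

	// Rebuild the index in bulk once the data is in place.
	mustExec(db, `CREATE INDEX idx_c1 ON row_pairs (c1)`)
}

func mustExec(db *sql.DB, query string) {
	if _, err := db.Exec(query); err != nil {
		log.Fatal(err)
	}
}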
Hello,
Thanks for this library! I am seeing very slow performance when inserting rows with the Appender API. When using the Appender API to append pairs - about 995,247 individual rows - I see an extreme slowdown when calling .Flush().

Version:
github.com/marcboeker/go-duckdb v1.7.0
DuckDB Library Version: v1.0.0

In my debugging, I see the code is specifically slow around these lines. In my rough timing, it takes around 13 minutes on an M1 MacBook Pro, but I've also experienced this in a production Kubernetes cluster with more processing power.
Please let me know how I can help debug or improve performance.
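Related to the resolution above, a minimal sketch of checking whether a target table carries indexes that the appender would have to maintain row by row, using DuckDB's duckdb_indexes() metadata function (column names assumed from the DuckDB v1.0.0 documentation; table and index names are placeholders):

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	mustExec(db, `CREATE TABLE row_pairs (c1 BIGINT, c2 BIGINT, c3 BIGINT)`)
	mustExec(db, `CREATE INDEX idx_c1 ON row_pairs (c1)`)

	// List any indexes on the table that appends would have to update.
	rows, err := db.Query(`SELECT index_name FROM duckdb_indexes() WHERE table_name = 'row_pairs'`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			log.Fatal(err)
		}
		fmt.Println("index on row_pairs:", name)
	}
}

func mustExec(db *sql.DB, query string) {
	if _, err := db.Exec(query); err != nil {
		log.Fatal(err)
	}
}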