mauricio / postgresql-async

Async, Netty based, database drivers for PostgreSQL and MySQL written in Scala
Apache License 2.0
1.43k stars 222 forks source link

Bulk inserts with Prepared Statements #121

Open jimmydivvy opened 9 years ago

jimmydivvy commented 9 years ago

Currently when using prepared statements to insert bulk records, postgresql-async appears to only be able to insert one record at a time - this isn't ideal for large resultsets.

Ideally it would be nice to have an API along the lines of the JDBC executeBatch() call that can send a batch of inserts in one request.

Is this possible, or intentionally not included for technical reasons? (I'm happy to do up a pull-request for this functionality, just want to make sure I'm not going down a rabbit hole that cannot be solved).

dylex commented 9 years ago

I think to do this efficiently you'd want to add support for postgres's CopyIn protocol, which would take some work, but is not impossible. Otherwise, you're back to doing Bind, Execute, which is exactly what happens now. It could theoretically be done with fewer round-trips (without waiting for the responses) but that has some risks and no performance advantage server-side. See http://www.postgresql.org/docs/current/static/protocol-flow.html#PROTOCOL-COPY

You could also just stick all the insert statements into a simple query with semicolons, but it's not clear that's any better either.

mauricio commented 9 years ago

@dylex is right here, you can technically build a nicer interface for bulk inserts, but the actual driver implementation won't change much and there won't be a performance gain, unless you build it with the COPY feature.

The main performance boost in these cases for JDBC implementations is that they make sure the prepared statement is cached as most of them require a couple query executions before actually caching it, which wouldn't make much of a difference here since we always cache the prepared statement here.

jilen commented 8 years ago

@mauricio How do you think abount mysql jdbc connector's rewriteBatchedStatements ? It will rewrite batch insert into single multi row insert thus leads to less table lock and index update. Will mysql-async driver support it ?

david-bouyssie commented 7 years ago

Here is an alternative solution for bulk COPY using the binary format (best performance): https://github.com/bytefish/PgBulkInsert

I don't know if a bridge between postgresql-aync and PgBulkInsert would be possible but it would be awesome !