Open delanoe opened 9 years ago
Postgresql-simple does support the COPY FROM STDIN
interface, see the Copy
module. You don't have to read stuff in from a file to use this interface, but you'll have to generate the data syntax yourself as there's no support for that. You could even wrap this up into a conduit if you'd like.
Alternatively, you can use a combination of transactions, prepared statements, and/or executeMany. A single call to executeMany is not appropriate for bulk inserts, because this will generate the entire insert as a single DML command, which isn't good for the backend, not to mention that postgresql-simple's generation of extremely large queries is not as efficient as it should be. Batching a smaller number of rows into a call to executeMany might still be benefitial, though.
One thing that people doing bulk inserts often miss is that running multiple insert statements in a single transaction is essential for performance and helps with correctness as well. So if you choose to eschew the copy interface, you will have to use transactions.
More reading regarding bulk inserts that might be benefitial:
https://github.com/lpsmith/postgresql-simple/issues/67#issuecomment-114037553 https://stackoverflow.com/questions/20169372/postgresql-copy-method-enter-valid-entries-and-discard-exceptions
Many thanks for your answer. I will try and do some benchmarks.
@lpsmith out of curiousity, would you accept a PR with support for formatting data for COPY FROM STDIN
? I've got some private code that's implemented it already.
I dunno if @joeyadams is still active in the community, but he did have some code for generating data syntax for COPY FROM STDIN
. I certainly wouldn't be opposed to the concept of adding such functionality to postgresql-simple, but I'd have to see your code before I make a final decision of course.
@iand675 I can test your code if you want to since we do not have any answer from @joeyadams yet.
Yeah, I've been super swamped lately. Once life gets back to normal I'll need to extract the code from a private project... it may be a few weeks at least.
thats is very nice to you @iand675 ! Still here to help if needed, since you have something already
Well, Joey's code is still available here: lpsmith/postgresql-libpq#3
I am lost in the discussion. I can test it. How would you write the bulk insert solution ?
I suppose it is this part: https://github.com/joeyadams/haskell-libpq/commit/f9ffdf109f5a87f5981dddd13f84adae521c5b1b
Finally, I think that "COPY FROM" could (maybe) do the trick if instead of a file, we "emulate" the file itself. It works well with copy_from from psycopg using : https://github.com/psycopg/psycopg2/blob/4e92322d7498fec1d48411836aaa981e968cd7f5/psycopg/cursor_type.c#L1338
Then maybe Database.Postgresql.Simple.Copy (doCopy) would help us ? Do you have an example how to use it (I am still working on the comments but if you had an advice ) ?
There is a way to do a bulk insert with postgresql-simple ? Many inserts are known to be less performant than COPY for instance. The copy function can load a file but can we use it to insert many (really many) values (maybe with conduit?) ? What solution do you advice for such need ?