lpsmith / postgresql-simple

Mid-level client library for accessing PostgreSQL from Haskell
Other
206 stars 71 forks source link

bulk insert ? #157

Open delanoe opened 9 years ago

delanoe commented 9 years ago

There is a way to do a bulk insert with postgresql-simple ? Many inserts are known to be less performant than COPY for instance. The copy function can load a file but can we use it to insert many (really many) values (maybe with conduit?) ? What solution do you advice for such need ?

lpsmith commented 9 years ago

Postgresql-simple does support the COPY FROM STDIN interface, see the Copy module. You don't have to read stuff in from a file to use this interface, but you'll have to generate the data syntax yourself as there's no support for that. You could even wrap this up into a conduit if you'd like.

Alternatively, you can use a combination of transactions, prepared statements, and/or executeMany. A single call to executeMany is not appropriate for bulk inserts, because this will generate the entire insert as a single DML command, which isn't good for the backend, not to mention that postgresql-simple's generation of extremely large queries is not as efficient as it should be. Batching a smaller number of rows into a call to executeMany might still be benefitial, though.

One thing that people doing bulk inserts often miss is that running multiple insert statements in a single transaction is essential for performance and helps with correctness as well. So if you choose to eschew the copy interface, you will have to use transactions.

lpsmith commented 9 years ago

More reading regarding bulk inserts that might be benefitial:

https://github.com/lpsmith/postgresql-simple/issues/67#issuecomment-114037553 https://stackoverflow.com/questions/20169372/postgresql-copy-method-enter-valid-entries-and-discard-exceptions

delanoe commented 9 years ago

Many thanks for your answer. I will try and do some benchmarks.

iand675 commented 9 years ago

@lpsmith out of curiousity, would you accept a PR with support for formatting data for COPY FROM STDIN? I've got some private code that's implemented it already.

lpsmith commented 9 years ago

I dunno if @joeyadams is still active in the community, but he did have some code for generating data syntax for COPY FROM STDIN. I certainly wouldn't be opposed to the concept of adding such functionality to postgresql-simple, but I'd have to see your code before I make a final decision of course.

delanoe commented 9 years ago

@iand675 I can test your code if you want to since we do not have any answer from @joeyadams yet.

iand675 commented 9 years ago

Yeah, I've been super swamped lately. Once life gets back to normal I'll need to extract the code from a private project... it may be a few weeks at least.

delanoe commented 9 years ago

thats is very nice to you @iand675 ! Still here to help if needed, since you have something already

lpsmith commented 8 years ago

Well, Joey's code is still available here: lpsmith/postgresql-libpq#3

delanoe commented 8 years ago

I am lost in the discussion. I can test it. How would you write the bulk insert solution ?

I suppose it is this part: https://github.com/joeyadams/haskell-libpq/commit/f9ffdf109f5a87f5981dddd13f84adae521c5b1b

delanoe commented 8 years ago

Finally, I think that "COPY FROM" could (maybe) do the trick if instead of a file, we "emulate" the file itself. It works well with copy_from from psycopg using : https://github.com/psycopg/psycopg2/blob/4e92322d7498fec1d48411836aaa981e968cd7f5/psycopg/cursor_type.c#L1338

Then maybe Database.Postgresql.Simple.Copy (doCopy) would help us ? Do you have an example how to use it (I am still working on the comments but if you had an advice ) ?