Performance of batch inserts

dawinaj commented 1 year ago

Hello, I am looking to migrate from another SQL wrapper. However, for my app the speed is really important, as it downloads data from the internet, parses and writes to DB. And it turned out that when I refactored it (for safety) to use prepared statements instead of raw string concatenation, the performance dropped drastically. So, any benchmarks maybe, or a word on how such implementation compares to "regular" ones?

rbock commented 1 year ago

Hi,

You did not give me a whole lot of data here, but let me try to answer with a few statements (sorry for stating the obvious in some places):

Line by line insert is usually pretty slow, but it depends on the actual database and the table definition, e.g. indexes, etc.
A single-use prepared statement is usually slower than a single direct execution.
Once prepared, the (repeated) use of a prepared statement is often faster than a direct execution. But if the prepared statement still is a single line insert, then the performance will likely be dominated by slow insert performance.
Batch inserts are a good compromise, in my view. The combine the ease of direct execution and (usually) higher performance. Obviously, the performance is dominated by the database itself.
Since sqlpp11 operates with expressions instead of string representations, it provides considerable type safety, and it takes care of escaping.

I understand that this is not exactly the information you asked for, but I hope it helps anyway?

Best, Roland

dawinaj commented 1 year ago

I understand, I am still a newbie when it comes to databases (or maybe even programming in general lol). Yes, in the beginning I was doing batch inserts by creating commands as raw strings (up to 100 rows at once, depending on the downloaded data). After rewriting the code to use prepared statement, even though it is reused all the time, it only sends rows one by one. So as you stated in the 3rd point, the problem with performance is probably caused by the overhead of a query. I would use a batch prepared statement if the amount of data was not varying... But maybe I should implement some buffer... well, w/e. From what I've seen in (at least a bit of) the code, I see that the queries are built by writing from expressions to a string stream? That's probably the slowest part, but I guess when your entire library is based on such principle, it is most likely already well optimized. It should be even possible to make it constexpr, though probably only partially...

Well, thanks for a clarification, and for extrapolating from incomplete data :)

Edit: Is it possible in this library to pass the values into prepared statement in a loop?

rbock commented 1 year ago

I understand, I am still a newbie when it comes to databases (or maybe even programming in general lol).

Enjoy :-)

I would use a batch prepared statement if the amount of data was not varying... But maybe I should implement some buffer...

Maybe. I suggest to do experiments first, to see if that's really likely to solve anything.

I see that the queries are built by writing from expressions to a string stream? That's probably the slowest part

Within the library: yes.

but I guess when your entire library is based on such principle, it is most likely already well optimized.

Oh, I am sure this could be faster. This is not optimized at all. OTOH, the library has been in use in the payments industry for almost a decade now, and afaict, performance of this library has never been an issue. And a colleague once told me that they actually did some comparison with a "regular" library (as you called it). sqlpp11 was considerably faster. Thus, as above with the buffering: Yes, some performance improvements are certainly possible, but do measurements and experiments first.

It should be even possible to make it constexpr, though probably only partially...

Partially: Sure. However, the whole stream is known at compile time. It should give the compiler a lot of room for optimization already.

Well, thanks for a clarification, and for extrapolating from incomplete data :)

Happy to.

Edit: Is it possible in this library to pass the values into prepared statement in a loop?

Parameters are essentially represented as a tuple, which is not ideal for iteration in a loop.

Multi-line inserts can be filled in a loop, though.

dawinaj commented 1 year ago

Yes, you're right about the compiler being able to optimize by itself. And yes, I get it about the tuple, it's the same in the other library I'm using currently :) Though it is nice it's possible with multiline inserts. Alright, thank you for all the assistance. I think I can close now.

rbock / sqlpp11

Performance of batch inserts #476