rails-sqlserver / tiny_tds

TinyTDS - Simple and fast FreeTDS bindings for Ruby using DB-Library.

Insert multiple rows in a single command #362

Open ehansen486 opened 7 years ago

ehansen486 commented 7 years ago

I am using Mac and Ubuntu, with tiny_tds 1.3.0 and FreeTDS v1.00.40.

I want to insert many rows in a single insert statement.

In MySQL, it's possible to just post an insert statement with many (..), (..) stanzas after the VALUES keyword, and it tends to handle that pretty efficiently. SQL*Server supports this syntax as well, but it is very, very slow.
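
(For reference, a minimal sketch of that multi-row VALUES form sent through TinyTds, assuming made-up connection options, table, and column names:)

require "tiny_tds"

# Hypothetical connection details for illustration only.
client = TinyTds::Client.new(
  host: "localhost", username: "sa", password: "secret", database: "test_db"
)

rows = [[1, "a"], [2, "b"], [3, "c"]]

# Build one INSERT with a (..), (..) row constructor per record.
values = rows.map { |id, name| "(#{id}, N'#{client.escape(name)}')" }.join(", ")
client.execute("INSERT INTO some_table (id, name) VALUES #{values}").do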

I read about sp_prepare, and the Java JDBC driver for SQLServer seems to leverage it, judging by the source code. I tried crafting a giant prepared insert statement (1000 rows by 100 columns), but that just kept SQL*Server's CPU pegged. I still can't figure out how to efficiently post multiple rows for the same insert statement.

Anybody have any pointers on how to do this via the tiny_tds library?

Thanks! Eric

metaskills commented 7 years ago

Hey Eric! Did you find out anything since posting? It sounds like your question is purely T-SQL/SQL Server related. I did a Google search and found this:

https://stackoverflow.com/questions/2624713/how-do-i-insert-multiple-rows-without-repeating-the-insert-into-dbo-blah-part

Is that what you needed?

ehansen486 commented 7 years ago

Thanks! Yeah that works, and that's what I did first. I use the same method on MySQL and it works fine. But it's extremely slow in SQL*Server for some reason.

What ultimately worked for me is to exec sp_prepare and then exec sp_execute. That way you prepare the SQL statement once, and then just pass in the parameters over and over again. That is much faster. Unfortunately, I've not been able to find a way to batch those executions in SQL*Server; you can concatenate multiple commands with semicolons, and that works, but the performance is similar to sending them separately.
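
(A rough sketch of this prepare-once/execute-many pattern through TinyTds, assuming the same kind of made-up connection, table, and columns as before:)

require "tiny_tds"

client = TinyTds::Client.new(
  host: "localhost", username: "sa", password: "secret", database: "test_db"
)

# Prepare the statement once and read the handle back from a SELECT.
result = client.execute(<<-SQL)
  DECLARE @handle INT
  EXEC sp_prepare @handle OUTPUT,
    N'@id INT, @name NVARCHAR(100)',
    N'INSERT INTO some_table (id, name) VALUES (@id, @name)'
  SELECT @handle AS handle
SQL
handle = nil
result.each { |row| handle = row["handle"] }

# After that, each row only needs the handle plus its parameter values.
rows = [[1, "a"], [2, "b"], [3, "c"]]
rows.each do |id, name|
  client.execute("EXEC sp_execute #{handle}, #{id}, N'#{client.escape(name)}'").do
end

# Release the server-side prepared statement when finished.
client.execute("EXEC sp_unprepare #{handle}").do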

--> Eric

metaskills commented 7 years ago

Ah, interesting. I remember back in the day, before I discovered EXEC sp_executesql, I had to use DBCC FREEPROCCACHE WITH NO_INFOMSGS to clear SQL Server's plan cache when things got slow. Have you tried using sp_executesql? That is what the Rails adapter uses.

ehansen486 commented 7 years ago

The advantage of sp_prepare/sp_execute over sp_executesql is that you don't have to send the SQL and parameter declaration for each row. You 'prepare' it once, and then only pass in the data for the parameters for each row.
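
(For contrast, a per-row sp_executesql loop re-sends the statement text and the parameter declaration on every call. A sketch, reusing the hypothetical client, table, and rows from the earlier example:)

rows.each do |id, name|
  client.execute(
    "EXEC sp_executesql " \
    "N'INSERT INTO some_table (id, name) VALUES (@id, @name)', " \
    "N'@id INT, @name NVARCHAR(100)', " \
    "@id = #{id}, @name = N'#{client.escape(name)}'"
  ).do
end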

metaskills commented 7 years ago

Cool... but if that does not work and sp_executesql does, then what you are really saying is that bytes sent on the wire are the limiting factor for performance. Is that true?

ehansen486 commented 7 years ago

Well, it's bytes plus load on SQL*Server. With sp_executesql, the server has a little extra work to do on each insert to identify the existing execution plan to re-use based on the SQL you pass in. That's avoided with prepare/execute: you prepare once, then execute many times. That could have an impact when lots of rows are inserted into the same table.

From what I can glean from the Java SQLServer driver, there's a way to 'batch' inserts rather than send them individually. Then SQLServer can handle inserting many records at once rather than one at a time, which is more efficient. That's the only part I'm missing.

metaskills commented 7 years ago

> That's the only part I'm missing.

I think I'm missing more... like specifically how TinyTDS can do this when we use DBLIB?

> SQLServer can handle inserting many records at once rather than one at a time, which is more efficient

Do you have statistics and benchmarks that specifically show differences between pure T-SQL sp_executesql and the process you are talking about? TIA.

ehansen486 commented 7 years ago

Hi-

I don't know how to do batching without perhaps using a different library or language.

No, I didn't do a side-by-side comparison, but:

  a. with sp_executesql, you're sending more bytes over the wire
  b. SQL*Server needs to identify the execution plan rather than be told which prepared statement to use

So given a choice between implementing one or the other, I'm going to go with prepare/execute. And I got it to work.

metaskills commented 7 years ago

Thanks, it would be super helpful to understand the performance more with measured data. Is there a blog post someone else has written on this? Anything would be interesting to me.

ehansen486 commented 7 years ago

No, I don't have a good handle on relative performance. I was surprised to find so little documentation about this on the internet, given that batch inserting is a fairly well-known pattern.

Oh, and I can tell you that:

  1. Using prepare/execute vs. just execute with a monolithic insert statement with many values concatenated together, I see a 12x increase in speed.
  2. I sometimes use a commercial tool, Full Convert from Spectral Core, to convert data from MySQL to SQL*Server. It does the same insert job in about 1/10 of the time that I can manage in Ruby with prepare/execute, and I confirmed with the author that it does not use multi-threading. So there is still room to speed this up, but the current level of performance is sufficient for me for now.

metaskills commented 7 years ago

Cool, thanks. I did a write-up with some simple benchmarks using sp_executesql.

https://blog.engineyard.com/2011/sql-server-10xs-faster-with-rails-3-1

What would be really cool is an apples-to-apples comparison (against Ruby ODBC, say) that showed better results, in a way that would help the Ruby/TinyTDS/SQLServer community better understand this. Without it, it is hard to tell what I can do, if anything. Thanks so much for sharing what you have!

aharpervc commented 7 years ago

What is the actual (example) sql you're running in both cases? If I understand your point, it's that something like:

declare @handle int
exec sp_prepare @handle output, N'@a int, @b int', N'insert into t (a, b) values (@a, @b)'

exec sp_execute @handle, 1, 2
exec sp_execute @handle, 2, 3
exec sp_execute @handle, 3, 4

is 12x faster than:

exec sp_executesql N'insert into t (a, b) values (1, 2), (2, 3), (3, 4)'

ehansen486 commented 7 years ago

Yeah, that's pretty much it, except:

  1. I didn't use sp_executesql; I just used execute with a concatenation of values, which is what I'm used to doing with MySQL. I'm not sure what the syntax is or whether your example would allow reuse of execution plans. If that works, it would be really convenient. The page I read talked about substitution params: https://technet.microsoft.com/en-us/library/ms175170(v=sql.105).aspx. It looked similar enough to the prepared statement approach that I decided to go that way.
  2. I was using an insert with 50+ columns and 1000 records.
  3. I ran this for some 50K rows, so 50 inserts. I imagine that SQL*Server's analyzer would speed up the subsequent 49 inserts if it could reuse the execution plan per your example.

It would be interesting to do a benchmark test. I didn't pursue the sp_executesql approach; it seemed to me that sending an explicit handle to the prepared statement would be better.

aharpervc commented 7 years ago

Are you sending your sp_execute calls in a single batch? Or like in a loop? Or a few at a time? How does that part work?

BTW, you can just say SQL Server (without the *) so markdown formatting doesn't get confused here on GitHub.

ehansen486 commented 7 years ago

Well, that's really what I'd love to understand better about SQLServer! There doesn't seem to be an actual batch command in SQLServer that works with prepared statements. I'm really sending them individually. I tried concatenating a group of 1000 sp_execute statements by separating them with semicolons, and it made no observable difference in terms of performance.
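
(For concreteness, the semicolon-joined attempt described above might be built like this, again reusing the hypothetical client, handle, and rows from the earlier sketches:)

# Join many sp_execute calls into one round trip; as noted above, in practice
# this performed about the same as sending them individually.
batch = rows.map { |id, name|
  "EXEC sp_execute #{handle}, #{id}, N'#{client.escape(name)}'"
}.join(";\n")

client.execute(batch).do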

There should be a way that I can make one sp_execute call, specify the prepared statement @handle and send some sort of array of values. But I couldn't figure out how to do that.

I also tried creating a giant prepared statement with separate variables for each column repeated 1000 times, but SQL*Server couldn't even compile that; just pegged the CPU. Didn't seem like a great idea anyway ;) so don't try that.

metaskills commented 6 years ago

> There should be a way that I can make one sp_execute call, specify the prepared statement @handle and send some sort of array of values. But I couldn't figure out how to do that.

Not with TSQL that I know of. I think that is what a stored proc is for.

> I also tried creating a giant prepared statement with separate variables for each column repeated 1000 times, but SQL*Server couldn't even compile that; just pegged the CPU. Didn't seem like a great idea anyway ;) so don't try that.

LOL, so yea, maybe a stored proc is best here.

I don't think there is anything in TinyTDS that would be at play here. However, if there were, I would like to know about it. If someone found the time to make a simple Rails app that could use an ENV var to demonstrate the issue between PG and SQL Server, that would be hellakewl.

metaskills commented 6 years ago

Maybe related, see this thread/post. https://github.com/rails-sqlserver/tiny_tds/issues/360#issuecomment-309252072