Please try adding useBulkCopyForBatchInsert=true to the connection string.
Thanks for the update!
The execution time has gotten a bit better, but it is still slower compared to compatibility level 110.
| CSV File Size | Execution Time | SQL Server Version |
|---|---|---|
| 166MB | 10min15s | 2012 |
| 166MB | 1h24min40s | 2019 |
| 166MB | 41min40s | 2019 (new connection string) |
New connection string: jdbc:sqlserver://localhost:1433;databaseName=XXX;selectMethod=cursor;encrypt=true;trustServerCertificate=true;useBulkCopyForBatchInsert=true;
Do you have any documentation on why useBulkCopyForBatchInsert=true helps (at least a little)?
Is there anything else you recommend?
CSV files are generated for this program every day with different sizes, and I'm concerned that larger files will result in significantly longer execution times.
Thanks again.
You can read about this property here. It enables use of the bulk copy API when a prepared statement is used. If that's an option for you, you can try using the bulk copy API directly and check whether the batch size property in SQLServerBulkCopyOptions allows you to achieve better performance.
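If it helps, here's a minimal sketch of driving the bulk copy API directly from a CSV file. The file path, destination table, column layout, and batch size are placeholders for illustration, not values taken from your setup:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

import com.microsoft.sqlserver.jdbc.SQLServerBulkCSVFileRecord;
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopy;
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopyOptions;

public class CsvBulkCopyExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://localhost:1433;databaseName=XXX;"
                + "encrypt=true;trustServerCertificate=true;";

        try (Connection con = DriverManager.getConnection(url, "user", "password");
             SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(con)) {

            // Describe the CSV source (placeholder path, delimiter, and columns).
            SQLServerBulkCSVFileRecord csvRecord =
                    new SQLServerBulkCSVFileRecord("/data/input.csv", "UTF-8", ",", true);
            csvRecord.addColumnMetadata(1, "code", Types.VARCHAR, 10, 0);
            csvRecord.addColumnMetadata(2, "load_date", Types.VARCHAR, 8, 0);

            // Experiment with the batch size to find the sweet spot for your data.
            SQLServerBulkCopyOptions options = new SQLServerBulkCopyOptions();
            options.setBatchSize(10_000);

            bulkCopy.setBulkCopyOptions(options);
            bulkCopy.setDestinationTableName("dbo.TargetTable");
            bulkCopy.writeToServer(csvRecord);
        }
    }
}
```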
I can't explain the performance difference between the SQL Server versions; have you checked with the SQL Server team?
Can you give us the results comparing SQL Server 2012 to 2019 when using the same driver version and the same compatibility level?
Additionally, can you supply us with the execution plans for the two runs, so we can see if there are any major differences that would explain the performance impact?
We could also take a look at the driver logs (though the above execution plan and test results are more helpful).
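If you do want to capture driver logs, one minimal way is through java.util.logging, since the driver logs under the com.microsoft.sqlserver.jdbc logger (a logging.properties file works just as well):

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class DriverLogging {
    public static void enableDriverLogging() {
        // mssql-jdbc logs through java.util.logging under this package name.
        Logger driverLogger = Logger.getLogger("com.microsoft.sqlserver.jdbc");
        driverLogger.setLevel(Level.FINE);

        // Send FINE-level driver trace output to the console.
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE);
        driverLogger.addHandler(handler);
    }
}
```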
For improving performance in general, here are our tips:
- prepareMethod=prepare (see the connection string sketch after this list)
- The performance impact while using bulk copy for batch insert could also be due to metadata not being cached correctly (see https://github.com/microsoft/mssql-jdbc/pull/2464). You can confirm this for yourself by trying the 12.7.1 preview (which has the fix for proper caching). The next stable release to include this fix will be our 12.8.0 GA, scheduled for July 31.
- If you are using batch inserts without bulk copy, look into Prepared Statement Metadata caching.
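As a rough example, those properties can be combined in the connection string like this (the specific values are illustrative, not tuned recommendations):

```java
// Illustrative connection string; adjust the values for your workload.
String url = "jdbc:sqlserver://localhost:1433;databaseName=XXX;"
        + "encrypt=true;trustServerCertificate=true;"
        + "useBulkCopyForBatchInsert=true;"
        + "prepareMethod=prepare;"                            // use sp_prepare instead of sp_prepexec
        + "disableStatementPooling=false;"                    // turn on prepared statement handle pooling
        + "statementPoolingCacheSize=20;"                     // number of cached prepared statement handles
        + "enablePrepareOnFirstPreparedStatementCall=false;"; // don't prepare on the first execution
```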
Thanks for the feedback!
I'm currently working on some tests based on your suggestions and will get back to you as soon as possible.
Hi! After a while (and lots of testing), I've finally found the problem.
Here are the results on different driver versions and compatibility levels:

| CSV File Size | Compatibility Level | Driver | Execution Time |
|---|---|---|---|
| 166MB | 110 | sqljdbc4 4.2 | 10min15s |
| 166MB | 150 | sqljdbc4 4.2 | 1h29min49s |
| 166MB | 110 | mssql-jdbc 12.6.2.jre8 | 10min |
| 166MB | 150 | mssql-jdbc 12.6.2.jre8 | 1h24min40s |
| 166MB | 110 | mssql-jdbc 12.7.1.jre8-preview | 11min25s |
| 166MB | 150 | mssql-jdbc 12.7.1.jre8-preview | 1h5min41s |
As we can see, all driver versions are affected at compatibility level 150.
While running SQL Server Profiler on both compatibility levels, I observed that for a specific code and date ('7985', '20160701'), the duration of the DELETEs got significantly worse.
As Jeffery requested the execution plan between the two runs, I remembered that I had once enabled Query Store on SQL Server 2019 (at compatibility level 150) while the program was running and noticed that the plan was changing mid-run.
At compatibility level 110, the plan did not change.
While researching, I found that starting with SQL Server 2016, the Database Engine changed and Trace Flag 2371 is enabled by default, which changes the threshold used by the auto-update statistics process to a dynamic one. So I decided to check whether statistics were changing mid-run, which would explain why the duration of the DELETEs got worse and why the plan was changing.
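If anyone wants to verify the same thing, one way is to poll sys.dm_db_stats_properties while the load runs and watch last_updated and modification_counter move. A minimal JDBC sketch, with a placeholder table name:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StatsWatcher {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://localhost:1433;databaseName=XXX;"
                + "encrypt=true;trustServerCertificate=true;";

        // Lists each statistic on the target table with its last update time
        // and how many modifications have accumulated since that update.
        String sql =
                "SELECT s.name, sp.last_updated, sp.modification_counter, sp.rows "
              + "FROM sys.stats AS s "
              + "CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp "
              + "WHERE s.object_id = OBJECT_ID('dbo.TargetTable')";

        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.printf("%s last_updated=%s modifications=%d rows=%d%n",
                        rs.getString("name"), rs.getTimestamp("last_updated"),
                        rs.getLong("modification_counter"), rs.getLong("rows"));
            }
        }
    }
}
```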
And voilà. So basically, updated statistics are generating new plans, and those plans are worse. I just don't understand why they're worse... but ok.
So, it has nothing to do with the driver but with the database engine's changes.
Anyway, thank you!
Very interesting, thank you for looking into this. We'll go ahead and close this issue, but I'm hoping that if anyone else runs into this same issue in the future, they'll come across this thread.
Question
I have this Java program that reads a CSV file and updates a table (performing deletes and inserts) in batches using java.sql.PreparedStatement.addBatch and java.sql.PreparedStatement.executeBatch. I noticed a significant performance drop when migrating from SQL Server 2012 to SQL Server 2019 and changing the database compatibility level from 110 to 150. It appears that at compatibility level 110 batches are truly being generated, but at compatibility level 150 deletes/inserts are being made line by line? Does that make sense?
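For context, here is a simplified sketch of that delete/insert batching pattern; the table, columns, and batch size are placeholders, not the actual program:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchUpdateSketch {

    // rows: each element is {code, date, value}, parsed from the CSV file beforehand.
    static void load(String url, List<String[]> rows) throws SQLException {
        String deleteSql = "DELETE FROM dbo.TargetTable WHERE code = ? AND load_date = ?";
        String insertSql = "INSERT INTO dbo.TargetTable (code, load_date, value) VALUES (?, ?, ?)";

        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement delete = con.prepareStatement(deleteSql);
             PreparedStatement insert = con.prepareStatement(insertSql)) {

            con.setAutoCommit(false);
            int count = 0;
            for (String[] row : rows) {
                delete.setString(1, row[0]);
                delete.setString(2, row[1]);
                delete.addBatch();

                insert.setString(1, row[0]);
                insert.setString(2, row[1]);
                insert.setString(3, row[2]);
                insert.addBatch();

                if (++count % 1_000 == 0) {   // flush every 1,000 rows
                    delete.executeBatch();
                    insert.executeBatch();
                    con.commit();
                }
            }
            delete.executeBatch();            // flush any remainder
            insert.executeBatch();
            con.commit();
        }
    }
}
```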
Since this Java program is a little bit old, we were still using sqljdbc4 4.2. We tried migrating to mssql-jdbc 12.6.2.jre8, but the problem still persists.
The connection string is:
jdbc:sqlserver://localhost:1433;databaseName=XXX;selectMethod=cursor;encrypt=true;trustServerCertificate=true;
Is there anything missing on the connection string? Is this a known problem?