Closed JDBC-0 closed 3 months ago
Hi @JDBC-0,
Do you have a repro for us to reproduce the issue on our end? If not, are you able to tell us how you achieved these results? From our internal testing, JDBC performance has remained consistent across the last few versions, so we are curious what is being done differently in the situation you describe.
As I described above: we triggered some processing in our big application to have a test case that is not too short and not too long and isolated 1 task that takes about 1 Minute. We repeated the exact same test 5 times (warming up, caches, etc.) and ended up with a minimum and maximum duration time.
We executed the exact same test in the exact same environment again and again by just stopping everthing, exchange the driver files (.jar, .dll) and started our test again 5 times (1st result thrown away). Same for different driver versions. Also reverted back and forth between driver versions with consistent results. In the end we got:
V 12.4.1 : 00:42 - 00:50 V 12.4.2 : 00:49 - 00:54 V 12.5.0 : 01:35 - 01:47 V 12.6.0 : 01:40 - 01:47 V 12.6.3 : 01:45 - 02:02
and a clear indication that the runtime since V 12.5.0 more than doubled! (Switchuing back tu 12.4.2 ... old results with half the execution time).
This is a process in Java that does heavy calculations, some database calls and in the end it spans several threads that each saves results into the database with their own separat database connection (each storing data in its own exclusive table). The storage phase is some kind of DELETE + INSERT and is done with prepared statements (of database stored procedure) that are called in batches (of 1000).
So, perhaps there were changes in the driver that slowed down prepared statements and/or batch executions of such prepared statements.
We switched back to 12.4.1 and our big processes went back to normal execution times.
Ok, thank you for the additional information. Are you able to share with us this test case that you ran?
Looking at what you are running, and when the issue started (12.5.0). I think the culprit might be the changes to execute stored procedures directly. In this case, can you try running the same tests you ran for 12.5/12.6 but with useFlexibleCallableStatements
set to false
?
Thanks for the hint "useFlexibleCallableStatements" ... I will be on cvacation now, but will get back latzer.
Have you had a chance to test with the connection string option change? One other thing you can try is running this custom version of the 12.5 driver with the changes for callable statements removed.
Now, that's interesting:
12.5.0 preview : 01:39 - 01:48 12.6.3 + useFlexibleCallableStatements=false : 01:43 - 01:49 12.6.3 + useFlexibleCallableStatements=true: 01:37 - 01:50 your custom 12.5 + useFlexibleCallableStatements=false : 00:41 - 00:44
=> your custom 12.5 is back to old performance, but 12.6.3 + useFlexibleCallableStatements is still twice as slow ... !
BTW: we always use column index to set JDBC values - not column names.
Thanks. It looks like there is a perf issue with the flexible callable statements feature that was added in 12.5.0. Unfortunately, we don't seem to be able to detect this (else it would have been resolved before release). Do you have a repro that you can share with us that produces the results mentioned above?
I'm sorry, this is integrated in a bigger application. It saves thousands of table rows in 5 threads for 5 destination tables with 20 to 80 columns each table with JDBC batches of 1000 rows per batch.
If it's a problem with these "flexible callable statements", why does it still occur if useFlexibleCallableStatements = false is set? I would expect old performance behaviour if this flag is set to false .... ?
I'll have to look into this more to see how this is the case. Perhaps there is some part that is not disabled when the connection string property is set to false. Regardless, the custom version of the driver I sent you was only missing the above feature, so it only makes sense that this is what is causing the performance issues.
Yes. And if the performance problem just occurs for these "flexible callable statement" and we wouldn't need them (as we always set JDBC values by index number not by name), it would be sufficient to get useFlexibleCallableStatements=false working in a way that the old behaviour (and therefore performance) is triggered.
So I hope, that this flag does not cover all of the new code, yet. If you have fixed the scope/coverage of the flag, just send me a new custom version and I will test it with this flag on and off.
BTW: what exactly is FlexibleCallableStatements? I didn't find something in the release notes. I found somthing here: https://learn.microsoft.com/en-us/sql/connect/jdbc/using-a-stored-procedure-with-output-parameters?view=sql-server-ver16 :
"For drivers version 12.6 and above, a new connection property, useFlexibleCallableStatements was introduced. When set to true this property retains the driver's old behavior, allowing users to use a mix of ordinal positions and parameter names when identifying parameters. When set to false, the user must use one or the other, but can not use both."
I browsed through misc code (not the test above) we have and found 1 place where we have such a situation: someone thought that setting values for the same statement sometimes via index and some by name was a good idea ... so, this seems to some kind of "standard working" code that also worked with old JDBC drivers. This leads to the assumption, that "useFlexibleCallableStatements" is some optimization introduced in the MSSQL-JDBC-Driver? So, if this should be some "optimzation", any change to this should not lead to slower performance.
The description for this PR explains it best:
https://github.com/microsoft/mssql-jdbc/pull/2154
An overview is that the original behavior did allow using names and indexes interchangeably. This is why we call them "flexible", and why the property says that this is the same as prior to 12.6. The downside to this was that it used extra metadata queries that impacted performance. So by setting the property to "false", you lose the "flexibility", but gain a performance boost.
The intention was that most users wouldn't see a difference with the feature (as the same behavior is retained by default), but obviously this is not the case. It's too early to say how we're going to resolve this, whether we'll be able to find the performance bug, or whether we'll have to back the feature out in its entirety.
This would be a good opportunity to introduce a performance test for the driver. ;-)
We would prefer to keep the old behaviour: useFlexibleCallableStatements=true to avoid possible problems. But this should lead to EQUAL performance as before.
useFlexibleCallableStatements=false should lead to BETTER or EQUAL performance.
In no case should the performance be (considerably) slower. Well, this flag should have the target to improve performance, right?
Just read in #2154 : "b) Setting/Registering callable statement parameters must be in same order as the sproc definition. Setting the parameters in an adhoc order will not work."
This is a no-go in our code as wo often initiate prepared statements and set fixed parameters one time that have misc parameter indices. Later on, we reuse this statement and just change some of the parameter values - that are not in consequtive order!
=> we would keep the default behaviour (=> useFlexibleCallableStatements=true) to not run into problems. We also use a connection for misc callable statements that are set up in different ways ... too bad, this optimzation is only on the connection and not the statement.
We realize that the current situation is not desired, and are looking into the fixing it so performance either stays the same or is improved by this feature.
Our perf tests are a bit lacking... since it didn't catch this.
But, would it be possible for you to assist us in providing a profile of the driver for your scenario? Some sort of visual that breaks down where cpu time is spent in the application. Just gathering some additional info. No problem if not, since from what you mentioned about your app, might be a big ask.
No problem, just tell me what to do. Perhaps you send me a custom driver with additional debug output?
I did the following test: collect duration times for the JDBC setValue() calls and the batch calls (execute batch, commit, clear) both with drivers 12.4.1 und 12.6.3 and just for a single (smaller) table (not complete test) Results:
A) the setValue() calls took no time (~ 1 ms) with both drivers B) the batch calls took 7 times longer with 12.6.3!
Now the interesting part: we already experimented for the largest table with misc ways to save the table data - with the aim to use the "bulk insert" feature of the driver. If we change the save mode to "batch INSER statements", there is no performance differnce between the drivers. But using save mode "prepared stored procedure statements", the new driver is a lot slower! Again: not during the JDBC setValue() calls but during the JDBC batch commands (execute, commit, clear).
=> conclusion: the problem is visible if you have prepared procedure calls (not prepared INSERT statements)! Both are "callable statements", but one uses prepared stored procedures, the other prepared INSERT statements.
I adjusted the ticket title for better (more specific) problem conclusion
the problem is visible if you have prepared procedure calls (not prepared INSERT statements)!
But using save mode "prepared stored procedure statements", the new driver is a lot slower!
B) the batch calls took 7 times longer with 12.6.3!
Just making sure to get the language clear here. You mean the perf issue arises when you do batch calls with CallableStatements? So in your project you're doing something like the following correct? If so, thanks. That narrows things down.
try (CallableStatement cstmt = connection.prepareCall("{ ? = CALL sproc (?) }")) {
cstmt.setInt();
cstmt.addBatch();
cstmt.setInt();
cstmt.addBatch();
cstmt.executeBatch();
}
Also, if that's the case, what is useBulkCopyForBatchInsert
set to? Is it true or false? Default is false.
Yes, in the end it boils down to the code you mention. Of course there are mor cstmt.set... commands for each batch and after cstmt.executeBatch() we also do a connection.commit() and a cstmt.clearBatch() for the next cycle of
cstmt.set... cstmt.set... ... cstmt.addBatch()
each cycle can add up to 1000 rows/statements per batch, so after 1000 addBatch(), we do
cstmt.executeBatch() connection.commit() cstmt.clearBatch()
(that takes much longer with the new driver) and the next cycle begins.
Our connection string is:
jdbc:sqlserver://{dbSrvAdr:dbSrvPort};databaseName=amc;applicationName=myAppSrv;workstationID=mine;integratedSecurity=true;sendStringParametersAsUnicode=false;encrypt=false;useBulkCopyForBatchInsert=true;sendTemporalDataTypesAsStringForBulkCopy=false;useFlexibleCallableStatements=true
With useFlexibleCallableStatements=true we try to force the old behaviour in driver 12.6.3.
I changed my test to not include clearBatch() and commit() in the runtime statistics: the longer execution time is now restricted to
cstmt.executeBatch()
Using useBulkCopyForBatchInsert=false didn't change anything.
Just an update, still looking into it...
I don't see how batch calls should be affected by these callable statement changes. My profile of what the driver sends to the server shows the same calls being sent when comparing 12.4.1 and 12.6.3 for batch calls with cstmt.
I moved on adding more duration outputs. Overall we save 7 different tables in 7 threads. SO I can compare the table duration results. Here are the number of rows and the durations compared between driver 12.4.1 and 12.6.3:
V 12.4 #number of rows / duration => V 12.6 #number of rows / duration :
Table 1): # 1.190 / 200 ms =># 1.190 / 800 ms Table 2): # 26.600 / 2 sec => # 26.600 / 14 sec
=> even though Table 1 has more columns (and even more different datatypes), duration increases by a factor of 800 / 200 ms = 4 .... while table 2 (with less columns) increases by a factor of 14/2 = 7 ... => the more rows to save, the slower the new driver is.
Also tested increasing batch size (to 30.000) didn't help, so batches of 1.000 or all 26.600 in one batch makes no difference.
Also tried to increase level of "com.microsoft.sqlserver.jdbc" logger but I can't find any performance related values.
😃 @tkyc , @Jeffery-Wasty : .... github code tags don't work properly ... see next comment ... 😞
Results:
V 12.4.1 : Duration batch INSERTs: 642 Duration batch prepared INSERTs: 331 Duration procedure: 7975 Duration batch procedure: 333 ✅
V 12.6.3 Duration batch INSERTs: 604 Duration batch prepared INSERTs: 330 Duration procedure: 8453 Duration batch procedure: 8341 ⚠️
=> as you can see, the last "batch procedure" test takes 8 seconds with 12.6.3 compared to 0,3 seconds with 12.4.1
Test method follows in trailing comment ... The connection URL is set up via
String url = "jdbc:sqlserver://"+dbServer+":"+dbServerPort+";databaseName="+nameDB //base URL
Databse objects in database "pubs":
CREATE TABLE [dbo].[test_JDBC_tab]( [Col_Integer] [int] NULL, [Col_Decimal_19_4] [decimal](19, 4) NULL, [Col_Varchar] varchar NULL, [Col_Date] [date] NULL, [Col_Datetime] [datetime] NULL ) ON [PRIMARY] GO
CREATE OR ALTER PROCEDURE [dbo].[test_JDBC_Ins] ( @inInteger int, @inDecimal194 DECIMAL(19,4), @inVarchar VARCHAR(256), @inDate Date, @inDateTime DateTime ) AS BEGIN INSERT INTO pubs.dbo.test_JDBC_tab (Col_Integer, Col_Decimal_19_4, Col_Varchar, Col_Date, Col_DateTime) VALUES (@inInteger, @inDecimal194, @inVarchar, @inDate, @inDateTime) END -- of procedure test_JDBC_Ins
/**
* Test performance of inserts. Required database objects:
* <ul>
* <li>Table: pubs.dbo.test_JDBC_tab (Int, Decimal(19,4), Varchar(3), Date, Datetime)
* <li>Stored procedures: test_JDBC_Ins
* </ul>
* /
public void testPerformance(final Connection connection) {
try {
final int cycles = 10_000; //number of test data rows to create
//test data:
String tableName = "pubs.dbo.test_JDBC_tab";
final int colInt = 1;
final Double colDec194 = Double.valueOf("123456789.0123");
final String colVarchar3 = "ABC";
final Date colDate = new Date();
final Date colDateTime = new Date();
Statement stmtDel = connection.createStatement();
//batch INSERTs:
stmtDel.execute("delete from "+tableName); //prepare
Statement statement = null;
try {
connection.setAutoCommit(false);
statement = connection.createStatement();
long t = System.currentTimeMillis();
statement.clearBatch();
for (int i = 0; i < cycles; i++) {
statement.addBatch("INSERT INTO "+tableName+" VALUES ("+colInt+","+colDec194+",'"+colVarchar3+"','"+ICUtils.toString(colDate, "yyyyMMdd")+"','"+ICUtils.toString(colDateTime, "yyyyMMdd")+"')");
}
statement.executeBatch();
connection.commit();
System.out.println("Duration batch INSERTs: "+(System.currentTimeMillis() - t));
} catch (final SQLException sqle) {
sqle.printStackTrace();
connection.rollback();
} finally {
connection.setAutoCommit(true);
if (statement != null) statement.close();
}
//batch prepared INSERTs:
stmtDel.execute("TRUNCATE TABLE "+tableName); //prepare
PreparedStatement ps = null;
try {
connection.setAutoCommit(false);
ps = connection.prepareStatement("INSERT INTO "+tableName+" VALUES (?,?,?,?,?)");
long t = System.currentTimeMillis();
ps.clearBatch();
for (int i = 0; i < cycles; i++) {
int c = 0;
ps.setInt(++c, colInt);
ps.setDouble(++c, colDec194);
ps.setString(++c, colVarchar3);
ps.setDate(++c, new java.sql.Date(colDate.getTime()));
ps.setDate(++c, new java.sql.Date(colDateTime.getTime()));
ps.addBatch();
}
ps.executeBatch();
connection.commit();
System.out.println("Duration batch prepared INSERTs: "+(System.currentTimeMillis() - t));
} catch (final SQLException sqle) {
sqle.printStackTrace();
connection.rollback();
} finally {
connection.setAutoCommit(true);
if (ps != null) ps.close();
}
//prepared procedure statement:
stmtDel.execute("TRUNCATE TABLE "+tableName); //prepare
CallableStatement cs = null;
try {
connection.setAutoCommit(false);
cs = connection.prepareCall("{call test_JDBC_Ins(?,?,?,?,?)}");
long t = System.currentTimeMillis();
for (int i = 0; i < cycles; i++) {
int c = 0;
cs.setInt(++c, colInt);
cs.setDouble(++c, colDec194);
cs.setString(++c, colVarchar3);
cs.setDate(++c, new java.sql.Date(colDate.getTime()));
cs.setDate(++c, new java.sql.Date(colDateTime.getTime()));
cs.executeUpdate();
}
connection.commit();
System.out.println("Duration procedure: "+(System.currentTimeMillis() - t));
} catch (final SQLException sqle) {
sqle.printStackTrace();
connection.rollback();
} finally {
connection.setAutoCommit(true);
if (cs != null) cs.close();
}
//iprepared procedure statement batch:
stmtDel.execute("TRUNCATE TABLE "+tableName); //prepare
ps = null;
try {
connection.setAutoCommit(false);
ps = connection.prepareCall("{call test_JDBC_Ins(?,?,?,?,?)}");
long t = System.currentTimeMillis();
ps.clearBatch();
for (int i = 0; i < cycles; i++) {
int c = 0;
ps.setInt(++c, colInt);
ps.setDouble(++c, colDec194);
ps.setString(++c, colVarchar3);
ps.setDate(++c, new java.sql.Date(colDate.getTime()));
ps.setDate(++c, new java.sql.Date(colDateTime.getTime()));
ps.addBatch();
}
ps.executeBatch();
connection.commit();
System.out.println("Duration batch procedure: "+(System.currentTimeMillis() - t));
} catch (final SQLException sqle) {
sqle.printStackTrace();
connection.rollback();
} finally {
connection.setAutoCommit(true);
if (ps != null) ps.close();
}
} catch (final Throwable t) {
t.printStackTrace();
}
System.out.println("End of test testPerformance().");
}//testPerformance()
@JDBC-0
Thanks, really appreciate the additional info and code. Taking a look...
@JDBC-0
I believe I understand why the perf problem is happening now... I'm trying to figure out a solution now.
@JDBC-0
Can you try this driver jar out? This seems to improve the perf on my end.
@tkyc : I had to use the DLL from the offifcal 12.8.0 but it worked. Yup, the results look good:
V 12.6.3 Duration INSERTs: 7946 Duration INSERT batch: 560 Duration INSERT batch prepared: 356 Duration procedure prepared: 6288 Duration procedure prepared batch: 6443 ⚠️
V 12.8.0 Duration INSERTs: 7826 Duration INSERT batch: 564 Duration INSERT batch prepared: 353 Duration procedure prepared: 6297 Duration procedure prepared batch: 293 ✅
It also doesn't change when setting useFlexibleCallableStatements=false - great work! 👍 What was the reason?
@JDBC-0
What was the reason?
In the old driver when you call a cstmt once, it gets "prepared" and we receive a handle which we reuse in subsequent calls which improve perf. In the few latest driver versions, because we're executing cstmts directly, we can't "prepare" and reuse the statement. I was reading through the driver's TDS spec manual and it seems SQL BATCH TDS call would be the most appropriate in this scenario. However, for now, I fell back on the old logic for batch calls. Also, after discussing with the team we still might revert this feature temporarily to do additional work on it since many issues relating to it have popped up.
But anyways, thanks for the help and patience.
As a "reward" for me ... 😁 ... could you confirm that the bulk insert feature of the driver now supports DATE/DATETIME? Because I read contrary texts : sometimes I read that datatypes like DATE, DATETIME are not supported but I see release notes entries that "fixed" e.g. Timestamp in bulk inserts. One of the main reasons we like to update the driver is this bulk insert feature but most of our tables have DATE and DATETIME columns. Is this supported now? Thanks! 😄
@JDBC-0
Unfortunately, SQL data and datetime data types aren't supported right now for bulk inserts.
@tkyc : just for info: the latest 12.8.1 driver now performs similar to 12.4.1 .... => issue solved. ✅
Awesome, thanks for the feedback. There will also be a 12.6.4 hotfix as well some time this week.
We upgraded from 12.4.1 (over 12.6.1, 12.6.2 that we skipped due to bugs) to 12.6.3 ... the bugs seemed to solved but the driver is considerably slower that our last "production ready 12.4.1" resulting in twice the duration for the database parts of our applications!
As these results in hours longer processings, we had to revert to 12.4.1. What can be the reasons nce of the 12.6 this extrem slower performance?
We run our test several times for each different driver version - just by replacing the ,jar and .dll (x64) file the duration spans are as follows:
V 12.4.1 : 00:42 - 00:50 V 12.4.2 : 00:49 - 00:54 V 12.5.0 : 01:35 - 01:47 V 12.6.0 : 01:40 - 01:47 V 12.6.3 : 01:45 - 02:02
As you can see, starting since driver V 12.5.0, the execution times more than doubled! (Windows 10, Java 17 LTS)