microsoft / mssql-jdbc

The Microsoft JDBC Driver for SQL Server is a Type 4 JDBC driver that provides database connectivity with SQL Server through the standard JDBC application program interfaces (APIs).
MIT License
1.06k stars 424 forks source link

Performance difference in prepared statements compared to jTDS due to differing execution plan #1196

Closed Chrriis closed 2 years ago

Chrriis commented 4 years ago

Driver version

7.4.1.jre12

SQL Server version

Microsoft SQL Server 2017 (RTM-GDR) (KB4505224) - 14.0.2027.2 (X64) Jun 15 2019 00:26:19 Copyright (C) 2017 Microsoft Corporation Enterprise Edition (64-bit) on Windows 10 Enterprise 10.0 (Build 18362: )

Client Operating System

Windows 10 Pro (1909)

JAVA/JVM version

OpenJDK 12.0.2+10

Table schema

A simple table with a few columns and one index

Problem description

We would like to switch from JTDS to MS SQL JDBC driver. The problem is that during long data crunching, the MS SQL driver is much slower. Our data crunching, which includes various other things, takes 77 minutes with JTDS and 104 minutes with MS SQL.

  1. Expected behaviour: comparable or better performance than JTDS for similar statements.
  2. Actual behaviour: some statements in MS SQL JDBC driver are much slower. Our simple test case shows 3x-4x slower.
  3. Any other details that can be helpful: micro benchmarks do not work. It is the state of the database as it evolves during the data crunching that exhibits this behavior. Make any change to the database (drop a table, change an index, remove and re-add a constraint, etc.) and the test case disappears. It may have to do with the computed statistics that JTDS is able to use and not MS SQL. Of course, it may also be a misuse or misconfiguration of the MS SQL JDBC driver...

Reproduction code

I narrowed down the test case to a relatively small backup (14MB, 170MB unzipped) and a Java class.

MSvsJTDS_bak.zip SqlServerJdbcDriverSlowdownTest.zip

We got these results: 1st Try: Total time (ms): 2879 (MS) Total time (ms): 929 (JTDS)

2nd Try: Total time (ms): 2885 (MS) Total time (ms): 888 (JTDS)

3rd Try: Total time (ms): 2798 (MS) Total time (ms): 993 (JTDS)

4th Try: Total time (ms): 3057 (MS) Total time (ms): 947 (JTDS)

On a different client and host machine, we got: Total time (ms): 5681 (MS) Total time (ms): 1516 (JTDS)

ulvii commented 4 years ago

Hi @Chrriis, Thanks for creating the issue. We will investigate and get back to you.

ulvii commented 4 years ago

Hi @Chrriis, I wanted to share a few details in case they are helpful for your investigations. My initial testing shows that the behavior is query specific. I ran the exact same test with a simpler update query on numeric columns (update numerictable set column2=? where column1 = ?) and both drivers took almost exactly same amount of time to execute.

I also modified your test code a bit and seeing some strange behavior. When I hardcode the ID in the query

UPDATE T_DbPerfTest SET Col29 = x.Col29 FROM (   SELECT Col1, LEAD(Col28) OVER (PARTITION BY Col5, Col2, Col4, Col3 ORDER BY Col28, Col1) AS Col29   FROM T_DbPerfTest   WHERE Col29 IS NULL   AND (Col27 = 1 OR Col28 = ?)   AND Col2 = 240254 ) x WHERE T_DbPerfTest.Col1 = x.Col1 AND T_DbPerfTest.Col29 IS NULL AND x.Col29 = ? AND T_DbPerfTest.Col27 = 1

and update the JAVA code accordingly

    public static void executePreparedStatements(Connection connection, PreparedStatement preparedStatement) throws Exception {
        long[] ids = {
                240254,
                684796,
                684786,
                240102,
                240274,
                240303,
                240332,
                240244,
                685302,
                685331,
                685196,
                685389,
                685228,
                685360,
        };
        for(long id: ids) {
            preparedStatement.setLong(1, 27586214);
            preparedStatement.setLong(2, 27586214);
            preparedStatement.addBatch();
        }
        preparedStatement.executeBatch();
    }

jTDS driver becomes slower too, both drivers perform the same. So parametrizing IDs makes jTDS faster and I am currently not sure why. The only difference between drivers is jTDS calls sp_prepare whereas Microsoft JDBC driver calls sp_prepexec for PreparedStatemets. I will try to analyze this behavior sometime next week, please share your findings too.

Chrriis commented 4 years ago

Thank you very much for the update!

Maybe I should tell more about this schema. This schema contains a primary key (Col1), a conceptual key (Col2, Col3, Col4, Col5) with a versioning column (Col28). A version can contain many rows of different conceptual keys. Primary keys are (generally) an increasing number, and version column is an increasing number. Rows were added in the order of the versioning column, using statements that declare the keys in the same order. The purpose of the test statement is to set the value of Col29 to the value of Col28 of the next version (per its conceptual key).

I am not a specialist in statistics computing in databases, but it is possible that the layout after insertion helps running the update statement (only the most recent versions are of interest, so last inserted rows per conceptual keys / PK) as done by JTDS.

You say that the only difference is in using sp_prepexec instead of sp_prepare. But in the test case, we clearly prepare the statement before executing it multiple times. Do you mean that sp_prepexec is invoked at the first batch execution (and sp_execute after that) or used at each batch execution (what would it mean to have a prepare step everytime)? Have you tried modifying the MS code to use sp_prepare like JTDS to see if you get the same result than JTDS (so we are sure there is nothing else at play here, like a subtle difference in the use of the TDS protocol, data types, etc.)?

I will do a few experiments tomorrow, I will let you know if I find anything of interest.

Chrriis commented 4 years ago

I tried another variation of the statement which is less performant but more basic SQL (no window function). UPDATE T_DbPerfTest SET Col29 = ? WHERE Col2 = ? AND Col27 = 1 AND Col29 IS NULL AND EXISTS ( SELECT 1 FROM T_DbPerfTest b WHERE T_DbPerfTest.Col5 = b.Col5 AND T_DbPerfTest.Col2 = b.Col2 AND T_DbPerfTest.Col4 = b.Col4 AND T_DbPerfTest.Col3 = b.Col3 AND ( T_DbPerfTest.Col28 < b.Col28 OR T_DbPerfTest.Col28 = b.Col28 AND T_DbPerfTest.Col1 < b.Col1 )) You can try it yourself, just replace the statement in the test case, and comment the "setLong(3, xxx)" line. Instead of times like "6020 (MS) vs 1512 (JTDS)", I get "14013 (MS) vs 5667 (JTDS)". Indeed a slower statement, but still similar slowdown ratio between MS and JTDS.

Chrriis commented 4 years ago

I tried changing various settings in the connection string of SQL Server, but no improvement:

responseBuffering=full
enablePrepareOnFirstPreparedStatementCall=true
sendTimeAsDatetime=false
disableStatementPooling=false
statementPoolingCacheSize=10
serverPreparedStatementDiscardThreshold=0

I also tried to change JTDS parameters to see how the driver behaves. Still fast when changing:

TDS=7.0
maxStatements=0

JTDS became as slow as the MS driver when changing: prepareSQL=2

I was not surprised by that last change result. The JTDS documentation says:

2: sp_executesql is used (fast)
3: sp_prepare and sp_cursorprepare are used in conjunction with sp_execute and sp_cursorexecute (faster, SQL Server only)
Chrriis commented 4 years ago

The test case uses a batch but I found this is irrelevant. If you replace preparedStatement.addBatch(); with preparedStatement.execute(); and comment preparedStatement.executeBatch(); you get the same speed problem.

Chrriis commented 4 years ago

So parametrizing IDs makes jTDS faster and I am currently not sure why.

The funny thing is that if you hardcode the 1st and 3rd parameters in the query (only leaving the ID parameterized), then the query is still fast for JTDS and slow for MS.

ulvii commented 4 years ago

Hi @Chrriis ,

Do you mean that sp_prepexec is invoked at the first batch execution (and sp_execute after that) or used at each batch execution.

sp_prepexe is executed only once when statementPoolingCacheSize=10;disableStatementPooling=false;enablePrepareOnFirstPreparedStatementCall=true;. See SQL Profiler outputs below(I am hardcoding 1st and 3rd parameters):

MS JDBC

declare @p1 int
set @p1=1
exec sp_prepexec @p1 output,N'@P0 bigint',N'UPDATE T_DbPerfTest SET Col29 = x.Col29 FROM (   SELECT Col1, LEAD(Col28) OVER (PARTITION BY Col5, Col2, Col4, Col3 ORDER BY Col28, Col1) AS Col29   FROM T_DbPerfTest   WHERE Col29 IS NULL   AND (Col27 = 1 OR Col28 = 27586214)   AND Col2 = @P0 ) x WHERE T_DbPerfTest.Col1 = x.Col1 AND T_DbPerfTest.Col29 IS NULL AND x.Col29 = 27586214 AND T_DbPerfTest.Col27 = 1        ',240254
select @p1

exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360

jTDS

declare @p1 int
set @p1=1
exec sp_prepare @p1 output,N'@P0 bigint',N'UPDATE T_DbPerfTest SET Col29 = x.Col29 FROM (   SELECT Col1, LEAD(Col28) OVER (PARTITION BY Col5, Col2, Col4, Col3 ORDER BY Col28, Col1) AS Col29   FROM T_DbPerfTest   WHERE Col29 IS NULL   AND (Col27 = 1 OR Col28 = 27586214)   AND Col2 =  @P0  ) x WHERE T_DbPerfTest.Col1 = x.Col1 AND T_DbPerfTest.Col29 IS NULL AND x.Col29 = 27586214 AND T_DbPerfTest.Col27 = 1',1
select @p1

exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
exec sp_execute 1,240254
exec sp_execute 1,684796
exec sp_execute 1,684786
exec sp_execute 1,240102
exec sp_execute 1,240274
exec sp_execute 1,240303
exec sp_execute 1,240332
exec sp_execute 1,240244
exec sp_execute 1,685302
exec sp_execute 1,685331
exec sp_execute 1,685196
exec sp_execute 1,685389
exec sp_execute 1,685228
exec sp_execute 1,685360
Chrriis commented 4 years ago

@ulvii I looked at the code and figured that out :)

That being said, I disagree when you say that it is the only thing that differs. I simplified the test case and added logs of the bytes that are sent for a given execution of the statement. I see less bytes sent with JTDS although we should be under the same conditions in both cases: statement was prepared long time ago, there were the same number of executions, etc.

Of course it is possible that I made a mistake in my test case, so please do cross-check my findings!

The zip MSvsJTDS2.zip contains:

It seems that the JTDS prepared statement executions do not have the header (size 22). Maybe only the preparation packet needs it and adding it to executions creates problems?

Thanks a lot for your help!

Chrriis commented 4 years ago

I think I succeeded modifying the code so that the sp_prepexec call is just a sp_prepare instead. That way we can compare the same scenario between MS and JTDS. Well, unless I made a mistake, total time has not changed: MS is still 3x slower... So, if the above is correct (please confirm!), then it means the bad performance could be due to the way the connection is established (maybe a different setting?) or the version of the protocol. @ulvii What is your take?

cogman commented 4 years ago

I've done some profiling that may or may not be interesting for this case (I'm attaching it.)

Here's what I see.

It looks like the number of requests are roughly the same between the two drivers, yet the MS driver requests appear to take a much longer time.

The MS driver is spending all of it's time here

void SqlServerJdbcDriverSlowdownTest.main(String[]):52  1
void SqlServerJdbcDriverSlowdownTest.executePreparedStatements(Connection, PreparedStatement):102   1
int[] com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch():2076   1
void com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(TDSCommand):223   1
void com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(TDSCommand):248 1
boolean com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(TDSCommand):2979    1
boolean com.microsoft.sqlserver.jdbc.TDSCommand.execute(TDSWriter, TDSReader):7194  1
boolean com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtBatchExecCmd.doExecute():2675   1
void com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatementBatch(SQLServerPreparedStatement$PrepStmtBatchExecCmd):2786  1
TDSReader com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(boolean):7579   1
boolean com.microsoft.sqlserver.jdbc.TDSReader.readPacket():6418    1
int com.microsoft.sqlserver.jdbc.TDSChannel.read(byte[], int, int):2023 1
int java.net.Socket$SocketInputStream.read(byte[], int, int):68 1

On the flip side, the Jtds path looks as follows for where most of it's time is spent

void SqlServerJdbcDriverSlowdownTest.main(String[]):68  8
void SqlServerJdbcDriverSlowdownTest.executePreparedStatements(Connection, PreparedStatement):102   8
int[] net.sourceforge.jtds.jdbc.JtdsStatement.executeBatch():1051   8
SQLException net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeMSBatch(int, int, ArrayList):463    8
void net.sourceforge.jtds.jdbc.TdsCore.executeSQL(String, String, ParamInfo[], boolean, int, int, int, boolean):1086    8
void net.sourceforge.jtds.jdbc.TdsCore.wait(int):4127   8
int net.sourceforge.jtds.jdbc.ResponseStream.peek():99  8
int net.sourceforge.jtds.jdbc.ResponseStream.read():114 8
void net.sourceforge.jtds.jdbc.ResponseStream.getPacket():477   8
byte[] net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(SharedSocket$VirtualSocket, byte[]):731  8
byte[] net.sourceforge.jtds.jdbc.SharedSocket.readPacket(byte[]):850    8
void java.io.DataInputStream.readFully(byte[]):170  8
void java.io.DataInputStream.readFully(byte[], int, int):200    8
int java.net.Socket$SocketInputStream.read(byte[], int, int):68 8

This was using JTDs 1.3.1 and MS 7.4.1

I ran sql server locally in a docker container.

As far as I can tell, it looks like both are doing essentially the same thing. Some protocol or connection setting seems to be the likely culprit here, IMO.

Interestingly enough, in both profiling sessions I did (One with jtds first, one with it second, to make sure that DB caching wasn't messing with things), it looks like the first 2 batches for the MS driver ran faster than JTDs (~80ms). It was the subsequent requests that ended up taking ~1 second.

JTDS consistently spent around 160ms per request.

Is there some "expected batch size" or "look ahead" parameter going on that the MS driver isn't sending? sql_server_performance.zip

cogman commented 4 years ago

@Chrriis Given your request information and the TDS docs

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/b05b006b-3cbf-404b-bcaf-7ec584b54706

It looks to me (and I could be wrong) that the MS driver is setting the "SQLBatch" packet data section and JTDs is not.

Chrriis commented 4 years ago

Thanks @cogman. I analyzed the bytes that were sent vs TDS spec and suspected they were the MARS section stuff, so thank you for the confirmation. I think JTDS uses an older version of the protocol that does not have those.

About profiling, I too found that time was spent reading, which is why I started analyzing the sent messages in the first place, but they seemed to be the same.

So I think you are right saying there is a protocol or connection setting, but I fail to find which could be causing this huge difference. Any help is welcome there! :)

Chrriis commented 4 years ago

@ulvii I found the issue! The root cause is that the statement that is passed to the server is slightly different between MS and JTDS: the MS driver adds spaces at the end of the statement when replacing the ? parameters with @Px ones. As a quick test, you can simply add .trim() at the end of the SQLServerConnection.replaceParameterMarkers(xxx) method and MS becomes as fast as JTDS using my test cases.

Of course, the fix must be more subtle than this: the statement that is passed to the driver should be left untouched when replacing the parameters. What I mean is that

With that fix, the MS driver would do as is told (passing command as is) and for us it would become as fast as JTDS (at least for this case).

Please let us know what you think! :)

Chrriis commented 4 years ago

The fix in SQLServerConnection.replaceParameterMarkers(xxx) is simply to:

That way, the statement is untouched, spaces that were part of the original statement are preserved and no new spaces are added.

cogman commented 4 years ago

TBH, this method is a fair bit more complex than needs be. (Looks like a C->Java translation)

Make this into a string builder and use strings/Integer.toString and this will be both more idiomatic and readable. The performance hit will be negligible, potentially non-existent due to JVM optimizations.

For example, "makeParameterName" could be simplified to "@P" + nParam

Also, weird that spaces have that big of an impact on performance from the server. I'd never have guessed that.

Same method, more idiomatic

    static String replaceParamMarkers(String sqlSrc, Parameter[] params, boolean isReturnValueSyntax) {
        if (params == null || params.length == 0)
            return sqlSrc;
        char[] chars = sqlSrc.toCharArray();
        StringBuilder results = new StringBuilder(sqlSrc.length() + (6 * params.length));
        int param = 0;
        for (char c : chars) {
            if (c == '?') {
                results.append("@P").append(param);
                if (!isReturnValueSyntax && params[param].isOutput()) {
                    results.append(" OUT");
                }
                ++param;
            }
            else {
                results.append(c);
            }
        }

        return results.toString();
    }
ulvii commented 4 years ago

Hi @Chrriis , @cogman , I tried your suggestions, there is some performance improvement(2 seconds), but in my environment the performance of two drivers is still not comparable for this test case. Total time (ms): 7328 (MS) Total time (ms): 1105 (JTDS) I will continue testing.

Chrriis commented 4 years ago

Hi @ulvii,

I am not sure what you did, because on our systems, the problem and the fix are consistent.

Here are our results:

The only change between both is that:

        while (dstBegin < sqlDst.length) {
            sqlDst[dstBegin++] = ' ';
        }
        return new String(sqlDst);

is replaced with:

        return new String(sqlDst, 0, dstBegin);
ulvii commented 4 years ago

I downloaded your test case and updated the driver with your suggestion. I am still willing to create a PR if it resolves the issue for you. Would you implement @cogman's change and let me know if you are still seeing the performance improvement?

Chrriis commented 4 years ago

Thanks @ulvii, I would greatly appreciate it if you could updated the driver with the changes I suggested. It does make a huge difference!

Of course, what would be great for the future is if SQL Server itself were fixed to treat statements with and without trailing spaces in the same manner. If you know anyone at Microsoft who can do something in the server itself, you would make the world a better place :) Here are some posts which might be related: https://stackoverflow.com/questions/56334818/sql-server-performance-slow-when-query-text-has-a-particular-number-of-character https://stackoverflow.com/questions/12019589/last-4-whitespaces-of-sql-would-hurt-sql-server-performance

About the changes @cogman proposed, I would have to try them when time allows. That being said, I think the immediate need is a fix that is the least intrusive, and I think improving the code as suggested deserves its own bug report / PR to better evaluate if it is equivalent and does not breaks anything (for example, I am not sure how makeParamName(xx) currently works in detail).

@ulvii, @cogman, thanks a lot for your help!

cogman commented 4 years ago

@Chrriis I've ran the test a few times with a few permutations of the driver changes (mine, yours, part of yours, part of mine). I didn't see any driver differences even though the string output at the end was the same.

UPDATE T_DbPerfTest SET Col29 = x.Col29 FROM (   SELECT Col1, LEAD(Col28) OVER (PARTITION BY Col5, Col2, Col4, Col3 ORDER BY Col28, Col1) AS Col29   FROM T_DbPerfTest   WHERE Col29 IS NULL   AND (Col27 = 1 OR Col28 = @P0)   AND Col2 = @P1 ) x WHERE T_DbPerfTest.Col1 = x.Col1 AND T_DbPerfTest.Col29 IS NULL AND x.Col29 = @P2 AND T_DbPerfTest.Col27 = 1

I even tried running against sql server 2017 vs 2019 just to see if it was something about the database version. I threw in print statements into the driver to make sure I wasn't somehow getting an older version of the driver.

I ran this change with 8.1.1 built on my box.

Edit: Just for fun, I also added a bit of logic to strip out repeated whitespace in the query. No difference.

Testing Prepared Statements...
Total time (ms): 3828 (JTDS)
"UPDATE T_DbPerfTest SET Col29 = x.Col29 FROM ( SELECT Col1, LEAD(Col28) OVER (PARTITION BY Col5, Col2, Col4, Col3 ORDER BY Col28, Col1) AS Col29 FROM T_DbPerfTest WHERE Col29 IS NULL AND (Col27 = 1 OR Col28 = @P0) AND Col2 = @P1 ) x WHERE T_DbPerfTest.Col1 = x.Col1 AND T_DbPerfTest.Col29 IS NULL AND x.Col29 = @P2 AND T_DbPerfTest.Col27 = 1"
Total time (ms): 19405 (MS)
Chrriis commented 4 years ago

@ulvii I looked at @cogman's code. I changed a few things, e.g. to not allocate an array of chars just to get the length and to take into account the OUT parameter length. I truly think that rewriting this method should be a different issue/PR focused on improving the code and which could have its own dedicated thread of comments (let's not highjack this one which is about a bug in performance).

By-the-way, I did not have the bug today but I restored the database backup and could reproduce. I probably had background processes altering statistics. It is important to restore the backup before measuring again.

So, I would be grateful if you could fix the issue by replacing:

        while (dstBegin < sqlDst.length) {
            sqlDst[dstBegin++] = ' ';
        }
        return new String(sqlDst);

with:

        return new String(sqlDst, 0, dstBegin);

I hope this can make it soon in a release so we can continue our performance investigations between JTDS and MS drivers.

cogman commented 4 years ago

@Chrriis Did you miss where I ran your code and did not see performance improvements?

Specifically, the statement that got pushed out was UPDATE T_DbPerfTest SET Col29 = x.Col29 FROM ( SELECT Col1, LEAD(Col28) OVER (PARTITION BY Col5, Col2, Col4, Col3 ORDER BY Col28, Col1) AS Col29 FROM T_DbPerfTest WHERE Col29 IS NULL AND (Col27 = 1 OR Col28 = @P0) AND Col2 = @P1 ) x WHERE T_DbPerfTest.Col1 = x.Col1 AND T_DbPerfTest.Col29 IS NULL AND x.Col29 = @P2 AND T_DbPerfTest.Col27 = 1

I didn't just test my code change. Neither change on 8.1.1 resulted in a difference in performance.

I can cook up a test docker compose to demonstrate if you like.

Chrriis commented 4 years ago

@cogman This bug is related to SQL Server caches of query plans, parameter sniffing, statistics computing, etc. It is known that having trailing spaces, sometimes magical number of them, confuse SQL Server and make it not reuse an optimal plan. I spent too much time on this already. I gave a test case, I gave links to posts of some people with the same problem, I showed that the driver was adding useless spaces which can lead to this problem, and I gave metrics with and without these spaces.

Do you mean that you are not able to reproduce the issue? If so, this kind of issue is like concurrency issues: it is not because you do not see it that it does not exist. In any case, the statement does have useless spaces and it is known to cause problems.

Chrriis commented 4 years ago

@ulvii, @cogman, I have very interesting findings. I ported the whole thing to SQL Server Express 2012 (our earlier investigations were on SQL Server 2017). And these are the times we get (without fix): Total time (ms): 969 (JTDS) Total time (ms): 10252 (MS)

So, on that server, the performance penalty is 10 times!

But that is not all. If I apply the fix I suggested, the performance penalty remains. This might explain what @cogman is experiencing. So, I applied another difference I found between JTDS and MS JDBC: spaces around parameters. Depending on the test case, it does or does not help. Total time (ms): 969 (JTDS) Total time (ms): 985 (MS)

I was wondering why JTDS was adding these spaces, but maybe they (sometimes?) help parameter sniffing and query plan reuse in older versions of SQL Server, who knows. I would not urge to add the spaces around parameters like JTDS does though, this needs much more analysis.

So: the trailing spaces at the end of the statement are definitely a problem and this fix should be applied. But, removing them does not fix all of the performance penalties of MS driver compared to JTDS.

Here is the backup for SQL Express 2012: MSvsJTDS_SQLExpress2012.zip

Chrriis commented 4 years ago

I found how to fix the remaining performance penaly found on SQL Server 2012: I changed the MS driver code (temporary hack) to do a prepare instead of a prepExec. Then it becomes as fast...

So, we can infer that:

(of course, do check on your side if you come to the same conclusions).

ulvii commented 4 years ago

Hi @Chrriis , @cogman , Thanks for the investigations, interesting results. I created a #1215 for now, since we all agree that trailing spaces should be removed. I will get back to your investigations when time permits and perhaps get SQL Server team involved to figure out the root cause of the performance penalty.

Chrriis commented 4 years ago

Thanks @ulvii, I will try the next official release that contains this first fix. Unfortunately, we need to support SQL Server 2012 so we will stick to JTDS in production for the time being. That being said, if you need help with testing code changes or carry some analyses, do contact me.

About the PrepExec vs Prepare + Execute, JTDS has the notion of modes using the prepareSQL connection property. Here is its documentation:

prepareSQL (default - 3 for SQL Server, 1 for Sybase) This parameter specifies the mechanism used for Prepared Statements.

Value Description
0 SQL is sent to the server each time without any preparation, literals are inserted in the SQL (slower)
1 Temporary stored procedures are created for each unique SQL statement and parameter combination (faster)
2 sp_executesql is used (fast)
3 sp_prepare and sp_cursorprepare are used in conjunction with sp_execute and sp_cursorexecute (faster, SQL Server only)

While I don't think the MS driver should have these modes, maybe you could add some modes at least to switch between PrepExec and Prepare + Execute.

ulvii commented 4 years ago

Hi @Chrriis ,

I am investigating the performance issue against 2k12. Would you attach the code that uses sp_prepare instead of sp_prepexec in the driver?

Chrriis commented 4 years ago

Hi @ulvii ,

The code to use sp_prepare is a hack: so the first call will only do a prepare and no execution (but the test statement is a no-op, so it is not a problem) and I generally ad one more iteration in the tests to compensate. The change in code is to replace the content of the method SQLServerPreparedStatement.buildPrepExecParams(TDSWriter) with:

        if (getStatementLogger().isLoggable(java.util.logging.Level.FINE))
            getStatementLogger().fine(toString() + ": calling sp_prepexec: PreparedHandle:"
                    + getPreparedStatementHandle() + ", SQL:" + preparedSQL);

        expectPrepStmtHandle = true;
        executedSqlDirectly = false;
        expectCursorOutParams = false;
        outParamIndexAdjustment = 3;

        tdsWriter.writeShort((short) 0xFFFF); // procedure name length -> use ProcIDs
        tdsWriter.writeShort(TDS.PROCID_SP_PREPARE);
        tdsWriter.writeByte((byte) 0); // RPC procedure option 1
        tdsWriter.writeByte((byte) 0); // RPC procedure option 2
        tdsWriter.sendEnclavePackage(preparedSQL, enclaveCEKs);

        // <prepared handle>
        // IN (reprepare): Old handle to unprepare before repreparing
        // OUT: The newly prepared handle
        tdsWriter.writeRPCInt(null, getPreparedStatementHandle(), true);
        resetPrepStmtHandle(false);

        // <formal parameter defn> IN
        tdsWriter.writeRPCStringUnicode((preparedTypeDefinitions.length() > 0) ? preparedTypeDefinitions : null);

        // <stmt> IN
        tdsWriter.writeRPCStringUnicode(preparedSQL);
        tdsWriter.writeRPCInt(null, 1, false);
ulvii commented 4 years ago

Hi @Chrriis , I think I managed to make both drivers faster for your query, Microsoft JDBC driver being faster than jTDS most of the time against both SQL Server 2012 and 2017. Try creating the following index on your database:

USE [TC_PerfTest]
GO
CREATE NONCLUSTERED INDEX [PerfIndex]
ON [dbo].[T_DbPerfTest] ([Col2],[Col29])
INCLUDE ([Col1],[Col3],[Col4],[Col5],[Col27],[Col28])
GO

Below are the steps I took to find out how to improve performance:

  1. Copy the MS JDBC queries from the comment above and paste them into SSMS query editor (Make sure to remove set @p1=1).
  2. Enable SQL Server Execution Plan by clicking "Include Actual Execution Plan(CTRL+M)" on query menu and execute the queries.
  3. Once the execution plan is ready, right click and choose "Missing Index Details..." option. This will give the query above as a suggestion to improve performance.

Hope this helps.

Chrriis commented 4 years ago

Hi @ulvii,

Thanks for your investigations. Unfortunately, this does not help...

The examples I gave you are simplified examples. Our real application has the problem in 21 tables, where each contains millions of rows (gigabytes of data). We did spend a lot of time on index creation, and the actual tables happen to have some on the relevant columns.

If we were to add another index to work around the problem, then performance would not be as good because of the cost of maintaining this index. Moreover, adding an index would use more disk space and considering the amount of data in the tables it is not negligible. We also support multiple RDBMs and they mostly share the same schema and queries (with type mapping). For example, jTDS and Postgres use the same code here and Postgres happens to be slightly faster but not by much. Adding an index would decrease performance for everyone. Note that maybe other queries to other tables are affected too and we have yet to discover them. In any case, suggesting to rewrite our queries or change our schema to target the same database is hard to justify šŸ™‚

I don't see many options here:

Considering that a number of users will try migrating from jTDS to mssql-jdbc, wouldn't it be wise to have this prepare+exec mode for those who hit the same problem? How hard would it be to implement?

otbutz commented 4 years ago

Get Microsoft to add a mode in the mssql-jdbc driver so that we can switch to prepare+exec instead of prepexec

IMHO the only sensible approach.

Do we know if and which SQL Server versions apart from 2012 suffer from the prepexec performance penalty?

It's a shame that we still have to rely on an unmaintained third-party driver in order to achieve acceptable performance.

@ulvii one driver yields good performance, the other doesn't. So it's clearly not a problem caused by missing indices.

Chrriis commented 4 years ago

Using the SQL Express 2012 backup that I already shared: MSvsJTDS_SQLExpress2012.zip

And the 2 test cases that I think are the same versions that I already shared: 20200211 - SqlServerJdbcDriverSlowdownTests.zip

I get the following results (times in milliseconds, sample taken out of multiple runs):

Driver Expr. 2012 2014 2016 2017 2019
MS 13129 21483 5542 2391 10368
jTDS 1079 2069 1486 890 2421
Slowdown 12x 10x 4x 3x 4x
Driver Expr. 2012 2014 2016 2017 2019
MS 9860 15751 4196 2066 6828
jTDS 1016 1971 1419 817 2031
Slowdown 10x 8x 3x 3x 3x

Do not compare numbers between different versions because these were done on very different hardware (HDD vs SDD, slow/fast CPU/RAM, etc.).

otbutz commented 4 years ago

To sum it up:

Assuming that Microsofts parameter sniffing for prepexec execution plans gradually improved, the figures seem plausible to me (but still horrible).

I think that the difference in slowdown slope between test 1 and 2 is due to the lower number of parameters and is not really related to the use of batching?

Chrriis commented 4 years ago

I just updated the numbers above, and I correct my comment. I had a mistake in the way I was running Test2 which I resolved. Now results are consistent.

otbutz commented 4 years ago

Could you run the tests using your patched mssql driver which uses sp_prepare?

Chrriis commented 4 years ago

With the hack to do a prepare instead of a prepexec, here are the results for the same tests:

Driver Expr. 2012 2014 2016 2017 2019
MS 979 2073 1487 886 2209
jTDS 968 2205 1466 871 2210
Slowdown 1x 1x 1x 1x 1x
Driver Expr. 2012 2014 2016 2017 2019
MS 960 1930 1341 809 2092
jTDS 921 1955 1335 802 2050
Slowdown 1x 1x 1x 1x 1x
otbutz commented 4 years ago

These results are pretty clear. The question is how and when Microsoft will finally act.

jTDS is not JDBC4 compliant and will cause problems in the very near future. So people are forced to migrate to mssql-jdbc, as this is the only remaining driver without license fees which can be used to connect to a mssql database.

And this transition is already starting: e.g Atlassian Bamboo

You don't have to be a clairvoyant to predict that many companies could switch to other database products if the performance was severely affected by this forced migration.

ulvii commented 4 years ago

Hi @Chrriis , Thanks for the details. We are currently exploring our options and I started a discussion with SQL Server team to understand the performance difference between sp_prepare and sp_prepexec. I will keep you updated, please stay tuned.

ulvii commented 4 years ago

Hi @Chrriis ,

We looked into performance differences between sp_prepare and sp_prepexec with SQL Server team and the following is the conclusion we came to.

sp_prepexec is picking a parallel plan which is scanning the heap table and then joining with a different index. sp_prepare plan from jTDS is picking a serial plan and filtering on Col2 first. The plan difference is because in the sp_prepare case there is no value @P0 ā€“ it is basically NULL and based on index statistics on Col2 the server determines serial plan is best. However in the sp_prepexec case the @P0 has a value specified for the initial call which influences the plan. For this data set/indexes, you are getting better performance with the serial plan but there can be other cases where the parallel plan will perform better.

That being said, you can change your query to make sp_prepexec to use serial plan by appending OPTION (OPTIMIZE FOR UNKNOWN) to your query which will instruct the Query Optimizer to use statistical data instead of the initial values for all local variables when the query is compiled and optimized. Your query would become:

UPDATE T_DbPerfTest SET Col29 = x.Col29 FROM (   SELECT Col1, LEAD(Col28) OVER (PARTITION BY Col5, Col2, Col4, Col3 ORDER BY Col28, Col1) AS Col29   FROM T_DbPerfTest   WHERE Col29 IS NULL   AND (Col27 = 1 OR Col28 = ?)   AND Col2 = ? ) x WHERE T_DbPerfTest.Col1 = x.Col1 AND T_DbPerfTest.Col29 IS NULL AND x.Col29 = ? AND T_DbPerfTest.Col27 = 1 OPTION ( OPTIMIZE FOR UNKNOWN );
Chrriis commented 4 years ago

Hi @ulvii, thanks a lot for the explanation. I suspected that SQL Server was picking a plan based on the values passes as part of the first prepexec call.

Now, the big question is: is it how it should behave from the driver's point of view?

If I were to code directly the SQL Server calls (e.g. stored procedure):

I don't think that a statement created using the Java API of the driver should be created optimized for the first values that were passed in. Yes, I understand that it could be faster under some conditions, but I also acknowledge that it is not possible to know what future values will be passed in. It is by nature an "Optimize for unknown" situation. Thus, I think it is wrong for the driver to use sp_prepexec internally, unless you can execute it with "optimize for unknown" by default for all the calls.

As for adding "OPTION (OPTIMIZE FOR UNKNOWN)" to all our statements, people using an ORM cannot always do so. This can help some users but I feel it is a workaround.

Please share your thoughts, and of course do correct me if I am wrong!

otbutz commented 4 years ago

I have to agree with @Chrriis

Most users of the driver rely on ORM frameworks and can't use that workaround.

cogman commented 4 years ago

@Chrriis, the more I think about it, the more I agree with you. The semantics of jdbc with prepared statements is that there is an expectation that prepare and exec are ran at different times. This is particularly true when talking about using the jdbc batching capabilities.

I predict there will be fallout because, as you correctly point out, somethings will get slower. However, from a semantic purity point of view, seems like this is the right way to go.

This might also lead into how statement caching is working. The driver should, per connection, cache the results of sp_prepare.

sp_prepexec should be reserved for methods like Statement#executeQuery

ulvii commented 4 years ago

Now, the big question is: is it how it should behave from the driver's point of view?

Yes, I understand that it could be faster under some conditions, but I also acknowledge that it is not possible to know what future values will be passed in.

I wanted to clarify the conditions first. It is not about the different parameters being passed in, it is about how many values are queried from each parameter. Please note that, OPTION (OPTIMIZE FOR UNKNOWN) will perform better when the values queried in a table are not evenly distributed. For example, if one of your parameters updates 10 rows and another one updates 100000 rows. In the scenarios where the number of queried values are similar, you will want to create execution plans based on initial value. This is more of a SQL server tuning scenario, rather than JDBC driver issue and I don't agree that "OPTION (OPTIMIZE FOR UNKNOWN)" is usually a better option or is the option that the driver is supposed to use by default.

As for adding "OPTION (OPTIMIZE FOR UNKNOWN)" to all our statements, people using an ORM cannot always do so. This can help some users but I feel it is a workaround.

You can disable parameter sniffing database level and achieve similar performance results. Starting from SQL Server 2016 (13.x) SP1 you can use ALTER DATABASE SCOPED CONFIGURATION SET PARAMETER_SNIFFING = OFF ; command, for older servers the command is DBCC TRACEON(4136,-1).

Please let me know what you think.

Chrriis commented 4 years ago

OPTION (OPTIMIZE FOR UNKNOWN) will perform better when the values queried in a table are not evenly distributed.

Isn't data being evenly distributed the exception rather than the norm? Real life is more about chaos than order šŸ™‚

If I create a social application, some users will have a couple of friends and posts, some will have thousands. If I create an online shop, some sellers/buyers will have a couple of items in their inventory/purchase history, some will have thousands. If I create a financial application, some instruments will have few data points (e.g. monthly data), some will have millions (traded every second).

Performance is critical for the application where this problem was identified. There are:

Parameter sniffing is important for plain statements, so this feature should not be deactivated. Prepared statements are used when working on multiple data sets which have no guarantee of having data distributed evenly.

My take is that the driver optimizing prepared statements based on the first values is an arbitrary decision, which seemed like a good idea but should be reconsidered. Such optimization should be in the hands of the users of the driver, using hints or properties of the statement. We should not have to tweak our statements to opt out, or at least we should be able to opt out using a global driver setting (hence my original suggestion of having modes: prepare+exec vs prepexec).

otbutz commented 4 years ago

@ulvii is it really that hard to add an option to the driver and let the user switch the execution mode?

ulvii commented 4 years ago

Hi everyone, I discussed the issue with the team and we agree that providing an option to switch between sp_prepare and sp_prepexec has merit. We are backlogging the issue for now and will work on it when time permits. In the meantime, please make use of the suggestions above.

Chrriis commented 4 years ago

@ulvii Thanks a lot for the information. I think it is good news as it will allow more people to switch to your driver.

In the meantime, please make use of the suggestions above.

The suggestions are not practical for us, so we have no choice but to stick with jTDS until this is implemented...

dospencer commented 4 years ago

I have a related performance issue:

I have a java process that inserts 2 date fields (High_School_From and High_School_To into a table in batches of 10,000 using prepared statements and the executeBatch() routine which is taking about 3 minutes to complete the executeBatch(). I have used both the Microsoft JDBC driver 8.2 and 7.4 and get the same poor performance.

  1. When both fields are included in the insert, it takes about 3 minutes to commit the batch of 10,000
  2. When only High_School_From is included in the insert, it takes less than 1 minute to commit the batch of 10,000
  3. When only High_School_To is included in the insert, it takes about 2 minutes 41 seconds to commit the batch of 10,000

Of the 10,000 records, 9,130 (91 pct) of the High_School_From had null values and 5,893 (58 pct) had null values for the High_School_To column

I ran the same process against an on-prem MS SQL server database with both of the date columns and did not experience that performance issue. The executeBatch completed in less than 5 seconds. I do expect that the on-prem server would respond faster since there would be less network latency but the performance was significantly better. It took 5 seconds to commit the batch of 10,000 vs the 3 minutes. The @@VERSION from the on-prem database returns Microsoft SQL Server 2012 (SP1) .

I tried the same process with a MS SQL server hosted on a server we have with Azure hosting running 2016 and It also took 3 minutes to complete the executeBatch.

I switched from the Miocrosoft JDBC driver to an Open Source JTDS driver and the executeBatch completed in less than 30 seconds on all the databases mentioned; the Azure database, the on-prem SQL Server 2012 and the SQL Server 2016 running on the hosted Azure server.

I do not experience the performance issue on Azure when using the MS JDBC driver if the dates are not null and using the JDBC driver. My opinion is therefore that there is an issue with how the MS JDBC driver handles the null dates in the executeBatch.

I can't use the Open Source JTDS driver in production. Do you have any suggestion on what could be causing the issue?