microsoft / mssql-jdbc

The Microsoft JDBC Driver for SQL Server is a Type 4 JDBC driver that provides database connectivity with SQL Server through the standard JDBC application program interfaces (APIs).
MIT License
1.05k stars 424 forks source link

Infinite loop in SQLServerConnection.login, spawning thousands of threads, causes high CPU and threads consumption #1831

Closed haimadrian closed 1 year ago

haimadrian commented 2 years ago

Hello, I'm having an issue with mssql-jdbc driver in multi-threaded environment.

Driver version

10.2.0.jre8

SQL Server version

Microsoft SQL Server 2019 (RTM-GDR) (KB4583458) - 15.0.2080.9 (X64) Nov 6 2020 16:50:01 Copyright (C) 2019 Microsoft Corporation Developer Edition (64-bit) on Windows 10 Enterprise 10.0 (Build 19044: )

Client Operating System

Windows 10 Enterprise 21H2 19044.1706 Processor: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz 2.59 GHz (6 cores, 12 logical) RAM: 32GB

JAVA/JVM version

openjdk version "1.8.0_222" OpenJDK Runtime Environment (Zulu 8.40.0.25-CA-win64) (build 1.8.0_222-b10) OpenJDK 64-Bit Server VM (Zulu 8.40.0.25-CA-win64) (build 25.222-b10, mixed mode)

Table schema

N/A

Problem description

The issue is that the thread pool of com.microsoft.sqlserver.jdbc.SocketFinder is not limited, and it creates thousands of threads, causing the JVM to consume all available CPU of a machine. The issue is reproducible when two threads are trying to connect to a stopped instance, and the first thread receives a TimeOut exception, and interrupt the second thread while it is waiting for its time-out to occur. I have prepared a short demonstration to reproduce the issue. You do not really have to connect to a real instance. I could overcome this by synchronizing the access to DriverManager.getConnection(...), and some sleep (250 milliseconds). Without the sleep the issue is still reproducible. I would like to avoid synchronizing the access to DriverManager, to let multiple threads work concurrently. The issue is also reproducible when using commons-dbcp, and this time synchronizing the access to basicDataSource.getConnection() does not help to overcome the issue.

Expected behavior

Actual behavior

7K threads and ~80% CPU usage. Output from my demo: (Notice the ~7K threads)

2022-05-11 19:00:55.528 {main}: Connecting to SQL Server...
2022-05-11 19:00:55.530 {ThreadMXBean}: Amount of threads: 8
2022-05-11 19:00:55.531 {DemoThread}: Going to sleep for 5 SECONDS
2022-05-11 19:00:57.528 {ThreadMXBean}: Amount of threads: 10
2022-05-11 19:00:59.524 {ThreadMXBean}: Amount of threads: 10
2022-05-11 19:01:00.546 {DemoThread}: Connecting to SQL Server...
2022-05-11 19:01:01.523 {ThreadMXBean}: Amount of threads: 12
2022-05-11 19:01:03.530 {ThreadMXBean}: Amount of threads: 12
2022-05-11 19:01:05.527 {ThreadMXBean}: Amount of threads: 12
2022-05-11 19:01:05.703 {main}: Error has occurred: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host localhost, port 1433 has failed. Error: "connect timed out. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host localhost, port 1433 has failed. Error: "connect timed out. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:237)
    at com.microsoft.sqlserver.jdbc.SQLServerException.ConvertConnectExceptionToSQLServerException(SQLServerException.java:288)
    at com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2720)
    at com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:761)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:3180)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2833)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:2671)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1640)
    at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:936)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at org.example.MSSQLJdbcTest$SqlServerConnection.connect(MSSQLJdbcTest.java:145)
    at org.example.MSSQLJdbcTest.main(MSSQLJdbcTest.java:57)

2022-05-11 19:01:05.706 {main}: Going to sleep for 10 SECONDS
2022-05-11 19:01:07.529 {ThreadMXBean}: Amount of threads: 4339
2022-05-11 19:01:09.526 {ThreadMXBean}: Amount of threads: 7531
2022-05-11 19:01:09.547 {DemoThread}: Error has occurred during second connect: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host localhost, port 1433 has failed. Error: "null. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host localhost, port 1433 has failed. Error: "null. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:237)
    at com.microsoft.sqlserver.jdbc.SQLServerException.ConvertConnectExceptionToSQLServerException(SQLServerException.java:288)
    at com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2708)
    at com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:761)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:3180)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2833)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:2671)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1640)
    at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:936)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at org.example.MSSQLJdbcTest$SqlServerConnection.connect(MSSQLJdbcTest.java:145)
    at org.example.MSSQLJdbcTest.lambda$main$3(MSSQLJdbcTest.java:50)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

2022-05-11 19:01:11.533 {ThreadMXBean}: Amount of threads: 7531
2022-05-11 19:01:13.529 {ThreadMXBean}: Amount of threads: 7531
2022-05-11 19:01:15.678 {ThreadMXBean}: Amount of threads: 7531
2022-05-11 19:01:15.762 {main}: Going to sleep for 10 SECONDS
2022-05-11 19:01:17.522 {ThreadMXBean}: Amount of threads: 7531
2022-05-11 19:01:19.527 {ThreadMXBean}: Amount of threads: 7531
2022-05-11 19:01:21.526 {ThreadMXBean}: Amount of threads: 7531
2022-05-11 19:01:23.521 {ThreadMXBean}: Amount of threads: 7531
2022-05-11 19:01:25.530 {ThreadMXBean}: Amount of threads: 7531
2022-05-11 19:01:25.763 {main}: All Tests Complete. Exiting.

Error message/stack trace

N/A

Any other details that can be helpful

You can run this code to reproduce the issue.

package org.example;

import java.io.PrintWriter;
import java.io.StringWriter;
import java.lang.management.ManagementFactory;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MSSQLJdbcTest {
    private static final SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");

    public static void main(String[] args) {
        // Make sure the instance is stopped
        String connectionUrl = "jdbc:sqlserver://localhost" +
                ";databaseName=master" +
                ";applicationName=Naruto" +
                ";loginTimeout=10" +
                ";socketTimeout=10000" +
                ";encrypt=false" +
                ";trustServerCertificate=true" +
                //";integratedSecurity=true" +
                //";authenticationScheme=NativeAuthentication" +
                ";MultiSubnetFailover=false" +
                ";userName=Sasuke"
                ;

        SqlServerConnection connection = new SqlServerConnection(connectionUrl, "Uchiha");
        ExecutorService executor = null;
        ScheduledExecutorService scheduledExecutor = null;

        try {
            // A scheduler to print amount of threads created in this jvm
            scheduledExecutor = Executors.newSingleThreadScheduledExecutor(r -> new Thread(r, "ThreadMXBean"));
            scheduledExecutor.scheduleAtFixedRate(() -> log("Amount of threads: %d", ManagementFactory.getThreadMXBean().getTotalStartedThreadCount()), 0, 2, TimeUnit.SECONDS);

            // A second thread for the demo, to simulate multi-threaded environment
            executor = Executors.newSingleThreadExecutor(r -> new Thread(r, "DemoThread"));
            executor.submit(() -> {
                safeSleep(5, TimeUnit.SECONDS);
                try {
                    // Second connect
                    connection.connect();
                } catch (SQLException e) {
                    log("Error has occurred during second connect:", e);
                }
            });

            // First connect
            connection.connect();

            safeSleep(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            log("Error has occurred:", e);
        } finally {
            if (executor != null) {
                // Shutdown executor and interrupt the thread.
                executor.shutdownNow();
            }

            try {
                safeSleep(10, TimeUnit.SECONDS);
                connection.disconnect();
            } catch (SQLException e) {
                log("Error has occurred while closing connection:", e);
            }

            safeSleep(10, TimeUnit.SECONDS);

            if (scheduledExecutor != null) {
                scheduledExecutor.shutdownNow();
            }
        }

        log("All Tests Complete. Exiting.");
    }

    private static void safeSleep(int duration, TimeUnit timeUnit) {
        log("Going to sleep for %d %s", duration, timeUnit.name());
        try { Thread.sleep(timeUnit.toMillis(duration)); } catch (InterruptedException ignore) {}
    }

    private static void log(String message, Object... args) {
        log(message, null, args);
    }

    private synchronized static void log(String message, Throwable thrown, Object... args) {
        String msg = message;

        if (args != null) {
            msg = String.format(msg, args);
        }

        if (thrown != null) {
            msg += " " + thrown;
        }

        System.out.printf("%s {%s}: %s%n", dateFormat.format(new Date()), Thread.currentThread().getName(), msg);

        if (thrown != null) {
            System.out.println(getStackTrace(thrown));
        }
    }

    private static String getStackTrace(Throwable thrown) {
        String result = null;
        try (StringWriter sw = new StringWriter(); PrintWriter pw = new PrintWriter(sw)) {
            thrown.printStackTrace(pw);
            result = sw.toString();
        } catch (Exception ignore) { }

        return result;
    }

    private static class SqlServerConnection {
        private final String connectionUrl;
        private final String pwd;
        private Connection connection;

        SqlServerConnection(String connectionUrl, String pwd) {
            this.connectionUrl = connectionUrl;
            this.pwd = pwd;
        }

        public void connect() throws SQLException {
            log("Connecting to SQL Server...");
            if (connection != null) {
                log("Connection is already existing. Disconnecting existing connection...");
                disconnect();
            }
            Properties props = new Properties();
            props.put("password", pwd);

            // Using this, along with synchronization over 'this' instance can overcome the issue,
            // but it will not allow two threads to connect simultaneously...
            //safeSleep(250, TimeUnit.MILLISECONDS);

            connection = DriverManager.getConnection(connectionUrl, props);
            log("Connected.");
        }

        public void disconnect() throws SQLException {
            if (connection != null) {
                log("Disconnecting from SQL Server...");
                connection.close();
                log("Disconnected.");
            }
        }
    }
}

JDBC trace logs

Modifying log level affects the issue:

MSSQLJdbcDemo.log

Jeffery-Wasty commented 2 years ago

Hi @haimadrian

Thank you for including both the stack trace as well as sample code. This will greatly assist us in resolving the issue. We will be looking into it and get back to you as soon as we can with either a solution, or a request for more information.

Jeffery-Wasty commented 2 years ago

The situation you are describing sounds similar to what other users have encountered. Are you able to try other versions of the driver, specifically the latest 11.1.1 version? Based on the results, this will help narrow down next steps.

haimadrian commented 2 years ago

Hi @Jeffery-Wasty and thanks for your reply. I've just double checked it, using latest 11.1.1.0-preview version, and the issue is still reproducible.

Attached JDBC trace logs: MSSQLJdbcDemo.log

Something interesting I noticed is that when the issue happens, the first SQLServerException's message describes that the error is: "connect timed out", which is the expected behavior. Though the second SQLServerException, caught by the second thread, describes that the error is: "null". If I synchronize the code calling DriverManager.getConnection() then both threads receive the exact same exception with reason: "connect timed out" (the expected behavior)

Jeffery-Wasty commented 2 years ago

Hi @haimadrian,

An update on this issue: we're still working towards a resolution. Part of the problem being currently faced is that we're having trouble reproducing the issue. You did include sample code up above that should be able to work without connecting to an actual instance, but still we're not able to reproduce the issue seen even with this code. Can I ask you to confirm that the above demonstration truly does recreate the issue described, after a set amount of time. Thanks.

haimadrian commented 2 years ago

Hi @Jeffery-Wasty I've just played with it a little bit and I see that the issue happens when trying to connect to localhost, on Windows. I tried to set up a SQL Server on Azure, to share it with you, but the issue was not reproduced.

It seems that when SocketFinder tries to find socket, and it uses InetAddress.getAllByName(hostName), where hostName="localhost", it then receives two results, and not one. This then yields to multiple executions later, in findSocketUsingThreading method, which creates two threads (for the two addresses) and not one. And when this scenario happens in a loop, there are thousands of executions.

The results from InetAddress:

  1. localhost/127.0.0.1
  2. localhost/0:0:0:0:0:0:0:1

Here I recorded my screen while reproducing the issue: https://user-images.githubusercontent.com/20064967/168935141-2344004a-48ed-4190-ad6e-fc7ca43f3ece.mp4 In the demo there is a short time-out, so the issue is resolved quickly. But in production the time-out is set to 1-1.5 minutes, so it is more problematic there.

Folder containing files from the demo: (mssql-jdbc-test-1.0 and mssql-jdbc-11.1.1.jre8-preview) MSSQLIssue.zip

Sources of mssql-jdbc-test-1.0: mssql-jdbc-test-1.0-sources.zip

Jeffery-Wasty commented 2 years ago

Hi @haimadrian,

An update on this issue. We're able to spend more time on this issue now, and hope to deliver a solution soon. In the meantime, please confirm that the problem still persists with the latest driver version (11.2.0).

haimadrian commented 2 years ago

Hi @Jeffery-Wasty Issue is still reproducible.

image

MSSQLJdbcDemo.log

dl6lr commented 1 year ago

@haimadrian A note to the demo program: The ThreadMXBean thread should emit ManagementFactory.getThreadMXBean().getThreadCount() for the current number of running threads. The started threads all stops within a short time frame, but the huge number of threads consume a lot of OS resources.

@Jeffery-Wasty This issue has a big impact, as it consumes a lot of resources in Windows. We experience running software to consume 2GB and more just for the threads, where the JVM runs the application in 256MB heap. Furthermore it affects not only the running application in the JVM, but disturbs all kind of other network activity with IP connections of other applications running out of resources, as the dynamic ports for outbound sockets are exhausted. The destination is SQL Server 2019 HA cluster with MultiSubnetFailover, so there are for sure more IP adresses for the DB and the issue occurs very often.

With a 64 bit JVM, the application consumes a huge amount of CPU and network resources for about 15 seconds. With a 32 bit JVM, the application runs out of resources with the following stack trace:

java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached at java.lang.Thread.start0(Native Method) ~[?:?] at java.lang.Thread.start(Unknown Source) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown Source) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source) ~[?:?] at com.microsoft.sqlserver.jdbc.SocketFinder.findSocketUsingThreading(IOBuffer.java:2721) ~[mssql-jdbc-9.4.0.jre11.jar:?] at com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2415) ~[mssql-jdbc-9.4.0.jre11.jar:?] at com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:676) ~[mssql-jdbc-9.4.0.jre11.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2957) ~[mssql-jdbc-9.4.0.jre11.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2628) ~[mssql-jdbc-9.4.0.jre11.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:2471) ~[mssql-jdbc-9.4.0.jre11.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1470) ~[mssql-jdbc-9.4.0.jre11.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:915) ~[mssql-jdbc-9.4.0.jre11.jar:?] at java.sql.DriverManager.getConnection(Unknown Source) ~[java.sql:?] at java.sql.DriverManager.getConnection(Unknown Source) ~[java.sql:?]

The problem still exists in latest driver version 12.2.0

I suspended the application in the debugger after the first connection timeout occured, and all the threads in the threadpool of SocketFinder where in the following state:

Thread [pool-1-thread-1] (Suspended) owns: SocksSocketImpl (id=564) PlainSocketImpl.waitForConnect(int, int) line: not available [native method] SocksSocketImpl(PlainSocketImpl).socketConnect(InetAddress, int, int) line: 107 SocksSocketImpl(AbstractPlainSocketImpl).doConnect(InetAddress, int, int) line: 399 SocksSocketImpl(AbstractPlainSocketImpl).connectToAddress(InetAddress, int, int) line: 242 SocksSocketImpl(AbstractPlainSocketImpl).connect(SocketAddress, int) line: 224 SocksSocketImpl.connect(SocketAddress, int) line: 392 Socket.connect(SocketAddress, int) line: 609 SocketConnector.run() line: 3240 ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1128 ThreadPoolExecutor$Worker.run() line: 628 Thread.run() line: 829

Find a thread dump attached. threaddump-1677071134843.zip

Jeffery-Wasty commented 1 year ago

Hi @dl6lr,

Thank you for the update on this issue. We're revisiting it currently, so the additional information is very helpful. Can you let us know which versions you are using for SQL Server, Client OS, and Java version? Is it the same as the original user's?

dl6lr commented 1 year ago

Hi @Jeffery-Wasty ,

thank you for your response. Here some Informations:

SQL-Server:

Clients (about 60 systems):

The clients are OLTP systems. If you need further informations, I will try to provide them.