pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
37.09k stars 5.83k forks source link

JDBC throws unexpected connection close when commit/rollback after timeout #49811

Open wengsy150943 opened 9 months ago

wengsy150943 commented 9 months ago

Bug Report

1. Minimal reproduce step (Required)

Running the case below by JAVA.

Schema:

create table t1 (a int, b int,primary key(a))
insert into t1 values (3, 2)

Then run the case by two connections parallel.

ID conn1 conn2
op1 begin
op2 begin
op3 delete from t1 where a = 3
op4 delete from t1 where a = 2
op5 delete from t1 where a = 2 -- set query timeout for it(op5)
op6 -- op5 is timeout and triggers rollback/commit commit

We also provide our code which triggers this case. Note that the static value TIMEOUT maybe need to be changed based on the testing environment to keep conn2 commits at the same time when conn1's delete is timeout.

package org.example;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Main {

    private static final String TIDB_HOST = "49.52.27.20";

    private static final ExecutorService executor = Executors.newSingleThreadExecutor();
    private static final int TIDB_PORT = 4000;
    private static final long TIMEOUT = 1010;

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        Connection connection1 = null;
        Connection connection2 = null;

        Class.forName("com.mysql.cj.jdbc.Driver");

        try {
            // JDBC connection parameters
            String jdbcUrl = "jdbc:mysql://" + TIDB_HOST + ":" + TIDB_PORT + "/test";
            String username = "root";
            String password = "";

            // Establish the connections
            connection1 = DriverManager.getConnection(jdbcUrl, username, password);
            connection1.setAutoCommit(false);
            connection2 = DriverManager.getConnection(jdbcUrl, username, password);
            connection2.setAutoCommit(false);

            // Init database
            executeQuery(connection1, "drop table if exists t1");
            executeQuery(connection1, "create table t1 (a int, b int,primary key(a))");
            executeQuery(connection1, "insert into t1 values (3, 2),(2, 3)");

            executeQuery(connection1, "begin");
            executeQuery(connection2, "begin");

            // Parameterized delete operations
            System.out.println("c1 delete: " + executeQuery(connection1, "delete from t1 where a = 3"));
            System.out.println("c2 delete: " + executeQuery(connection2, "delete from t1 where a = 2"));

            // conn1 delete the record async
            CompletableFuture<String> res = null;
            System.out.println("c1 delete: " +  (res=executeParamQueryAsync(connection1, "delete from t1 where a = 2")));

            // keep conn2 commit at the same time when conn1's delete is timeout
            Thread.sleep(TIMEOUT);
            System.out.println("c2 commit: " + executeQuery(connection2, "commit"));
            res.get();

        } catch (SQLException | InterruptedException | ExecutionException e) {
            e.printStackTrace();
        } finally {
            // Close resources in the reverse order of their creation
            connection1.close();
            connection2.close();

            executor.shutdown();
        }
    }

    private static String executeQuery(Connection connection, String query) throws SQLException {
        Statement statement = connection.createStatement();
        statement.execute(query);
        return  "Success" ;
    }

    private static CompletableFuture<String> executeParamQueryAsync(Connection connection, String query) {
        return CompletableFuture.supplyAsync(() -> {
            try (PreparedStatement preparedStatement = connection.prepareStatement(query)) {
                preparedStatement.setQueryTimeout(1);
                return preparedStatement.executeUpdate() == 0 ? "Failure" : "Success";
            } catch (SQLException e) {
                e.printStackTrace();
                try {
                    connection.commit(); // HERE throws an unexpected exception, rollback causes a similar case
                } catch (Exception e1) {
                    e1.printStackTrace();
                }
                System.out.println("c1 delete end");
                return "Failure";
            }
        }, executor);
    }
}

And the jdbc version is:

<dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.27</version> <!-- Use the latest version available -->
</dependency>

2. What did you expect to see? (Required)

There is only one exception in the whole execution, i.e., the timeout exception of conn1's delete.

That's what MySQL8.0 does. Staggering the time of conn1's delete and conn2's commit also leads to this result, i.e., setting TIMEOUT to 2000.

Then the output is shown below.

c1 delete: Success
c2 delete: Success
c1 delete: java.util.concurrent.CompletableFuture@55141def[Not completed]
c2 commit: Success
com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
    at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:113)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:953)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1098)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1046)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeLargeUpdate(ClientPreparedStatement.java:1371)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdate(ClientPreparedStatement.java:1031)
    at org.example.Main.lambda$executeParamQueryAsync$0(Main.java:82)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
c1 delete end

3. What did you see instead (Required)

The commit/rollback after conn1's delete also fails and throws other exceptions(Communications link failure), as shown below.

c1 delete: Success
c2 delete: Success
c1 delete: java.util.concurrent.CompletableFuture@1786dec2[Not completed]
c2 commit: Success
com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
    at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:113)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:953)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1098)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1046)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeLargeUpdate(ClientPreparedStatement.java:1371)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdate(ClientPreparedStatement.java:1031)
    at org.example.Main.lambda$executeParamQueryAsync$0(Main.java:82)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure

The last packet successfully received from the server was 18 milliseconds ago. The last packet sent successfully to the server was 1,041 milliseconds ago.
    at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:174)
    at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:64)
    at com.mysql.cj.jdbc.ConnectionImpl.commit(ConnectionImpl.java:806)
    at org.example.Main.lambda$executeParamQueryAsync$0(Main.java:86)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure

The last packet successfully received from the server was 18 milliseconds ago. The last packet sent successfully to the server was 1,041 milliseconds ago.
    at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
    at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
    at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
    at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
    at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167)
    at com.mysql.cj.protocol.a.NativeProtocol.readMessage(NativeProtocol.java:520)
    at com.mysql.cj.protocol.a.NativeProtocol.checkErrorMessage(NativeProtocol.java:700)
    at com.mysql.cj.protocol.a.NativeProtocol.sendCommand(NativeProtocol.java:639)
    at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:987)
    at com.mysql.cj.protocol.a.NativeProtocol.sendQueryString(NativeProtocol.java:933)
    at com.mysql.cj.NativeSession.execSQL(NativeSession.java:664)
    at com.mysql.cj.jdbc.ConnectionImpl.commit(ConnectionImpl.java:794)
    ... 5 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
    at com.mysql.cj.protocol.FullReadInputStream.readFully(FullReadInputStream.java:67)
    at com.mysql.cj.protocol.a.SimplePacketReader.readHeaderLocal(SimplePacketReader.java:81)
    at com.mysql.cj.protocol.a.SimplePacketReader.readHeader(SimplePacketReader.java:63)
    at com.mysql.cj.protocol.a.SimplePacketReader.readHeader(SimplePacketReader.java:45)
    at com.mysql.cj.protocol.a.TimeTrackingPacketReader.readHeader(TimeTrackingPacketReader.java:52)
    at com.mysql.cj.protocol.a.TimeTrackingPacketReader.readHeader(TimeTrackingPacketReader.java:41)
    at com.mysql.cj.protocol.a.MultiPacketReader.readHeader(MultiPacketReader.java:54)
    at com.mysql.cj.protocol.a.MultiPacketReader.readHeader(MultiPacketReader.java:44)
    at com.mysql.cj.protocol.a.NativeProtocol.readMessage(NativeProtocol.java:514)
    ... 11 more
c1 delete end

This phenomenon is confusing because it's hard to check whether the transaction of conn1is finished.(In fact it is finished.)

And I think it is unexpected because:

  1. It is incompatible with MySQL;
  2. It only occurs when the timeout of conn1's delete and conn2's commit happen at the same time.
  3. It seems the connection is closed after throwing these exceptions.

4. What is your TiDB version? (Required)

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version()                                                                                                                                                                                                                                                 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v7.4.0
Edition: Community
Git Commit Hash: 38cb4f3312be9199a983c0ef282d2ea2e28a7824
Git Branch: heads/refs/tags/v7.4.0
UTC Build Time: 2023-10-10 14:18:50
GoVersion: go1.21.1
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
wengsy150943 commented 9 months ago

This bug still occurs in v7.5.0. And we find it will cause an unexpected connection close.

If we try to catch the expection in the case above and them rollback this transaction. The last rollback will fail because connection is closed.

We add a try-catch block in executeParamQueryAsync in the raw case. Here is the new case.

package org.example;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Main {

    private static final String TIDB_HOST = "49.52.27.20";

    private static final ExecutorService executor = Executors.newSingleThreadExecutor();
    private static final int TIDB_PORT = 4000;
    private static final long TIMEOUT = 1010;

    public static void main(String[] args) throws ClassNotFoundException, SQLException, InterruptedException {
        Connection connection1 = null;
        Connection connection2 = null;

        Class.forName("com.mysql.cj.jdbc.Driver");

        try {
            // JDBC connection parameters
            String jdbcUrl = "jdbc:mysql://" + TIDB_HOST + ":" + TIDB_PORT + "/test";
            String username = "root";
            String password = "";

            // Establish the connections
            connection1 = DriverManager.getConnection(jdbcUrl, username, password);
            connection1.setAutoCommit(false);
            connection2 = DriverManager.getConnection(jdbcUrl, username, password);
            connection2.setAutoCommit(false);

            // Init database
            executeQuery(connection1, "drop table if exists t1");
            executeQuery(connection1, "create table t1 (a int, b int,primary key(a))");
            executeQuery(connection1, "insert into t1 values (3, 2),(2, 3)");

            executeQuery(connection1, "begin");
            executeQuery(connection2, "begin");

            // Parameterized delete operations
            System.out.println("c1 delete: " + executeQuery(connection1, "delete from t1 where a = 3"));
            System.out.println("c2 delete: " + executeQuery(connection2, "delete from t1 where a = 2"));

            // conn1 delete the record async
            CompletableFuture<String> res = null;
            System.out.println("c1 delete: " +  (res=executeParamQueryAsync(connection1, "delete from t1 where a = 2")));

            // keep conn2 commit at the same time when conn1's delete is timeout
            Thread.sleep(TIMEOUT);
            System.out.println("c2 commit: " + executeQuery(connection2, "commit"));
            res.get();

        } catch (SQLException | InterruptedException | ExecutionException e) {
            e.printStackTrace();
        } finally {
            Thread.sleep(2000);
            // Close resources in the reverse order of their creation
            connection1.close();
            connection2.close();

            executor.shutdown();
        }
    }

    private static String executeQuery(Connection connection, String query) throws SQLException {
        Statement statement = connection.createStatement();
        statement.execute(query);
        return  "Success" ;
    }

    private static CompletableFuture<String> executeParamQueryAsync(Connection connection, String query) {
        return CompletableFuture.supplyAsync(() -> {
            try (PreparedStatement preparedStatement = connection.prepareStatement(query)) {
                preparedStatement.setQueryTimeout(1);
                return preparedStatement.executeUpdate() == 0 ? "Failure" : "Success";
            } catch (SQLException e) {
                e.printStackTrace();
                try {
                    connection.commit();
                } catch (Exception e1) { // HERE we catch the unexpected exception thrown by tidb, and rollback this transaction
                    e1.printStackTrace();
                    try {
                        connection.rollback();
                        System.out.println("successfully rollback");
                    } catch (SQLException ex) {
                        throw new RuntimeException(ex);
                    }
                }
                System.out.println("c1 delete end");
                return "Failure";
            }
        }, executor);
    }
}

The last rollback fails and throws a exception because connection is closed.

java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.sql.SQLNonTransientConnectionException: No operations allowed after connection closed.
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at org.example.Main.main(Main.java:59)
Caused by: java.lang.RuntimeException: java.sql.SQLNonTransientConnectionException: No operations allowed after connection closed.
    at org.example.Main.lambda$executeParamQueryAsync$0(Main.java:94)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.sql.SQLNonTransientConnectionException: No operations allowed after connection closed.
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:110)
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89)
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63)
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:73)
    at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:73)
    at com.mysql.cj.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:1846)
    at org.example.Main.lambda$executeParamQueryAsync$0(Main.java:91)
    ... 4 more

Since the case causes unexpected connection close, it may be more dangerous than a compatibility bug.

The TiDB version is:

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version()                                                                                                                                                                                                                                                 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v7.5.0
Edition: Community
Git Commit Hash: 069631e2ecfedc000ffb92c67207bea81380f020
Git Branch: heads/refs/tags/v7.5.0
UTC Build Time: 2023-11-24 08:50:14
GoVersion: go1.21.3
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
dveeden commented 9 months ago

Is there anything related/interesting in the TiDB logs? Would it be possible to upload a network traffic dump (pcap(ng) from tcpdump or dumpcap)? Does this still happen with MySQL Connector/J 8.3.0? (I would assume this is the case)

wengsy150943 commented 9 months ago

Thanks for your reply.

1) We read the tidb.log and these may help:

[2024/01/19 16:22:30.544 +08:00] [INFO] [ddl.go:1184] ["DDL job is finished"] [category=ddl] [jobID=23981]
[2024/01/19 16:22:30.545 +08:00] [INFO] [callback.go:139] ["performing DDL change, must reload"]
[2024/01/19 16:22:30.545 +08:00] [INFO] [split_region.go:85] ["split batch regions request"] ["split key count"=1] ["batch count"=1] ["first batch, region ID"=31] ["first split key"=748000000000005dac]
[2024/01/19 16:22:30.582 +08:00] [INFO] [split_region.go:187] ["batch split regions complete"] ["batch region ID"=31] ["first at"=748000000000005dac] ["first new region left"="{Id:33033 StartKey:748000000000005dffa900000000000000f8 EndKey:748000000000005dffac00000000000000f8 RegionEpoch:{ConfVer:5 Version:8038} Peers:[id:33034 store_id:3  id:33035 store_id:1  id:33036 store_id:2 ] EncryptionMeta:<nil> IsInFlashback:false FlashbackStartTs:0}"] ["new region count"=1]
[2024/01/19 16:22:30.582 +08:00] [INFO] [split_region.go:236] ["split regions complete"] ["region count"=1] ["region IDs"="[33033]"]
[2024/01/19 16:22:30.582 +08:00] [INFO] [region_request.go:1532] ["send request meet region error without retry"] [conn=1948260820] [session_alias=] [req-ts=447126582212165634] [req-type=Prewrite] [region="{ region id: 31, ver: 8037, confVer: 5 }"] [replica-read-type=leader] [stale-read=false] [request-sender="{rpcError:<nil>,replicaSelector: replicaSelector{selectorStateStr: accessKnownLeader, cacheRegionIsValid: false, replicaStatus: [peer: 32, store: 3, isEpochStale: false, attempts: 0, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable peer: 39, store: 1, isEpochStale: false, attempts: 0, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable peer: 47, store: 2, isEpochStale: false, attempts: 1, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable]}}"] [retry-times=0] [total-backoff-ms=0] [total-backoff-times=0] [max-exec-timeout-ms=20000] [total-region-errors=47-epoch_not_match:1]
[2024/01/19 16:22:30.584 +08:00] [INFO] [txn.go:55] ["Try to create a new txn inside a transaction auto commit"] [conn=1948260820] [session_alias=] [schemaVersion=33113] [txnStartTS=447126582212165634] [txnScope=global]
[2024/01/19 16:22:31.671 +08:00] [INFO] [server.go:860] [kill] [conn=1948260820] [query=true]
[2024/01/19 16:22:31.671 +08:00] [INFO] [conn.go:1032] ["read packet timeout, close this connection"] [conn=1948260820] [session_alias=] [idle=18.806102ms] [waitTimeout=28800] [error="read tcp 49.52.27.20:4000->172.30.242.53:10691: i/o timeout"]
[2024/01/19 16:22:31.674 +08:00] [ERROR] [txn.go:582] ["query interruppted"]

The last three lines show that connection is closed due to timeout. But we set a longer TIMEOUT at the end of the code and execute some operations after that. These operations works well. So it seems the connection timeout is not exceeded when this case happens.

Here is the code.

package org.example;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Main {

    private static final String TIDB_HOST = "49.52.27.20";

    private static final ExecutorService executor = Executors.newSingleThreadExecutor();
    private static final int TIDB_PORT = 4000;
    private static final long TIMEOUT = 2000; // a longer timeout, which will not cause the bug

    public static void main(String[] args) throws ClassNotFoundException, SQLException, InterruptedException {
        Connection connection1 = null;
        Connection connection2 = null;

        Class.forName("com.mysql.cj.jdbc.Driver");

        try {
            // JDBC connection parameters
            String jdbcUrl = "jdbc:mysql://" + TIDB_HOST + ":" + TIDB_PORT + "/test";
            String username = "root";
            String password = "";

            // Establish the connections
            connection1 = DriverManager.getConnection(jdbcUrl, username, password);
            connection1.setAutoCommit(false);
            connection2 = DriverManager.getConnection(jdbcUrl, username, password);
            connection2.setAutoCommit(false);

            // Init database
            executeQuery(connection1, "drop table if exists t1");
            executeQuery(connection1, "create table t1 (a int, b int,primary key(a))");
            executeQuery(connection1, "insert into t1 values (3, 2),(2, 3)");

            executeQuery(connection1, "begin");
            executeQuery(connection2, "begin");

            // Parameterized delete operations
            System.out.println("c1 delete: " + executeQuery(connection1, "delete from t1 where a = 3"));
            System.out.println("c2 delete: " + executeQuery(connection2, "delete from t1 where a = 2"));

            // conn1 delete the record async
            CompletableFuture<String> res = null;
            System.out.println("c1 delete: " +  (res=executeParamQueryAsync(connection1, "delete from t1 where a = 2")));

            // keep conn2 commit at the same time when conn1's delete is timeout
            Thread.sleep(TIMEOUT);
            System.out.println("c2 commit: " + executeQuery(connection2, "commit"));
            res.get();

        } catch (SQLException | InterruptedException | ExecutionException e) {
            e.printStackTrace();
        } finally {
            Thread.sleep(3000);
            executeQuery(connection1, "select * from t1");// a select after sleep
            // Close resources in the reverse order of their creation
            connection1.close();
            connection2.close();

            executor.shutdown();
        }
    }

    private static String executeQuery(Connection connection, String query) throws SQLException {
        Statement statement = connection.createStatement();
        statement.execute(query);
        return  "Success" ;
    }

    private static CompletableFuture<String> executeParamQueryAsync(Connection connection, String query) {
        return CompletableFuture.supplyAsync(() -> {
            try (PreparedStatement preparedStatement = connection.prepareStatement(query)) {
                preparedStatement.setQueryTimeout(1);
                return preparedStatement.executeUpdate() == 0 ? "Failure" : "Success";
            } catch (SQLException e) {
                e.printStackTrace();
                try {
                    connection.commit();
                } catch (Exception e1) { 
                    e1.printStackTrace();
                }
                System.out.println("c1 delete end");
                return "Failure";
            }
        }, executor);
    }
}

2) I'm sorry but we cannot get a network traffic dump since there are lots of other network I/O in our mechine. It's hard to tell which is related to TiDB. We will try to get some lnetwork ogs from it. But it needs some time.

3) Unfortunately, it still happens with MySQL Connector/J 8.3.0. (we run the case with this new JDBC version and get the log shown above.)


update: update a better code case.

wengsy150943 commented 8 months ago

Would it be possible to upload a network traffic dump (pcap(ng) from tcpdump or dumpcap)?

We gain the network traffic dump of this case and find something interesting. The network messages of the latest operations are shown below(following the notation at the issue's beginning):

opid conn1(9944) conn1's res conn2(9945) conn2's res conn3(9947) conn3's res
op5 delete from t1 where a = 2
op5 ACK
op6 commit
op6 create connection
op7 ACK
op7 ACK
op8 KILL QUERY
op8 ACK
op8 FIN ACK
op9 commit
op9 RST

In op5, conn1 starts a DELETE and receives its ack, while we know that op5 is blocked at that time. Then conn2 starts its COMMIT(op6). Meanwhile, JDBC starts a new connection(conn3) to handle the TIMEOUT of conn1's DELETE(op5). And then, conn3 sends a message "KILL QUERY"(op8) and gets its response. TiDB closes conn1 here, though it seems weird. So when conn1 attempts to COMMIT(or ROLLBACK) after that(op9), TiDB rejects this operation.

It seems that TiDB closes the connection unexpectedly after receiving the "KILL QUERY" command.

BTW, here is another question: did op5 finish when executing the "KILL QUERY" command? Since we cannot decode the ack message from TiDB, we cannot figure out this question. Therefore, we add the network dump to the attachment. The messages of this case are from No.100 to the end. And conn1's id is 9944, conn2's id is 9945, conn3's id is 9947, which can be found in the info of messages. Hope this information is helpful for you. (pcap file is not allowed in github, so we changed its file extension, while it is still a pcap file)

tidb.txt