Open wengsy150943 opened 9 months ago
This bug still occurs in v7.5.0. And we find it will cause an unexpected connection close.
If we try to catch the expection in the case above and them rollback this transaction. The last rollback will fail because connection is closed.
We add a try-catch block in executeParamQueryAsync
in the raw case. Here is the new case.
package org.example;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class Main {
private static final String TIDB_HOST = "49.52.27.20";
private static final ExecutorService executor = Executors.newSingleThreadExecutor();
private static final int TIDB_PORT = 4000;
private static final long TIMEOUT = 1010;
public static void main(String[] args) throws ClassNotFoundException, SQLException, InterruptedException {
Connection connection1 = null;
Connection connection2 = null;
Class.forName("com.mysql.cj.jdbc.Driver");
try {
// JDBC connection parameters
String jdbcUrl = "jdbc:mysql://" + TIDB_HOST + ":" + TIDB_PORT + "/test";
String username = "root";
String password = "";
// Establish the connections
connection1 = DriverManager.getConnection(jdbcUrl, username, password);
connection1.setAutoCommit(false);
connection2 = DriverManager.getConnection(jdbcUrl, username, password);
connection2.setAutoCommit(false);
// Init database
executeQuery(connection1, "drop table if exists t1");
executeQuery(connection1, "create table t1 (a int, b int,primary key(a))");
executeQuery(connection1, "insert into t1 values (3, 2),(2, 3)");
executeQuery(connection1, "begin");
executeQuery(connection2, "begin");
// Parameterized delete operations
System.out.println("c1 delete: " + executeQuery(connection1, "delete from t1 where a = 3"));
System.out.println("c2 delete: " + executeQuery(connection2, "delete from t1 where a = 2"));
// conn1 delete the record async
CompletableFuture<String> res = null;
System.out.println("c1 delete: " + (res=executeParamQueryAsync(connection1, "delete from t1 where a = 2")));
// keep conn2 commit at the same time when conn1's delete is timeout
Thread.sleep(TIMEOUT);
System.out.println("c2 commit: " + executeQuery(connection2, "commit"));
res.get();
} catch (SQLException | InterruptedException | ExecutionException e) {
e.printStackTrace();
} finally {
Thread.sleep(2000);
// Close resources in the reverse order of their creation
connection1.close();
connection2.close();
executor.shutdown();
}
}
private static String executeQuery(Connection connection, String query) throws SQLException {
Statement statement = connection.createStatement();
statement.execute(query);
return "Success" ;
}
private static CompletableFuture<String> executeParamQueryAsync(Connection connection, String query) {
return CompletableFuture.supplyAsync(() -> {
try (PreparedStatement preparedStatement = connection.prepareStatement(query)) {
preparedStatement.setQueryTimeout(1);
return preparedStatement.executeUpdate() == 0 ? "Failure" : "Success";
} catch (SQLException e) {
e.printStackTrace();
try {
connection.commit();
} catch (Exception e1) { // HERE we catch the unexpected exception thrown by tidb, and rollback this transaction
e1.printStackTrace();
try {
connection.rollback();
System.out.println("successfully rollback");
} catch (SQLException ex) {
throw new RuntimeException(ex);
}
}
System.out.println("c1 delete end");
return "Failure";
}
}, executor);
}
}
The last rollback fails and throws a exception because connection is closed.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.sql.SQLNonTransientConnectionException: No operations allowed after connection closed.
at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
at org.example.Main.main(Main.java:59)
Caused by: java.lang.RuntimeException: java.sql.SQLNonTransientConnectionException: No operations allowed after connection closed.
at org.example.Main.lambda$executeParamQueryAsync$0(Main.java:94)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.sql.SQLNonTransientConnectionException: No operations allowed after connection closed.
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:110)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:73)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:73)
at com.mysql.cj.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:1846)
at org.example.Main.lambda$executeParamQueryAsync$0(Main.java:91)
... 4 more
Since the case causes unexpected connection close, it may be more dangerous than a compatibility bug.
The TiDB version is:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version() |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v7.5.0
Edition: Community
Git Commit Hash: 069631e2ecfedc000ffb92c67207bea81380f020
Git Branch: heads/refs/tags/v7.5.0
UTC Build Time: 2023-11-24 08:50:14
GoVersion: go1.21.3
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Is there anything related/interesting in the TiDB logs? Would it be possible to upload a network traffic dump (pcap(ng) from tcpdump or dumpcap)? Does this still happen with MySQL Connector/J 8.3.0? (I would assume this is the case)
Thanks for your reply.
1)
We read the tidb.log
and these may help:
[2024/01/19 16:22:30.544 +08:00] [INFO] [ddl.go:1184] ["DDL job is finished"] [category=ddl] [jobID=23981]
[2024/01/19 16:22:30.545 +08:00] [INFO] [callback.go:139] ["performing DDL change, must reload"]
[2024/01/19 16:22:30.545 +08:00] [INFO] [split_region.go:85] ["split batch regions request"] ["split key count"=1] ["batch count"=1] ["first batch, region ID"=31] ["first split key"=748000000000005dac]
[2024/01/19 16:22:30.582 +08:00] [INFO] [split_region.go:187] ["batch split regions complete"] ["batch region ID"=31] ["first at"=748000000000005dac] ["first new region left"="{Id:33033 StartKey:748000000000005dffa900000000000000f8 EndKey:748000000000005dffac00000000000000f8 RegionEpoch:{ConfVer:5 Version:8038} Peers:[id:33034 store_id:3 id:33035 store_id:1 id:33036 store_id:2 ] EncryptionMeta:<nil> IsInFlashback:false FlashbackStartTs:0}"] ["new region count"=1]
[2024/01/19 16:22:30.582 +08:00] [INFO] [split_region.go:236] ["split regions complete"] ["region count"=1] ["region IDs"="[33033]"]
[2024/01/19 16:22:30.582 +08:00] [INFO] [region_request.go:1532] ["send request meet region error without retry"] [conn=1948260820] [session_alias=] [req-ts=447126582212165634] [req-type=Prewrite] [region="{ region id: 31, ver: 8037, confVer: 5 }"] [replica-read-type=leader] [stale-read=false] [request-sender="{rpcError:<nil>,replicaSelector: replicaSelector{selectorStateStr: accessKnownLeader, cacheRegionIsValid: false, replicaStatus: [peer: 32, store: 3, isEpochStale: false, attempts: 0, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable peer: 39, store: 1, isEpochStale: false, attempts: 0, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable peer: 47, store: 2, isEpochStale: false, attempts: 1, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable]}}"] [retry-times=0] [total-backoff-ms=0] [total-backoff-times=0] [max-exec-timeout-ms=20000] [total-region-errors=47-epoch_not_match:1]
[2024/01/19 16:22:30.584 +08:00] [INFO] [txn.go:55] ["Try to create a new txn inside a transaction auto commit"] [conn=1948260820] [session_alias=] [schemaVersion=33113] [txnStartTS=447126582212165634] [txnScope=global]
[2024/01/19 16:22:31.671 +08:00] [INFO] [server.go:860] [kill] [conn=1948260820] [query=true]
[2024/01/19 16:22:31.671 +08:00] [INFO] [conn.go:1032] ["read packet timeout, close this connection"] [conn=1948260820] [session_alias=] [idle=18.806102ms] [waitTimeout=28800] [error="read tcp 49.52.27.20:4000->172.30.242.53:10691: i/o timeout"]
[2024/01/19 16:22:31.674 +08:00] [ERROR] [txn.go:582] ["query interruppted"]
The last three lines show that connection is closed due to timeout. But we set a longer TIMEOUT at the end of the code and execute some operations after that. These operations works well. So it seems the connection timeout is not exceeded when this case happens.
Here is the code.
package org.example;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class Main {
private static final String TIDB_HOST = "49.52.27.20";
private static final ExecutorService executor = Executors.newSingleThreadExecutor();
private static final int TIDB_PORT = 4000;
private static final long TIMEOUT = 2000; // a longer timeout, which will not cause the bug
public static void main(String[] args) throws ClassNotFoundException, SQLException, InterruptedException {
Connection connection1 = null;
Connection connection2 = null;
Class.forName("com.mysql.cj.jdbc.Driver");
try {
// JDBC connection parameters
String jdbcUrl = "jdbc:mysql://" + TIDB_HOST + ":" + TIDB_PORT + "/test";
String username = "root";
String password = "";
// Establish the connections
connection1 = DriverManager.getConnection(jdbcUrl, username, password);
connection1.setAutoCommit(false);
connection2 = DriverManager.getConnection(jdbcUrl, username, password);
connection2.setAutoCommit(false);
// Init database
executeQuery(connection1, "drop table if exists t1");
executeQuery(connection1, "create table t1 (a int, b int,primary key(a))");
executeQuery(connection1, "insert into t1 values (3, 2),(2, 3)");
executeQuery(connection1, "begin");
executeQuery(connection2, "begin");
// Parameterized delete operations
System.out.println("c1 delete: " + executeQuery(connection1, "delete from t1 where a = 3"));
System.out.println("c2 delete: " + executeQuery(connection2, "delete from t1 where a = 2"));
// conn1 delete the record async
CompletableFuture<String> res = null;
System.out.println("c1 delete: " + (res=executeParamQueryAsync(connection1, "delete from t1 where a = 2")));
// keep conn2 commit at the same time when conn1's delete is timeout
Thread.sleep(TIMEOUT);
System.out.println("c2 commit: " + executeQuery(connection2, "commit"));
res.get();
} catch (SQLException | InterruptedException | ExecutionException e) {
e.printStackTrace();
} finally {
Thread.sleep(3000);
executeQuery(connection1, "select * from t1");// a select after sleep
// Close resources in the reverse order of their creation
connection1.close();
connection2.close();
executor.shutdown();
}
}
private static String executeQuery(Connection connection, String query) throws SQLException {
Statement statement = connection.createStatement();
statement.execute(query);
return "Success" ;
}
private static CompletableFuture<String> executeParamQueryAsync(Connection connection, String query) {
return CompletableFuture.supplyAsync(() -> {
try (PreparedStatement preparedStatement = connection.prepareStatement(query)) {
preparedStatement.setQueryTimeout(1);
return preparedStatement.executeUpdate() == 0 ? "Failure" : "Success";
} catch (SQLException e) {
e.printStackTrace();
try {
connection.commit();
} catch (Exception e1) {
e1.printStackTrace();
}
System.out.println("c1 delete end");
return "Failure";
}
}, executor);
}
}
2) I'm sorry but we cannot get a network traffic dump since there are lots of other network I/O in our mechine. It's hard to tell which is related to TiDB. We will try to get some lnetwork ogs from it. But it needs some time.
3) Unfortunately, it still happens with MySQL Connector/J 8.3.0. (we run the case with this new JDBC version and get the log shown above.)
update: update a better code case.
Would it be possible to upload a network traffic dump (pcap(ng) from tcpdump or dumpcap)?
We gain the network traffic dump of this case and find something interesting. The network messages of the latest operations are shown below(following the notation at the issue's beginning):
opid | conn1(9944) | conn1's res | conn2(9945) | conn2's res | conn3(9947) | conn3's res |
---|---|---|---|---|---|---|
op5 | delete from t1 where a = 2 | |||||
op5 | ACK | |||||
op6 | commit | |||||
op6 | create connection | |||||
op7 | ACK | |||||
op7 | ACK | |||||
op8 | KILL QUERY | |||||
op8 | ACK | |||||
op8 | FIN ACK | |||||
op9 | commit | |||||
op9 | RST |
In op5, conn1 starts a DELETE and receives its ack, while we know that op5 is blocked at that time. Then conn2 starts its COMMIT(op6). Meanwhile, JDBC starts a new connection(conn3) to handle the TIMEOUT of conn1's DELETE(op5). And then, conn3 sends a message "KILL QUERY"(op8) and gets its response. TiDB closes conn1 here, though it seems weird. So when conn1 attempts to COMMIT(or ROLLBACK) after that(op9), TiDB rejects this operation.
It seems that TiDB closes the connection unexpectedly after receiving the "KILL QUERY" command.
BTW, here is another question: did op5 finish when executing the "KILL QUERY" command? Since we cannot decode the ack message from TiDB, we cannot figure out this question. Therefore, we add the network dump to the attachment. The messages of this case are from No.100 to the end. And conn1's id is 9944, conn2's id is 9945, conn3's id is 9947, which can be found in the info of messages. Hope this information is helpful for you. (pcap file is not allowed in github, so we changed its file extension, while it is still a pcap file)
Bug Report
1. Minimal reproduce step (Required)
Running the case below by JAVA.
Schema:
Then run the case by two connections parallel.
We also provide our code which triggers this case. Note that the static value
TIMEOUT
maybe need to be changed based on the testing environment to keep conn2 commits at the same time when conn1's delete is timeout.And the jdbc version is:
2. What did you expect to see? (Required)
There is only one exception in the whole execution, i.e., the timeout exception of conn1's delete.
That's what MySQL8.0 does. Staggering the time of conn1's delete and conn2's commit also leads to this result, i.e., setting TIMEOUT to 2000.
Then the output is shown below.
3. What did you see instead (Required)
The commit/rollback after conn1's delete also fails and throws other exceptions(Communications link failure), as shown below.
This phenomenon is confusing because it's hard to check whether the transaction of conn1is finished.(In fact it is finished.)
And I think it is unexpected because:
4. What is your TiDB version? (Required)