A number of clients routed through ProxySQL reported periodically seeing "Lost connection to MySQL server during query" errors. These clients connect to ProxySQL using both TLS and clear-text. Connections from ProxySQL to the MySQL database server are over TLS.
Debug Details
We were able to capture the closing of the connection using tcpdump on one of the client machines: the client sends a SELECT query to the proxy, and the proxy then responds with a FIN to tear down the connection. At the same time we captured the same transaction on the ProxySQL machine, showing both frontend and backend behavior (client: x.x.x.x, ProxySQL: y.y.y.y, MySQL: z.z.z.z):
11:12:18.615084 enp129s0f0np0 In IP x.x.x.x.39514 > y.y.y.y.3306: Flags [P.], seq 558872:558940, ack 22869029, win 6929, options [nop,nop,TS val 2719213252 ecr 3656302283], length 68
11:12:18.615195 enp129s0f1np1 Out IP y.y.y.y.35198 > z.z.z.z.3306: Flags [P.], seq 324406:324503, ack 22458324, win 1749, options [nop,nop,TS val 2921728114 ecr 3557945900], length 97
11:12:18.615328 enp129s0f0np0 In IP z.z.z.z.3306 > y.y.y.y.35198: Flags [P.], seq 22458324:22458459, ack 324503, win 111, options [nop,nop,TS val 3557945901 ecr 2921728114], length 135
11:12:18.615472 enp129s0f1np1 Out IP y.y.y.y.35198 > z.z.z.z.3306: Flags [P.], seq 324503:324600, ack 22458459, win 1749, options [nop,nop,TS val 2921728115 ecr 3557945901], length 97
11:12:18.615595 enp129s0f0np0 In IP z.z.z.z.3306 > y.y.y.y.35198: Flags [P.], seq 22458459:22458599, ack 324600, win 111, options [nop,nop,TS val 3557945901 ecr 2921728115], length 140
11:12:18.616659 enp129s0f1np1 Out IP y.y.y.y.35198 > z.z.z.z.3306: Flags [R.], seq 324600, ack 22458599, win 1749, options [nop,nop,TS val 2921728116 ecr 3557945901], length 0
11:12:18.616739 enp129s0f0np0 Out IP y.y.y.y.3306 > x.x.x.x.39514: Flags [F.], seq 22869029, ack 558940, win 121, options [nop,nop,TS val 3656302286 ecr 2719213252], length 0
11:12:18.616830 enp129s0f0np0 In IP x.x.x.x.39514 > y.y.y.y.3306: Flags [F.], seq 558940, ack 22869030, win 6929, options [nop,nop,TS val 2719213253 ecr 3656302286], length 0
11:12:18.616839 enp129s0f0np0 Out IP y.y.y.y.3306 > x.x.x.x.39514: Flags [.], ack 558941, win 121, options [nop,nop,TS val 3656302286 ecr 2719213253], length 0
This shows the client's SELECT coming in on the frontend, but for some reason the proxy sends a reset to MySQL and then tears down the frontend connection.

The next step was to understand why the proxy was sending the reset, so I used the kernel's tcp reset tracepoint. We can see that ma_net_safe_read() is calling end_server(), which closes the socket and forces the kernel to send the RST. I then added a number of userspace probes to understand how we got into this situation; they show that ma_tls_async_check_result() has ssl_err=1, which translates to SSL_ERROR_SSL. My next step was to try to understand the error reason value, since SSL_ERROR_SSL is a fairly generic error condition. Unfortunately we were unable to debug this further in production, but we came up with a lab reproducer to continue the investigation.
Reproducer
In two separate terminals we started the following, in order:
1. Start some client work going through the proxy (in this case, the sysbench oltp_read_write test).
2. Start another connection that causes some type of SSL/TLS error to occur.
sysbench output:
$ sysbench --mysql-host=y.y.y.y --mysql-port=3306 --mysql-db=johunt_test_db --mysql-user=johunt --mysql-password='<redacted>' --tables=3 --table-size=10000 --threads=24 --max-time=300 /usr/share/sysbench/oltp_read_write.lua --report-interval=5 run
WARNING: --max-time is deprecated, use --time instead
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 24
Report intermediate results every 5 second(s)
Initializing random number generator from current time
Initializing worker threads...
Threads started!
[ 5s ] thds: 24 tps: 4870.40 qps: 97455.67 (r/w/o: 68227.05/19483.22/9745.41) lat (ms,95%): 5.67 err/s: 0.20 reconn/s: 0.00
[ 10s ] thds: 24 tps: 4916.42 qps: 98336.12 (r/w/o: 68835.82/19666.86/9833.43) lat (ms,95%): 5.67 err/s: 0.20 reconn/s: 0.00
[ 15s ] thds: 24 tps: 4925.48 qps: 98514.75 (r/w/o: 68960.48/19703.71/9850.55) lat (ms,95%): 5.57 err/s: 0.20 reconn/s: 0.00
[ 20s ] thds: 24 tps: 4926.21 qps: 98536.94 (r/w/o: 68979.10/19704.43/9853.41) lat (ms,95%): 5.57 err/s: 0.80 reconn/s: 0.00
[ 25s ] thds: 24 tps: 4930.17 qps: 98615.23 (r/w/o: 69032.00/19722.29/9860.94) lat (ms,95%): 5.67 err/s: 0.40 reconn/s: 0.00
[ 30s ] thds: 24 tps: 4950.99 qps: 99030.39 (r/w/o: 69323.25/19804.36/9902.78) lat (ms,95%): 5.57 err/s: 0.60 reconn/s: 0.00
[ 35s ] thds: 24 tps: 4976.16 qps: 99518.71 (r/w/o: 69662.78/19903.42/9952.51) lat (ms,95%): 5.57 err/s: 0.20 reconn/s: 0.00
[ 40s ] thds: 24 tps: 4969.46 qps: 99397.71 (r/w/o: 69579.77/19878.62/9939.31) lat (ms,95%): 5.57 err/s: 0.40 reconn/s: 0.00
[ 45s ] thds: 24 tps: 4954.63 qps: 99102.21 (r/w/o: 69371.82/19820.52/9909.86) lat (ms,95%): 5.57 err/s: 0.60 reconn/s: 0.00
[ 50s ] thds: 24 tps: 4958.15 qps: 99160.48 (r/w/o: 69410.75/19833.22/9916.51) lat (ms,95%): 5.57 err/s: 0.60 reconn/s: 0.00
[ 55s ] thds: 24 tps: 4960.35 qps: 99230.20 (r/w/o: 69464.30/19844.60/9921.30) lat (ms,95%): 5.57 err/s: 0.40 reconn/s: 0.00
[ 60s ] thds: 24 tps: 4938.42 qps: 98774.89 (r/w/o: 69144.55/19753.30/9877.05) lat (ms,95%): 5.67 err/s: 0.40 reconn/s: 0.00
FATAL: mysql_stmt_execute() returned error 2013 (Lost connection to MySQL server during query) for query 'SELECT c FROM sbtest2 WHERE id BETWEEN ? AND ?'
FATAL: `thread_run' function failed: ./oltp_common.lua:432: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query
FATAL: mysql_stmt_execute() returned error 2013 (Lost connection to MySQL server during query) for query 'SELECT c FROM sbtest2 WHERE id=?'
FATAL: `thread_run' function failed: ./oltp_common.lua:419: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query
FATAL: mysql_stmt_execute() returned error 2013 (Lost connection to MySQL server during query) for query 'INSERT INTO sbtest1 (id, k, c, pad) VALUES (?, ?, ?, ?)'
FATAL: `thread_run' function failed: ./oltp_common.lua:488: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query
Note that sysbench runs fine up until we run the bad-cert.py script below. Once that happens, the sysbench connections get disconnected. The impact is not limited to that run: subsequent sysbench runs also fail to connect for some time. It usually takes 3-7 attempts of the sysbench command before it connects again, even though we run the bad-cert.py script only once.
Simple script to force an SSL error, where foo.crt does not exist. Script output:
$ ./bad-cert.py
Traceback (most recent call last):
  File "./bad-cert.py", line 4, in <module>
    database = MySQLdb.connect(host='y.y.y.y', user='johunt', db='johunt_test_db', passwd='<redacted>', port=3306, ssl={'ca': '/home/johunt/foo.crt'})
  File "/usr/lib/python3/dist-packages/MySQLdb/__init__.py", line 86, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/MySQLdb/connections.py", line 204, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (2026, 'TLS/SSL error: Error while reading file.')
Below is output from /var/lib/proxysql/proxysql.log when sysbench is running and then I run the bad cert script:
2024-05-22 12:11:09 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:37886
2024-05-22 12:11:10 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029413) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1213, Deadlock found when trying to get lock; try restarting transaction . digest_text = "(null)"
2024-05-22 12:11:10 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:37898
2024-05-22 12:11:11 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:37910
2024-05-22 12:11:11 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029398) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1213, Deadlock found when trying to get lock; try restarting transaction . digest_text = "(null)"
2024-05-22 12:11:12 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:37916
2024-05-22 12:11:12 MySQL_Session.cpp:4423:handler_minus1_ClientLibraryError(): [ERROR] Detected a broken connection while running query on (0,z.z.z.z,3306,1029425) , FD (Conn:834 , MyDS:834) , user johunt , last_used 18707856816ms ago : 2013, Lost connection to MySQL server during query
2024-05-22 12:11:12 MySQL_Session.cpp:4423:handler_minus1_ClientLibraryError(): [ERROR] Detected a broken connection while running query on (0,z.z.z.z,3306,1029399) , FD (Conn:789 , MyDS:789) , user johunt , last_used 142023ms ago : 2013, Lost connection to MySQL server during query
2024-05-22 12:11:13 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:37922
2024-05-22 12:11:14 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:37926
Note the deadlock messages are seen even during normal operations and are not related to the bad cert.
And then, when I try to restart sysbench, we see:
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029426) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029421) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029411) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 mysql_connection.cpp:1727:handler(): [ERROR] Failed SET NAMES: Lost connection to MySQL server during query
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029391) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:2497:handler_again___status_CHANGING_CHARSET(): [ERROR] Detected a broken connection while during SET NAMES on (0,z.z.z.z,3306,1029394) , user johunt , last_used 163813ms ago : 2013, Lost connection to MySQL server during query
2024-05-22 12:13:56 mysql_connection.cpp:1727:handler(): [ERROR] Failed SET NAMES: Lost connection to MySQL server during query
2024-05-22 12:13:56 MySQL_Session.cpp:2497:handler_again___status_CHANGING_CHARSET(): [ERROR] Detected a broken connection while during SET NAMES on (0,z.z.z.z,3306,1029413) , user johunt , last_used 163812ms ago : 2013, Lost connection to MySQL server during query
2024-05-22 12:13:56 MySQL_Session.cpp:4423:handler_minus1_ClientLibraryError(): [ERROR] Detected a broken connection while running query on (0,z.z.z.z,3306,1029426) , FD (Conn:833 , MyDS:833) , user johunt , last_used 0ms ago : 2013, Lost connection to MySQL server during query
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029417) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 mysql_connection.cpp:1727:handler(): [ERROR] Failed SET NAMES: Lost connection to MySQL server during query
2024-05-22 12:13:56 MySQL_Session.cpp:2497:handler_again___status_CHANGING_CHARSET(): [ERROR] Detected a broken connection while during SET NAMES on (0,z.z.z.z,3306,1029423) , user johunt , last_used 163812ms ago : 2013, Lost connection to MySQL server during query
2024-05-22 12:13:56 MySQL_Session.cpp:4423:handler_minus1_ClientLibraryError(): [ERROR] Detected a broken connection while running query on (0,z.z.z.z,3306,1029391) , FD (Conn:827 , MyDS:827) , user johunt , last_used 18446744073709551ms ago : 2013, Lost connection to MySQL server during query
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029421) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4423:handler_minus1_ClientLibraryError(): [ERROR] Detected a broken connection while running query on (0,z.z.z.z,3306,1029411) , FD (Conn:822 , MyDS:822) , user johunt , last_used 0ms ago : 2013, Lost connection to MySQL server during query
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029409) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4423:handler_minus1_ClientLibraryError(): [ERROR] Detected a broken connection while running query on (0,z.z.z.z,3306,1029417) , FD (Conn:801 , MyDS:801) , user johunt , last_used 18446744073709551ms ago : 2013, Lost connection to MySQL server during query
2024-05-22 12:13:56 mysql_connection.cpp:1727:handler(): [ERROR] Failed SET NAMES: Lost connection to MySQL server during query
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029415) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:2497:handler_again___status_CHANGING_CHARSET(): [ERROR] Detected a broken connection while during SET NAMES on (0,z.z.z.z,3306,1029414) , user johunt , last_used 163815ms ago : 2013, Lost connection to MySQL server during query
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029412) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029404) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029421) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4423:handler_minus1_ClientLibraryError(): [ERROR] Detected a broken connection while running query on (0,z.z.z.z,3306,1029409) , FD (Conn:810 , MyDS:810) , user johunt , last_used 18446744073709551ms ago : 2013, Lost connection to MySQL server during query
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029416) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029410) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029419) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029420) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029422) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029427) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029424) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Session.cpp:4480:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (0,z.z.z.z,3306,1029408) , user "johunt@a.b.c.d" , schema "johunt_test_db" , 1295, This command is not supported in the prepared statement protocol yet . digest_text = "BEGIN"
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36738
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36728
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36668
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36726
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36700
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36626
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36608
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36658
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36770
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36730
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36634
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36584
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36642
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36676
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36620
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36748
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36764
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36580
2024-05-22 12:13:56 MySQL_Thread.cpp:3939:process_all_sessions(): [WARNING] Closing unhealthy client connection a.b.c.d:36582
Using this reproducer, I inserted a probe at line 17 of SSL_get_error() to capture the OpenSSL error value and reason whenever it returns SSL_ERROR_SSL.
# perf probe -x $(which proxysql) -L SSL_get_error
<SSL_get_error@/proxysql2-2.5.5/deps/libssl/openssl/ssl/ssl_lib.c:0>
0 int SSL_get_error(const SSL *s, int i)
{
int reason;
unsigned long l;
BIO *bio;
6 if (i > 0)
7 return SSL_ERROR_NONE;
/*
* Make things return SSL_ERROR_SYSCALL when doing SSL_do_handshake etc,
* where we do encode the error
*/
13 if ((l = ERR_peek_error()) != 0) {
14 if (ERR_GET_LIB(l) == ERR_LIB_SYS)
15 return SSL_ERROR_SYSCALL;
else
17 return SSL_ERROR_SSL;
perf data collected with the probe during the lab reproducer shows l=0xa000197, which corresponds to an error code of 0xa and a reason value of 0x197 (407). The only place this reason is set is in SSL_shutdown(). Probing that line while running the reproducer, the call graph of the callers shows the SSL_shutdown() calls are coming from:
MySQL_Data_Stream::~MySQL_Data_Stream
MySQL_Data_Stream::shut_hard
ma_tls_close()
I've put together a PR, https://github.com/sysown/proxysql/pull/4555, which checks for an error returned by SSL_shutdown() and clears the error queue in the first two call sites above. This appears to fix my lab test case. I'm still investigating the third site to understand why leaving it unpatched does not seem to matter.

This appears to be a fairly serious problem, given how much a single misbehaving client can impact the number of clients using the proxy.

There may be other places in the code with similar problems, where SSL errors are set but not properly checked and cleared. I'm wondering if it makes sense to update the MariaDB client library's OpenSSL calls to clear the error queue prior to using them; I've found a few different reports stating this is recommended practice when using that API. Here's one example: