sysown / proxysql

High-performance MySQL proxy with a GPL license.
http://www.proxysql.com
GNU General Public License v3.0
6.05k stars 983 forks source link

Downtime the app because of Shunning the writer server #4651

Closed resworld closed 2 months ago

resworld commented 2 months ago

Hi team,

We have faced an issue with the proxysql that caused a downtime for our site. It seems that the proxysql marked our writer server as shunned and all the queries has been rejected that way. We have noticed that, just before that to happened, it has "timeout error" during COM_CHANGE_USER. We have migrated to proxysql 2.6.3 version а week ago and this error is something new for us. The application is using a lot of different MySQL users (more than 10k) that are still with mysql_native_password method for authentication and our proxysql's default plugin is set to mysql_native_password as well. Problem appear just once, a day after the migration, and after that it doesn't happen again. Can you, please, help us to understand what it could be happened and how can we prevent it from happening in the future?

We are using proxysql admin instances deployed in AWS and proxysql satellites running inside the EKS cluster. Both are using same proxysql version:

ProxySQL version 2.6.3-percona-1.1, codename Truls

OS version:

AlmaLinux release 9.4 (Seafoam Ocelot)

Steps to reproduce:

We are not sure what caused that issue and how to reproduce it.

Logs from the Satellite:

2024-09-13 03:44:06 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:06 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:07 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:08 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:08 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:09 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:09 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:09 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:10 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:10 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:10 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:10 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:10 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:11 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:11 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:11 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:11 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:11 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:12 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:12 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:12 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:12 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:12 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:13 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:13 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:13 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:13 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:13 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:13 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:13 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:14 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:14 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:14 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:14 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:14 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:14 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:15 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:15 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:15 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:15 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:15 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:15 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:16 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:16 MySQL_Session.cpp:3440:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on 192.168.10.101 , 3306
2024-09-13 03:44:16 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 508us
2024-09-13 03:44:16 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 103us
2024-09-13 03:44:16 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 646us
2024-09-13 03:44:16 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 262us
2024-09-13 03:44:16 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 1us
2024-09-13 03:44:16 MySrvC.cpp:127:connect_error(): [ERROR] Shunning server 192.168.10.101:3306 with 5 errors/sec. Shunning for 10 seconds
2024-09-13 03:44:16 MyHGC.cpp:227:get_random_MySrvC(): [ERROR] Hostgroup 10 has no servers available! Checking servers shunned for more than 1 second
2024-09-13 03:44:16 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 2us
2024-09-13 03:44:16 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 2us
2024-09-13 03:44:17 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 129us
2024-09-13 03:44:17 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 1us
2024-09-13 03:44:17 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 2us
2024-09-13 03:44:17 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 1us
2024-09-13 03:44:17 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 943us
2024-09-13 03:44:18 MyHGC.cpp:227:get_random_MySrvC(): [ERROR] Hostgroup 10 has no servers available! Checking servers shunned for more than 1 second
2024-09-13 03:44:18 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 443us
2024-09-13 03:44:18 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 1us
2024-09-13 03:44:18 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 8005us
2024-09-13 03:44:18 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 5807us
2024-09-13 03:44:18 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 5070us
2024-09-13 03:44:18 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 811us
2024-09-13 03:44:18 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 811054us
2024-09-13 03:44:19 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 159us
2024-09-13 03:44:19 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 1894394us
2024-09-13 03:44:19 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 220237us
2024-09-13 03:44:19 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 2us
2024-09-13 03:44:19 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 1us
2024-09-13 03:44:19 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 0us
2024-09-13 03:44:20 MyHGC.cpp:227:get_random_MySrvC(): [ERROR] Hostgroup 10 has no servers available! Checking servers shunned for more than 1 second
2024-09-13 03:44:20 MySQL_Session.cpp:3209:handler_again___status_CONNECTING_SERVER(): [ERROR] Max connect timeout reached while reaching hostgroup 10 after 16001ms . HG status: [{"Bytes_recv":"7292277899","Bytes_sent":"22429119254","ConnERR":"25","ConnFree":"33","ConnOK":"119","ConnUsed":"18","Latency_us":"0","MaxConnUsed":"58","Queries":"59860818","Queries_GTID_sync":"0","hostgroup":"10","srv_host":"192.168.10.101","srv_port":"3306","status":"SHUNNED"}]
2024-09-13 03:44:20 mysql_connection.cpp:1279:handler(): [ERROR] Connect timeout on 192.168.10.101:3306 : exceeded by 502580us
2024-09-13 03:44:20 MySQL_Session.cpp:3209:handler_again___status_CONNECTING_SERVER(): [ERROR] Max connect timeout reached while reaching hostgroup 10 after 13551ms . HG status: [{"Bytes_recv":"7292277899","Bytes_sent":"22429119254","ConnERR":"26","ConnFree":"33","ConnOK":"119","ConnUsed":"17","Latency_us":"0","MaxConnUsed":"58","Queries":"59860818","Queries_GTID_sync":"0","hostgroup":"10","srv_host":"192.168.10.101","srv_port":"3306","status":"SHUNNED"}]

Please let us know if something as information would be needed! We will like to understand what could be configured wrong or how to prevent from happening in future.

Thank you!

renecannao commented 2 months ago

Hi @resworld . If connections and COM_CHANGE_USER all started timeout, this issue is likely related to a backend issue. The app has a downtime because proxysql shunned the backend, and proxysql shunned the backend because the backend was timing out both the creation of new connections and COM_CHANGE_USER

resworld commented 2 months ago

Hi @renecannao and thank you for the fast reply,

We don't see anything in the backend logs in that certain time. We will dig deeper in the galera and try to find the root of this issue. Thank you once again!