sysown / proxysql

High-performance MySQL proxy with a GPL license.
http://www.proxysql.com
GNU General Public License v3.0

ProxySQL connection pool status stays OFFLINE_HARD and is not refreshed automatically #2599

Open biaoyun opened 4 years ago

biaoyun commented 4 years ago

Environment configuration:
OS: Ubuntu 16.04
ProxySQL version: 1.4.13-15-g69d4207, codename Truls
MySQL version: 5.7.24-log

Problem description: The Percona proxysql exporter keeps reporting proxysql_connection_pool_status as OFFLINE_HARD for some backends. The status does not return to normal until I execute "select * from runtime_mysql_servers" or "load mysql_servers to runtime". This happens repeatedly with MySQL Group Replication, but not with MySQL Replication.

Has anyone encountered this problem?

renecannao commented 4 years ago

An example?

biaoyun commented 4 years ago

2020-03-12 02:16:17 MySQL_HostGroups_Manager.cpp:602:commit(): [WARNING] Removed server at address 139732884129920, hostgroup 1, address 192.168.1.156 port 3310. Setting status OFFLINE HARD and immediately dropping all free connections. Used connections will be dropped when trying to use them

root@data_ops_01:~# curl -X POST http://192.168.3.216:9502/metrics |grep proxysql_connection_pool_status
# HELP proxysql_connection_pool_status The status of the backend server (1 - ONLINE, 2 - SHUNNED, 3 - OFFLINE_SOFT, 4 - OFFLINE_HARD).
# TYPE proxysql_connection_pool_status gauge
proxysql_connection_pool_status{endpoint="192.168.1.156:3310",hostgroup="1"} 4
proxysql_connection_pool_status{endpoint="192.168.1.157:3310",hostgroup="3"} 1
proxysql_connection_pool_status{endpoint="192.168.1.157:3310",hostgroup="2"} 1
proxysql_connection_pool_status{endpoint="192.168.1.159:3310",hostgroup="3"} 1
proxysql_connection_pool_status{endpoint="192.168.1.159:3310",hostgroup="1"} 4
proxysql_connection_pool_status{endpoint="192.168.1.159:3310",hostgroup="3"} 1
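To see which backend/hostgroup pairs the exporter currently reports as OFFLINE_HARD, the metric lines in the curl output above can be parsed. This is a minimal Python sketch, assuming the Prometheus text format shown there (label order endpoint, then hostgroup) and the status codes from the HELP line; `parse_pool_status` and `offline_hard_backends` are hypothetical helper names, not part of ProxySQL or the Percona exporter.

```python
import re

# Matches lines such as:
# proxysql_connection_pool_status{endpoint="192.168.1.156:3310",hostgroup="1"} 4
# Assumes the label order seen in the exporter output (endpoint, then hostgroup).
METRIC_RE = re.compile(
    r'^proxysql_connection_pool_status'
    r'\{endpoint="(?P<endpoint>[^"]+)",hostgroup="(?P<hostgroup>[^"]+)"\}'
    r'\s+(?P<value>\S+)$'
)

# Status codes as listed in the exporter's HELP text.
STATUS_NAMES = {1: "ONLINE", 2: "SHUNNED", 3: "OFFLINE_SOFT", 4: "OFFLINE_HARD"}


def parse_pool_status(text):
    """Return (endpoint, hostgroup, status_name) tuples.

    Comment lines (# HELP / # TYPE) and other metrics are skipped.
    """
    rows = []
    for line in text.splitlines():
        match = METRIC_RE.match(line.strip())
        if match is None:
            continue
        code = int(float(match.group("value")))
        status = STATUS_NAMES.get(code, "UNKNOWN")
        rows.append((match.group("endpoint"), int(match.group("hostgroup")), status))
    return rows


def offline_hard_backends(text):
    """Return (endpoint, hostgroup) pairs reported as OFFLINE_HARD."""
    return [(ep, hg) for ep, hg, status in parse_pool_status(text) if status == "OFFLINE_HARD"]


if __name__ == "__main__":
    sample = "\n".join([
        "# TYPE proxysql_connection_pool_status gauge",
        'proxysql_connection_pool_status{endpoint="192.168.1.156:3310",hostgroup="1"} 4',
        'proxysql_connection_pool_status{endpoint="192.168.1.157:3310",hostgroup="3"} 1',
        'proxysql_connection_pool_status{endpoint="192.168.1.159:3310",hostgroup="1"} 4',
    ])
    print(offline_hard_backends(sample))
    # prints [('192.168.1.156:3310', 1), ('192.168.1.159:3310', 1)]
```

Note that this only lists OFFLINE_HARD entries per hostgroup; it says nothing about whether the same endpoint is still ONLINE in another hostgroup.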

This alarm continued until I manually executed "select * from runtime_mysql_servers" or "load mysql_servers to runtime" at 10 am.

At the same time, the admin interface showed the following:

ProxySQLAdmin> select * from runtime_mysql_servers;
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 2            | 192.168.1.157 | 3310 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 3            | 192.168.1.159 | 3310 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 3            | 192.168.1.157 | 3310 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 3            | 192.168.1.156 | 3310 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.00 sec)

and the monitored values changed to the following:

# HELP proxysql_connection_pool_status The status of the backend server (1 - ONLINE, 2 - SHUNNED, 3 - OFFLINE_SOFT, 4 - OFFLINE_HARD)
# TYPE proxysql_connection_pool_status gauge
proxysql_connection_pool_status{endpoint="192.168.1.156:3310",hostgroup="3"} 1
proxysql_connection_pool_status{endpoint="192.168.1.157:3310",hostgroup="2"} 1
proxysql_connection_pool_status{endpoint="192.168.1.157:3310",hostgroup="3"} 1
proxysql_connection_pool_status{endpoint="192.168.1.159:3310",hostgroup="3"} 1

No configuration changes were made at any point during this process, and there are many more occurrences like the ones above on my system.

renecannao commented 4 years ago

This sounds like expected behavior: the server was removed, but it remains visible in stats_mysql_connection_pool until runtime_mysql_servers is queried or load mysql servers to runtime is executed. The server is kept in stats_mysql_connection_pool so that metrics related to its connections, queries, etc. are not lost.

Considering that in ProxySQL a server with status OFFLINE_HARD is equivalent to a server that does not exist or has been deleted (see the wiki), I think this should be considered a bug in the system that generates the alarm. An alarm shouldn't be generated for OFFLINE_HARD, because no alarm is generated for a server that is deleted.

biaoyun commented 4 years ago

In my case, I think the MGR architecture may have strict network requirements, which caused the MySQL server to be disconnected at some point; it recovered afterwards. However, when the MySQL server's status returned to normal, ProxySQL did not refresh it.

For the sake of discussion, I think an alarm should be generated for OFFLINE_HARD: when a server is marked as OFFLINE_HARD (even if only for a short time), it means this server did have a problem for some time, and I think the administrator needs to know about this kind of problem too.

So, is there any way to have OFFLINE_HARD MySQL servers refreshed regularly?
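One possible workaround, not proposed in this thread, follows from the maintainer's explanation that querying runtime_mysql_servers clears the removed server from stats_mysql_connection_pool: run that query on a schedule. This is a minimal Python sketch, assuming any DB-API 2.0 connection to the ProxySQL admin interface; `refresh_once`, `refresh_periodically` and the `connect` callable are hypothetical names. Keep in mind that this also shortens how long a brief OFFLINE_HARD episode stays visible to the exporter, which is the signal the reporter wants to keep.

```python
import time

# Per the maintainer, querying runtime_mysql_servers is enough to drop
# OFFLINE_HARD entries that are kept in stats_mysql_connection_pool.
REFRESH_QUERY = "SELECT * FROM runtime_mysql_servers"


def refresh_once(connect):
    """Open an admin connection, run the refresh query, then close it.

    `connect` is any zero-argument callable returning a DB-API 2.0
    connection to the ProxySQL admin interface (hypothetical helper).
    Returns the number of rows currently in runtime_mysql_servers.
    """
    conn = connect()
    try:
        cur = conn.cursor()
        try:
            cur.execute(REFRESH_QUERY)
            return len(cur.fetchall())
        finally:
            cur.close()
    finally:
        conn.close()


def refresh_periodically(connect, interval_s=60.0, iterations=None, sleep=time.sleep):
    """Call refresh_once every `interval_s` seconds.

    iterations=None loops forever; a finite value is handy for testing.
    Errors are printed and the loop continues.
    """
    count = 0
    while iterations is None or count < iterations:
        try:
            refresh_once(connect)
        except Exception as exc:  # keep looping on transient admin errors
            print(f"refresh failed: {exc}")
        count += 1
        if iterations is None or count < iterations:
            sleep(interval_s)
    return count
```

As an assumed example, with the third-party pymysql package this could be started as `refresh_periodically(lambda: pymysql.connect(host="127.0.0.1", port=6032, user="admin", password="admin"), interval_s=60)`; 6032 is ProxySQL's default admin port, and the host and credentials must be adapted to your setup.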

renecannao commented 4 years ago
> proxysql_connection_pool_status{endpoint="192.168.1.159:3310",hostgroup="3"} 1
> proxysql_connection_pool_status{endpoint="192.168.1.159:3310",hostgroup="1"} 4

ProxySQL is well aware that the server is ONLINE in hostgroup 3 but is no longer in hostgroup 1, and it is therefore marked as OFFLINE_HARD there. The reason the server is still visible in stats_mysql_connection_pool was described before: "to not lose metrics related to connections, queries, etc." If the server disappeared immediately, how would you know whether it served any traffic while in hostgroup 1?

> For the sake of discussion, I think an alarm should be generated for OFFLINE_HARD: when a server is marked as OFFLINE_HARD (even if only for a short time), it means this server did have a problem for some time, and I think the administrator needs to know about this kind of problem too.

I agree and disagree at the same time. If the server went to OFFLINE_HARD (in this case it actually means it recovered from lag), the administrator is notified. But on the other hand, should you get an alert if the same server is ONLINE in hostgroup 3? It seems the alerting logic isn't tuned correctly. Should you get an alert for a server that is OFFLINE_HARD in hostgroup 1 (the offline_hostgroup, I assume) while the same server is ONLINE in hostgroup 3 (the reader_hostgroup, I assume)?

biaoyun commented 4 years ago

I think I understand what you mean. Maybe it is more reasonable to generate an alert only when a server is marked as OFFLINE_HARD and nothing else (no alert when the same server is marked OFFLINE_HARD in one hostgroup and ONLINE in another). I will try to modify the alert logic to make it more reasonable.

Thank you very much for your answer!
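The alert rule proposed in the last comment (alert when a server is OFFLINE_HARD and nothing else, but not when it is OFFLINE_HARD in one hostgroup and ONLINE in another) can be sketched in Python. This is a minimal sketch, assuming the statuses have already been extracted from proxysql_connection_pool_status into (endpoint, hostgroup, status) tuples; `endpoints_to_alert` is a hypothetical name, not part of ProxySQL or the Percona exporter.

```python
from collections import defaultdict

OFFLINE_HARD = "OFFLINE_HARD"


def endpoints_to_alert(rows):
    """Return endpoints that are OFFLINE_HARD in every hostgroup they appear in.

    `rows` is an iterable of (endpoint, hostgroup, status) tuples. An endpoint
    that is OFFLINE_HARD in one hostgroup but ONLINE (or in any other state)
    in another is skipped, since ProxySQL keeps the OFFLINE_HARD entry only
    to preserve its connection and query stats.
    """
    statuses = defaultdict(set)
    for endpoint, _hostgroup, status in rows:
        statuses[endpoint].add(status)
    return sorted(ep for ep, seen in statuses.items() if seen == {OFFLINE_HARD})
```

With the 192.168.1.159:3310 rows quoted in the thread (OFFLINE_HARD in hostgroup 1, ONLINE in hostgroup 3), this rule raises no alert for that endpoint. If alerting is done in Prometheus instead, the same rule can likely be written as `min by (endpoint) (proxysql_connection_pool_status) == 4`, since 4 (OFFLINE_HARD) is the highest status code; treat that expression as an untested assumption.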