sysown / proxysql

High-performance MySQL proxy with a GPL license.
http://www.proxysql.com
GNU General Public License v3.0
6.05k stars 983 forks source link

Unexpected behavior for Galera hosts and write switching #4702

Closed uwmixa closed 1 month ago

uwmixa commented 1 month ago

Hi, I'm experiencing unexpected behavior for Galera hosts and write switching

Galera Hostgroups

MySQL [(none)]> select * from mysql_galera_hostgroups;
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
| writer_hostgroup | backup_writer_hostgroup | reader_hostgroup | offline_hostgroup | active | max_writers | writer_is_also_reader | max_transactions_behind | comment |
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
| 10               | 11                      | 20               | 99                | 1      | 1           | 1                     | 50                      |         |
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
1 row in set (0.000 sec)

MySQL Servers

MySQL [(none)]> select * from mysql_servers order by hostgroup_id, hostname;
+--------------+--------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname     | port | gtid_port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+--------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | 10.10.20.9  | 3306 | 0         | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | 10.10.20.15 | 3306 | 0         | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.8  | 3306 | 0         | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 99           | 10.10.20.5  | 3306 | 0         | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+--------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.000 sec)

Runtime MySQL Servers

MySQL [(none)]> select * from runtime_mysql_servers order by hostgroup_id, hostname;
+--------------+--------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname     | port | gtid_port | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+--------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | 10.10.20.15 | 3306 | 0         | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.10.20.8  | 3306 | 0         | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.10.20.9  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | 10.10.20.15 | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | 10.10.20.8  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.15 | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.8  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.9  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+--------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
8 rows in set (0.001 sec)

Runtime MySQL QUery Rules:

MySQL [(none)]> select rule_id, active, match_pattern, destination_hostgroup from runtime_mysql_query_rules order by destination_hostgroup; 
+---------+--------+-----------------------+-----------------------+
| rule_id | active | match_pattern         | destination_hostgroup |
+---------+--------+-----------------------+-----------------------+
| 91      | 1      | ^\/\*WRITE*\/         | 10                    |
| 100     | 1      | ^SELECT .* FOR UPDATE | 10                    |
| 300     | 1      | .*                    | 10                    |
| 92      | 1      | ^\/\*READ\*\/         | 20                    |
| 200     | 1      | ^SELECT .*            | 20                    |
| 201     | 1      | ^\(                   | 20                    |
+---------+--------+-----------------------+-----------------------+
6 rows in set (0.000 sec)

The node 10.10.20.15 was added to Cluster as replacement for 10.10.20.5 and 10.10.20.15 suppossed to be a backup writer.

The problem is

when I set current writer 10.10.20.9 to OFFLINE_SOFT it is still acting as writer and no switchover happened

-- update status
MySQL [(none)]> UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostname='10.10.20.9';
Query OK, 1 row affected (0.000 sec)

-- check servers
MySQL [(none)]> SELECT * FROM mysql_servers;
+--------------+--------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname     | port | gtid_port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+--------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 99           | 10.10.20.5  | 3306 | 0         | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.8  | 3306 | 0         | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.10.20.9  | 3306 | 0         | OFFLINE_SOFT | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | 10.10.20.15 | 3306 | 0         | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+--------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.000 sec)

-- load to runtime
MySQL [(none)]> LOAD MYSQL SERVERS TO RUNTIME;
Query OK, 0 rows affected (0.003 sec)

waiting 10 sec and checking runtime servers

MySQL [(none)]> SELECT * FROM runtime_mysql_servers;
+--------------+--------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname     | port | gtid_port | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+--------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | 10.10.20.15 | 3306 | 0         | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.10.20.8  | 3306 | 0         | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.10.20.9  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | 10.10.20.15 | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | 10.10.20.8  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.15 | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.8  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.9  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+--------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
8 rows in set (0.001 sec)

Why the 10.10.20.15 isn't becoming a writer?

catyellow commented 1 month ago

Looks like the weight has more priority comparing to static configuration

This wasn't clear from schema reading

DOC
In reality the process of moving a node to the writer_hostgroup is not as simple as stated in the diagram.

When a new node is identified as a writer, a process for selecting the new writers takes place. This means that the node that joins is not guarantee to be a writer, or any of the nodes that previously were writers are guarantee to keep being online, after a new writer joins.

This is determined by a fixed order rule composed by: "weight + hostname + port". If a node that was previously a writer is going to be replaced by a new joined node, the previous will be kept in the writer_hostgroup, but set as SHUNNED.

But I tried changes in weight + hostname + port and set weight to 10 helped

MySQL [(none)]> select * from mysql_servers;
+--------------+--------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname     | port | gtid_port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+--------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 99           | 10.10.20.5  | 3306 | 0         | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.8  | 3306 | 0         | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.10.20.9  | 3306 | 0         | OFFLINE_SOFT | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | 10.10.20.15 | 3306 | 0         | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+--------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.000 sec)

MySQL [(none)]> select * from runtime_mysql_servers;
+--------------+--------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname     | port | gtid_port | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+--------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | 10.10.20.15 | 3306 | 0         | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.10.20.8  | 3306 | 0         | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.10.20.9  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | 10.10.20.15 | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | 10.10.20.8  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.15 | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.8  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.9  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+--------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
8 rows in set (0.001 sec)

MySQL [(none)]> update mysql_servers set weight=10 where hostname='10.10.20.15';
Query OK, 1 row affected (0.000 sec)

MySQL [(none)]> LOAD MYSQL SERVERS TO RUNTIME;
Query OK, 0 rows affected (0.003 sec)

MySQL [(none)]> select * from mysql_servers;
+--------------+--------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname     | port | gtid_port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+--------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 99           | 10.10.20.5  | 3306 | 0         | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.8  | 3306 | 0         | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.10.20.9  | 3306 | 0         | OFFLINE_SOFT | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | 10.10.20.15 | 3306 | 0         | ONLINE       | 10     | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+--------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.000 sec)

MySQL [(none)]> select * from runtime_mysql_servers;
+--------------+--------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname     | port | gtid_port | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+--------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | 10.10.20.15 | 3306 | 0         | ONLINE  | 10     | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.10.20.8  | 3306 | 0         | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.10.20.9  | 3306 | 0         | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | 10.10.20.8  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 11           | 10.10.20.9  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.8  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.10.20.9  | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+--------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
7 rows in set (0.001 sec)