mboruta1 opened 3 years ago
We recently experienced a production outage due to this issue. It's a bit discouraging to see no progress on this in 2.5 years.
If max_writers=1, it means you want only one server to take writes.
This means that all ProxySQL instances need to use the same single server in the writer hostgroup.
If the backend server that is supposed to be the writer rejects one specific ProxySQL instance, that instance shouldn't consider another node as the writer while the rest of the ProxySQL instances are writing to the first node.
And the fact that the backend is blocking one ProxySQL instance doesn't mean that the backend is unhealthy.
If all ProxySQL instances send writes to one backend while the blocked instance sends writes to a different server, users would complain that ProxySQL is using two writers... The chosen writer should be the same for all the proxies, no matter whether you run 1 ProxySQL instance (I hope this is not the case!!) or thousands of ProxySQL instances.
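To make the argument above concrete, here is a minimal Python sketch. It assumes a hypothetical selection rule, choose_writer (highest weight wins, hostname breaks ties), which is not ProxySQL's actual code. What it illustrates is that the writer choice depends only on cluster-wide state, so every proxy computes the same writer; a proxy that cannot reach that writer reports an error instead of failing over on its own. The names NoWriterError, Backend, route_write and the db1/db2/db3 hosts are all illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


class NoWriterError(Exception):
    """Raised when this proxy cannot send a write to the agreed writer."""


@dataclass(frozen=True)
class Backend:
    hostname: str
    weight: int
    cluster_healthy: bool  # cluster-wide health: the same answer for every proxy


def choose_writer(backends: List[Backend]) -> Optional[str]:
    """Hypothetical rule for max_writers=1: highest weight, then lowest hostname.

    Only cluster-wide state is used, so every proxy picks the same writer.
    """
    candidates = [b for b in backends if b.cluster_healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda b: (-b.weight, b.hostname)).hostname


def route_write(reachable_from_this_proxy: Set[str], backends: List[Backend]) -> str:
    writer = choose_writer(backends)
    if writer is None:
        raise NoWriterError("cluster has no healthy writer")
    if writer not in reachable_from_this_proxy:
        # Do NOT fail over locally: the other proxies keep writing to `writer`,
        # so picking another node here would create a second writer.
        raise NoWriterError(f"this proxy cannot reach writer {writer}")
    return writer


cluster = [
    Backend("db1", 1000, True),
    Backend("db2", 900, True),
    Backend("db3", 800, True),
]

# Proxy A reaches every node and writes to db1.
proxy_a = route_write({"db1", "db2", "db3"}, cluster)

# Proxy B has been blocked by db1 (e.g. max_connect_errors), but db1 is healthy.
try:
    proxy_b = route_write({"db2", "db3"}, cluster)
except NoWriterError as err:
    proxy_b = str(err)

print(proxy_a)  # db1
print(proxy_b)  # this proxy cannot reach writer db1
```

Keeping per-proxy reachability out of choose_writer is the design point: a failure seen by only one proxy can make that proxy unavailable, but it cannot split the cluster into two writers.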
A few follow-up questions:
max_connect_errors on the backends?

This is not a bug. ProxySQL is working as configured, so you won't see any progress on this: both ProxySQL and MySQL behaved the way they were configured to.
@renecannao After 4 years of using ProxySQL, I encountered this issue today.
I have three ProxySQL instances running on the same machines as MySQL, with one master and two slaves.
These instances are under a load balancer with rules to avoid resolving the ProxySQL IP if it is unhealthy (e.g., server is down or other basic checks).
Today's incident was unusual and the first of its kind for me. I saw many "blocked because of many connection errors..." messages. The problem was that the load balancer didn't mark this node as unhealthy, leading to a flood of connection errors.
The max_connect_errors value is set to 100.
What would you recommend as a proper solution for this problem? I plan to improve the load balancer (DNS) to consider this new condition but would also like to address it from the MySQL side. What do you think?
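As a possible starting point for the load-balancer idea above, here is a minimal Python sketch of a per-node health check that treats MySQL error 1129 (ER_HOST_IS_BLOCKED, the "Host ... is blocked because of many connection errors" error) as unhealthy. It assumes a hypothetical probe_writer callable that connects from the ProxySQL host to the current writer and raises a hypothetical ProbeError carrying the MySQL error code; wiring it to a real MySQL driver is left out.

```python
from typing import Callable, Tuple

# MySQL server error ER_HOST_IS_BLOCKED (1129): "Host '...' is blocked because
# of many connection errors; unblock with 'mysqladmin flush-hosts'".
ER_HOST_IS_BLOCKED = 1129


class ProbeError(Exception):
    """Hypothetical error raised by a probe, carrying the MySQL error code."""

    def __init__(self, code: int, message: str = "") -> None:
        super().__init__(message)
        self.code = code


def proxysql_node_healthy(probe_writer: Callable[[], None]) -> Tuple[bool, str]:
    """Return (healthy, reason) for the ProxySQL node this check runs on.

    probe_writer (hypothetical) opens a connection from this host to the
    current writer and raises ProbeError on failure. MySQL blocks by client
    host, so a probe connecting from the same client address as ProxySQL
    should see the same block that ProxySQL sees.
    """
    try:
        probe_writer()
    except ProbeError as err:
        if err.code == ER_HOST_IS_BLOCKED:
            return False, "blocked by writer (max_connect_errors)"
        return False, f"probe failed with MySQL error {err.code}"
    return True, "ok"


def _blocked_probe() -> None:
    # Stand-in for a probe that hits the blocked-host error.
    raise ProbeError(ER_HOST_IS_BLOCKED, "Host is blocked because of many connection errors")


status_blocked = proxysql_node_healthy(_blocked_probe)
status_ok = proxysql_node_healthy(lambda: None)
print(status_blocked)  # (False, 'blocked by writer (max_connect_errors)')
print(status_ok)       # (True, 'ok')
```

Marking only the blocked ProxySQL node unhealthy keeps the single-writer guarantee intact: clients move to proxies that can still reach the agreed writer.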
Bug Description
When ProxySQL detects that a host group does not have an online node, it attempts to bring a shunned server back online (relevant code section). It does this by cycling through all hosts in the host group and attempting to bring online the nodes that are currently SHUNNED and have shunned_automatic set to true.
However, as per ProxySQL's Galera configuration documentation, if mysql_galera_hostgroups.max_writers is set to less than the total number of online backends available, the backends with the lowest priority are placed in the writer host group in SHUNNED state. When this happens, the server's shunned_automatic field is not set to true, so the code above never attempts to bring these servers online.
Thus, when ProxySQL is configured with a maximum of 1 writer and the backend currently designated as the writer can no longer be connected to, that backend is shunned, and the code linked above only tries to bring that same backend back online. Until this backend is reachable, all client write queries will be blocked, even though there may be perfectly good backends that could assume the role of writer.
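The interaction described above can be modeled in a few lines. This is a simplified Python sketch of the behavior as this report describes it, not ProxySQL's source: the Server fields mirror the status and shunned_automatic concepts, and recovery_candidates is a hypothetical name for the "bring a shunned server online" step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    hostname: str
    hostgroup: int
    status: str              # "ONLINE" or "SHUNNED"
    shunned_automatic: bool  # True only when shunned because of connection errors


def recovery_candidates(servers: List[Server], hostgroup: int) -> List[str]:
    """Servers the recovery step would try to bring back online.

    Mirrors the reported behavior: recovery only runs when the host group has
    no ONLINE server, and only considers SHUNNED servers whose
    shunned_automatic flag is set.
    """
    members = [s for s in servers if s.hostgroup == hostgroup]
    if any(s.status == "ONLINE" for s in members):
        return []
    return [
        s.hostname
        for s in members
        if s.status == "SHUNNED" and s.shunned_automatic
    ]


# Writer host group 1 with max_writers=1 after the writer starts failing:
writer_hostgroup = [
    Server("db1", 1, "SHUNNED", True),   # the writer, shunned after connection errors
    Server("db2", 1, "SHUNNED", False),  # shunned only because of max_writers=1
    Server("db3", 1, "SHUNNED", False),  # shunned only because of max_writers=1
]

# Only db1, the unreachable node, is ever retried; db2 and db3 are ignored.
print(recovery_candidates(writer_hostgroup, 1))  # ['db1']
```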
ProxySQL Version
Reproduced with 2.2.0 (amd64, retrieved here) and 2.2.2 (compiled from source).
OS Version
Ubuntu 18.04
Steps to Reproduce
Install one of the above ProxySQL versions
Configure ProxySQL to talk to a 3-node, multi-master Galera cluster.
mysql_galera_hostgroups can look like the following:
runtime_mysql_servers should look like this:
Induce a connection failure on the writer backend. I did this by getting the IP of the server on which ProxySQL was running blocked (see this), specifically by overwhelming the backend with several sysbench benchmarks running simultaneously:
Eventually ProxySQL will time out while trying to establish connections to the backend (the backend is too busy processing queries) and will close over 100 unacknowledged connection requests. This triggers MySQL's protection mechanism, and the ProxySQL host is effectively blacklisted until mysqladmin flush-hosts is called.
Observe that now all servers in host group 1 are shunned:
ProxySQL Error Log
The lines without "Connect timeout on" keep repeating until mysqladmin flush-hosts is called on the relevant backend.
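The blocking mechanism behind this can be sketched as a per-client-host counter. This is a simplified Python model of MySQL's max_connect_errors behavior: interrupted connection attempts increment a per-host count, a successful connection resets it, and once the count reaches the limit the host is refused until mysqladmin flush-hosts clears it. HostErrorCounter and the proxysql-1/proxysql-2 host names are hypothetical, and real servers differ in details such as exactly which errors are counted.

```python
from collections import defaultdict
from typing import DefaultDict


class HostErrorCounter:
    """Simplified model of MySQL's per-client-host max_connect_errors blocking."""

    def __init__(self, max_connect_errors: int) -> None:
        self.max_connect_errors = max_connect_errors
        self.errors: DefaultDict[str, int] = defaultdict(int)

    def is_blocked(self, host: str) -> bool:
        return self.errors[host] >= self.max_connect_errors

    def attempt(self, host: str, completes: bool) -> str:
        if self.is_blocked(host):
            return "blocked"           # client would get ER_HOST_IS_BLOCKED (1129)
        if completes:
            self.errors[host] = 0      # a successful connection clears the count
            return "connected"
        self.errors[host] += 1         # interrupted attempt, e.g. the client timed out
        return "interrupted"

    def flush_hosts(self) -> None:
        """Model of `mysqladmin flush-hosts`: clear every host's count."""
        self.errors.clear()


server = HostErrorCounter(max_connect_errors=100)

# ProxySQL on proxysql-1 gives up on 100 connection attempts in a row.
for _ in range(100):
    server.attempt("proxysql-1", completes=False)

after_timeouts = server.attempt("proxysql-1", completes=True)  # "blocked"
other_proxy = server.attempt("proxysql-2", completes=True)     # "connected"

server.flush_hosts()
after_flush = server.attempt("proxysql-1", completes=True)     # "connected"
```

The model shows why only one ProxySQL instance is affected: the count is kept per client host, so proxysql-2 keeps connecting normally while proxysql-1 stays blocked.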