Closed: Esysteme closed this 5 months ago
It took me more time to write this post than to write the script, lol :D
Hey @Esysteme, can you tell me what software you use to make these diagrams?
@MichalisDBA it's a part of PmaControl (my own software). This view will be available when version 4.0 is ready.
Send me your email in a PM.
Maybe it's useless, but I wanted to try the scheduler! Among the cool stuff, I don't like the configuration at all, which is why I came up with this.
First, here is our initial situation:
1 ProxySQL, 1 master with 2 slaves
[By the way, I didn't put 10.68.68.18 in the slave hostgroup. I guess it happened when my hg=2 was empty, but once the slaves are back, shouldn't ProxySQL remove it from hg=2?]
Now we switch off a slave:
The node got shunned by ProxySQL, all good there!
Now we restart node 3:
Of course, the server doesn't have READ_ONLY set properly, and here is the disaster! As specified in the documentation, ProxySQL puts it in the WRITER hostgroup
and of course what had to happen, happened:
Here is a surprise: my application detected the broken link faster than ProxySQL, even though I collect information from all servers every 10 seconds, versus every 1 second for ProxySQL. (I need to investigate to be sure; otherwise it means ProxySQL sends, on average, ~5 seconds of queries to the wrong hostgroup.)
And afterwards:
Here is a miss on my side: I need to add the broken replication to the explanation, of course. I forgot to show you the message we got on replication:
Let's fix the replication :
And here it's the same again, but at this point I prefer that the proxy takes its time to be sure before putting the server back online:
Well, I had a disaster with this 2 years ago, where the broken slave kept receiving writes. Even if it lasted maybe 10 seconds, it cost more than one week to try to fix everything.
By the way, my script (to install a complete M/S setup and ProxySQL) is still there: https://github.com/PmaControl/Toolkit/blob/master/install-topology-master-slave.sh (I probably made a mistake in it)
For me, losing the READ_ONLY flag on a reboot is a mistake. If someone has a use case for it, please let me know in a comment!
A simple way to prevent this case is just to check UPTIME together with READ_ONLY. Yesterday evening, instead of watching a movie, I developed a short script to fix it (or to try to)!
So if we got this :
and after we got this :
if
We need to execute :
To make it simpler to monitor, I created a new table in main:
Now let's write the script that will do it for us. Filename: pmacontrol_proxysql_keep_read_only.sh (to be moved to /var/lib/proxysql)
What happens in this script:
We get the credentials to connect to the backend servers
We get the config for mysql_replication_hostgroups only (we are not interested in other kinds of servers)
We move uptime_now from the last run to uptime_previous (I should say we don't use the date; I should remove it, but I was lazy)
Important: in every request to MySQL we need a really short timeout (here 1 second, but lower would be better)
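One possible way to get that short timeout, as a minimal sketch and not the code from my script: wrap every call to the `mysql` client with `--connect-timeout` and with coreutils `timeout`. The names `mysql_fast`, `MYSQL_TIMEOUT` and `MYSQL_BIN` are assumptions made up for this example, and credentials are left out on purpose.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (illustration only): run one query with a hard time limit.
# MYSQL_TIMEOUT and MYSQL_BIN are assumed names, not part of the original script.
MYSQL_TIMEOUT=${MYSQL_TIMEOUT:-1}

mysql_fast() {
    local host=$1 port=$2 query=$3
    # --connect-timeout bounds the connection phase (in seconds).
    # coreutils `timeout` bounds the whole call, including a query that hangs
    # after the connection was accepted; it gets one extra second of margin.
    timeout "$(( MYSQL_TIMEOUT + 1 ))" \
        "${MYSQL_BIN:-mysql}" --connect-timeout="$MYSQL_TIMEOUT" \
        -h "$host" -P "$port" -N -B -e "$query"
}
```

For example, `mysql_fast 10.68.68.18 3306 "SELECT @@global.read_only"` would give up after about 2 seconds at most instead of blocking the whole run on one dead backend.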
We need to get all the backends we have to follow. Here it's simple: we need all backends that are in HOSTGROUP_READER but not in HOSTGROUP_WRITER
We import these lines into our table (in fact we could do it at the end as a micro-optimization: we would save an extra query on each run where we add a new server :D)
Then, for each line in our table, we get the new values of READ_ONLY and UPTIME, and we update our table with these values.
Now we are able to detect all servers that were restarted and whose READ_ONLY changed
For all these lines we update the READ_ONLY flag:
To finish, we need to remove all entries whose read_only was updated without a restart of MySQL.
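The decision behind the last three steps can be sketched as a small pure function. This is only an illustration under my own assumptions, not the code from the script: `classify_backend` is a hypothetical name, and it assumes the table keeps the previous and current values of UPTIME and READ_ONLY for each backend.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only).
# Arguments: uptime_previous uptime_now read_only_previous read_only_now
# Prints one of: restore | forget | keep
classify_backend() {
    local uptime_prev=$1 uptime_now=$2 ro_prev=$3 ro_now=$4

    if [ "$ro_prev" -eq 1 ] && [ "$ro_now" -eq 0 ]; then
        if [ "$uptime_now" -lt "$uptime_prev" ]; then
            # Uptime went backwards: mysqld restarted and came back writable,
            # so READ_ONLY has to be set again.
            echo restore
        else
            # Same mysqld process: READ_ONLY was switched off without a
            # restart, so this entry is dropped and left alone.
            echo forget
        fi
    else
        # Nothing changed that we care about.
        echo keep
    fi
}
```

For instance, `classify_backend 86400 12 1 0` prints `restore`, while `classify_backend 100 101 1 0` prints `forget`. Comparing uptimes only works if the script runs more often than a server can restart and overtake its previous uptime, which holds when it runs every second or so.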
After that, this script can be improved by storing the data in a file, which would save 4 queries + (number of nodes to follow) * 2, so 8 queries here. Another improvement would be to add a condition so it only acts when uptime is not more than 60 seconds (we never know if ProxySQL was stopped and someone decided to start it a long time after a reboot of one MySQL server) => I just added it to the script.
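To make the arithmetic and the 60-second guard concrete, here is a minimal sketch. Both function names are hypothetical and only illustrate the two ideas above; they are not taken from the script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (illustration only).

# Queries saved per run by keeping state in a file:
# 4 fixed queries plus 2 per followed node.
queries_saved() {
    local nodes=$1
    echo $(( 4 + 2 * nodes ))
}

# True only when the server restarted recently enough for us to act on it.
# The 60-second default is the limit mentioned above.
restarted_recently() {
    local uptime=$1 limit=${2:-60}
    [ "$uptime" -le "$limit" ]
}
```

With the 2 slaves of this setup, `queries_saved 2` prints `8`. `restarted_recently 12` succeeds, while `restarted_recently 3600` fails, so a server that rebooted an hour before ProxySQL was started again is left untouched.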
Now let's try it !
Well, it works as expected:
In 80% of cases we get this (not even detected by ProxySQL):
But in some cases, when I guess the connection got stuck on a timeout, ProxySQL detects the switch of READ_ONLY before us (I guess ProxySQL checks it every 3 seconds)
Look for the tag: [KEEP_READ_ONLY_AFTER_REBOOT]
It doesn't happen often, but of course it's a problem: we miss 2 seconds, so there is still a small window where the problem can happen. Let's run a test; maybe ProxySQL has some internal delay (I should open the source code, I will check it tomorrow)
Here is my test: open 10 simultaneous connections and generate an insert on average every second (0.5~1.5) on the same table.
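A load generator of that shape could look roughly like the sketch below. It is an assumption about how such a test can be written, not the tool I actually used; `random_delay`, `insert_worker` and `run_load` are made-up names, and the command to run is passed in by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical load generator (illustration only).

# Print a random delay between 0.5 and 1.5 seconds (about 1 second on average).
random_delay() {
    awk -v seed="$RANDOM" 'BEGIN { srand(seed); printf "%.2f\n", 0.5 + rand() }'
}

# One worker: run the given command `count` times with a random pause
# after each run. Fractional sleep needs GNU sleep.
insert_worker() {
    local count=$1 i=0
    shift
    while [ "$i" -lt "$count" ]; do
        "$@"
        sleep "$(random_delay)"
        i=$(( i + 1 ))
    done
}

# Start `workers` workers in parallel and wait for all of them.
run_load() {
    local workers=$1 count=$2 w=0
    shift 2
    while [ "$w" -lt "$workers" ]; do
        insert_worker "$count" "$@" &
        w=$(( w + 1 ))
    done
    wait
}
```

A call such as `run_load 10 30000 mysql -h 127.0.0.1 -P 6033 -e "INSERT INTO test.t (v) VALUES (1)"` would then keep 10 clients inserting through ProxySQL for several hours. The host, port, schema and table in that command are placeholders, and note that each insert opens a new connection here instead of reusing 10 persistent ones.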
Now let's randomly start and stop MySQL on a slave and let it run all night; we will see in the morning whether we managed to break the replication.
We have 2 cases:
So for the moment this solution is a failure :( on a highly concurrent database! (We could execute this script more often, but I am not sure it's a good solution at all.) The best would be to add it directly inside ProxySQL as the default, with an option to deactivate it?
Maybe one of you has an idea?