Ahoy,
I've been having some issues getting MHA to fully work in an environment
where it should provide a functional failover system.
Basically there are (in a testing environment) four machines: 1 MHA manager
(MGR), 1 MySQL master (MST), and 2 MySQL slaves (SLV1 & SLV2).
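For reference, the manager config I'm working from looks roughly like this
(hostnames, paths and credentials are anonymized placeholders):

    [server default]
    manager_workdir=/var/log/masterha/app1
    manager_log=/var/log/masterha/app1/manager.log
    user=mha
    password=xxxx
    ssh_user=root
    repl_user=repl
    repl_password=xxxx
    ping_interval=3

    [server1]
    hostname=mst.example.local
    candidate_master=1

    [server2]
    hostname=slv1.example.local
    candidate_master=1

    [server3]
    hostname=slv2.example.local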
Once I shut down MySQL on the master server, the failover kicks in immediately
and completes very quickly, which is great.
Upon completion of the failover, one of the slaves (SLV1) is promoted to new
master, and SLV2 now recognizes the new master as well. At this point MHA
exits. Is that intended?
From then on, when I want to restart MHA with the current setup post-failover,
it immediately exits, requiring all machines listed in the config file to be
up.
Using the --remove_dead_master_conf option alleviates this a bit, but once I
want to bring the original MySQL server back up, that one is of course missing
from the conf file and has to be re-added.
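For completeness, this is roughly how I check and start the manager (paths are
placeholders):

    # sanity checks before starting the manager
    masterha_check_ssh --conf=/etc/masterha/app1.cnf
    masterha_check_repl --conf=/etc/masterha/app1.cnf

    # start the manager; --remove_dead_master_conf strips the dead master's
    # section from the .cnf after a successful failover
    masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf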
When I start the original MySQL master, MST, back up and leave it as it is,
intending to set it back as master for the cluster, MHA fails to start because
there are two non-slave servers alive.
When I use the masterha_master_switch command to run a manual switchover,
passing the options to make MST the master again, every time SLV1 fails to
properly recognize MST as its master and shows error messages (usually
connection errors) in SHOW SLAVE STATUS\G, while SLV2 replicates without a
problem.
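The switch command I'm running on MGR is roughly the following (hostnames and
paths are placeholders, and the exact flags may differ slightly from what I
had in the shell history):

    # manual switchover back to the original master; after this, SLV1 shows
    # connection errors in SHOW SLAVE STATUS\G while SLV2 replicates fine
    masterha_master_switch --master_state=alive \
        --conf=/etc/masterha/app1.cnf \
        --new_master_host=mst.example.local \
        --orig_master_is_new_slave \
        --interactive=0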
In the case of using a machine with a VIP, how can this be specified in the
masterha.cnf file?
Listing each machine in the .cnf file by its regular hostname, I get an error
because the hostname in the file does not match the hostname/IP in the slave
settings on the MySQL slaves.
If I replace the hostname with the VIP name, how will a failover work so that
the VIP ends up on the new master without MySQL or MHA dying?
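My current understanding is that the .cnf should keep the real hostnames and
the VIP should only ever be moved by a script hooked in via
master_ip_failover_script / master_ip_online_change_script; is that the
intended approach? A minimal sketch of what I have in mind (interface, VIP
address and script paths are assumptions on my side):

    # in the [server default] section of the .cnf:
    #   master_ip_failover_script=/usr/local/bin/master_ip_failover
    #   master_ip_online_change_script=/usr/local/bin/master_ip_online_change

    # the script itself would essentially just move the VIP; in a real script
    # the hosts come from the arguments MHA passes (e.g. --orig_master_host /
    # --new_master_host), hard-coded here for brevity
    OLD_MASTER=slv1.example.local
    NEW_MASTER=mst.example.local
    VIP=192.168.0.100/24
    IFACE=eth0

    ssh root@$OLD_MASTER "ip addr del $VIP dev $IFACE" || true
    ssh root@$NEW_MASTER "ip addr add $VIP dev $IFACE && arping -c 3 -A -I $IFACE ${VIP%/*}"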
Basically, in the long run, is there a way to use MHA in a fully automated
fashion where human input is as good as not needed at all?
I'll try to get some log file output attached soon; the current logs have been
nuked for some other testing...
Cheers :)
Achim
Original issue reported on code.google.com by achim.re...@rightster.com on 14 Apr 2014 at 5:22