yoshinorim / mha4mysql-manager

Development tree of Master High Availability Manager and tools for MySQL (MHA), Manager part
http://code.google.com/p/mysql-master-ha/
GNU General Public License v2.0

Simulating a main instance crash hangs at SSH #125

Open trsenzhang opened 5 years ago

trsenzhang commented 5 years ago

1. SSH check: OK
[root@trsen184 masterha]# masterha_check_ssh --conf=/usr/local/masterha/masterha_mha1.cnf
Tue Jul 30 11:57:32 2019 - [info] Reading default configuration from /etc/masterha_default.cnf..
Tue Jul 30 11:57:32 2019 - [info] Reading application default configuration from /usr/local/masterha/masterha_mha1.cnf..
Tue Jul 30 11:57:32 2019 - [info] Reading server configuration from /usr/local/masterha/masterha_mha1.cnf..
Tue Jul 30 11:57:32 2019 - [info] Starting SSH connection tests..
Tue Jul 30 11:57:33 2019 - [debug]
Tue Jul 30 11:57:32 2019 - [debug] Connecting via SSH from root@172.18.0.181(172.18.0.181:22) to root@172.18.0.182(172.18.0.182:22)..
Tue Jul 30 11:57:32 2019 - [debug] ok.
Tue Jul 30 11:57:32 2019 - [debug] Connecting via SSH from root@172.18.0.181(172.18.0.181:22) to root@172.18.0.183(172.18.0.183:22)..
Tue Jul 30 11:57:33 2019 - [debug] ok.
Tue Jul 30 11:57:34 2019 - [debug]
Tue Jul 30 11:57:33 2019 - [debug] Connecting via SSH from root@172.18.0.182(172.18.0.182:22) to root@172.18.0.181(172.18.0.181:22)..
Tue Jul 30 11:57:33 2019 - [debug] ok.
Tue Jul 30 11:57:33 2019 - [debug] Connecting via SSH from root@172.18.0.182(172.18.0.182:22) to root@172.18.0.183(172.18.0.183:22)..
Tue Jul 30 11:57:33 2019 - [debug] ok.
Tue Jul 30 11:57:35 2019 - [debug]
Tue Jul 30 11:57:33 2019 - [debug] Connecting via SSH from root@172.18.0.183(172.18.0.183:22) to root@172.18.0.181(172.18.0.181:22)..
Tue Jul 30 11:57:33 2019 - [debug] ok.
Tue Jul 30 11:57:33 2019 - [debug] Connecting via SSH from root@172.18.0.183(172.18.0.183:22) to root@172.18.0.182(172.18.0.182:22)..
Tue Jul 30 11:57:34 2019 - [debug] ok.
Tue Jul 30 11:57:35 2019 - [info] All SSH connection tests passed successfully.

2. Replication check: OK
[root@trsen184 masterha]# masterha_check_repl --conf=/usr/local/masterha/masterha_mha1.cnf
Tue Jul 30 12:01:28 2019 - [info] Reading default configuration from /etc/masterha_default.cnf..
Tue Jul 30 12:01:28 2019 - [info] Reading application default configuration from /usr/local/masterha/masterha_mha1.cnf..
Tue Jul 30 12:01:28 2019 - [info] Reading server configuration from /usr/local/masterha/masterha_mha1.cnf..
Tue Jul 30 12:01:28 2019 - [info] MHA::MasterMonitor version 0.58.
Tue Jul 30 12:01:29 2019 - [info] GTID failover mode = 1
Tue Jul 30 12:01:29 2019 - [info] Dead Servers:
Tue Jul 30 12:01:29 2019 - [info] Alive Servers:
Tue Jul 30 12:01:29 2019 - [info]   172.18.0.181(172.18.0.181:3309)
Tue Jul 30 12:01:29 2019 - [info]   172.18.0.182(172.18.0.182:3309)
Tue Jul 30 12:01:29 2019 - [info]   172.18.0.183(172.18.0.183:3309)
Tue Jul 30 12:01:29 2019 - [info] Alive Slaves:
Tue Jul 30 12:01:29 2019 - [info]   172.18.0.182(172.18.0.182:3309) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled
Tue Jul 30 12:01:29 2019 - [info]     GTID ON
Tue Jul 30 12:01:29 2019 - [info]     Replicating from 172.18.0.181(172.18.0.181:3309)
Tue Jul 30 12:01:29 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jul 30 12:01:29 2019 - [info]   172.18.0.183(172.18.0.183:3309) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled
Tue Jul 30 12:01:29 2019 - [info]     GTID ON
Tue Jul 30 12:01:29 2019 - [info]     Replicating from 172.18.0.181(172.18.0.181:3309)
Tue Jul 30 12:01:29 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jul 30 12:01:29 2019 - [info] Current Alive Master: 172.18.0.181(172.18.0.181:3309)
Tue Jul 30 12:01:29 2019 - [info] Checking slave configurations..
Tue Jul 30 12:01:29 2019 - [info] Checking replication filtering settings..
Tue Jul 30 12:01:29 2019 - [info] binlog_do_db= , binlog_ignore_db=
Tue Jul 30 12:01:29 2019 - [info] Replication filtering check ok.
Tue Jul 30 12:01:29 2019 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Tue Jul 30 12:01:29 2019 - [info] Checking SSH publickey authentication settings on the current master..
Tue Jul 30 12:01:29 2019 - [info] HealthCheck: SSH to 172.18.0.181 is reachable.
Tue Jul 30 12:01:29 2019 - [info]
172.18.0.181(172.18.0.181:3309) (current master)
 +--172.18.0.182(172.18.0.182:3309)
 +--172.18.0.183(172.18.0.183:3309)

Tue Jul 30 12:01:29 2019 - [info] Checking replication health on 172.18.0.182..
Tue Jul 30 12:01:29 2019 - [info] ok.
Tue Jul 30 12:01:29 2019 - [info] Checking replication health on 172.18.0.183..
Tue Jul 30 12:01:29 2019 - [info] ok.
Tue Jul 30 12:01:29 2019 - [info] Checking master_ip_failover_script status:
Tue Jul 30 12:01:29 2019 - [info] /usr/local/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.18.0.181 --orig_master_ip=172.18.0.181 --orig_master_port=3309
Tue Jul 30 12:01:29 2019 - [info] OK.
Tue Jul 30 12:01:29 2019 - [warning] shutdown_script is not defined.
Tue Jul 30 12:01:29 2019 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

3. Configuration
[root@trsen184 masterha]# vi masterha_mha1.cnf
[server default]
log_level=debug

# mysql user
user=trsen
password=xxx

# ssh user
ssh_user=root
ssh_port=22

# replication user
repl_user=repl
repl_password=xxx

# monitor
ping_interval=3
shutdown_script=""

# switch scripts
master_ip_failover_script= /usr/local/masterha/master_ip_failover
master_ip_online_change_script= /usr/local/masterha/master_ip_online_change

# mha manager directory
manager_workdir = /usr/local/masterha/mha1
manager_log = /usr/local/masterha/mha1/mha1.log
remote_workdir = /usr/local/masterha/mha1

[server1]
hostname=172.18.0.181
master_binlog_dir = /data/mysql/mha/logs
candidate_master = 1
check_repl_delay = 0

[server2]
hostname=172.18.0.182
master_binlog_dir=/data/mysql/mha/logs
candidate_master=1
check_repl_delay=0

[server3]
hostname=172.18.0.183
master_binlog_dir=/data/mysql/mha/logs
candidate_master=1
check_repl_delay=0

4. Hangs at SSH (image)

5. Running ssh manually works
[root@trsen184 ~]# ps -ef |grep ssh
root 10 1 0 11:31 ? 00:00:00 /usr/sbin/sshd -D -e -u 0
root 54 10 0 11:31 ? 00:00:00 sshd: root@pts/0
root 18577 18576 0 12:03 pts/0 00:00:00 ssh -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5 -p 22 root@172.18.0.181 exit 0
root 20208 10 0 12:06 ? 00:00:00 sshd: root@pts/1
root 21498 20221 0 12:08 pts/1 00:00:00 grep --color=auto ssh

[root@trsen184 ~]# ssh -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5 -p 22 root@172.18.0.181
Last login: Tue Jul 30 11:48:16 2019 from 172.18.0.182
[root@trsen181 ~]# exit
logout
Connection to 172.18.0.181 closed.
[root@trsen184 ~]#
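
To tell a real hang apart from a slow connection, the ssh command visible in the ps output above can be run non-interactively under a hard time limit. The following is only a sketch: `ssh_probe` and `SSH_BIN` are hypothetical names introduced for illustration, the 10-second default limit is an arbitrary assumption, and it relies on the coreutils `timeout` command, which exits with status 124 when the limit is reached. Redirecting stdin from /dev/null is also an assumption about what matters here, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of MHA.
# SSH_BIN lets the ssh binary be swapped out (for example, for a stub in tests).
SSH_BIN=${SSH_BIN:-ssh}

ssh_probe() {
    host=$1
    limit=${2:-10}   # seconds before the probe is considered hung (assumed default)

    # Same options as the ssh process seen in the ps output, with stdin
    # detached from the terminal and a hard time limit around the whole call.
    timeout "$limit" "$SSH_BIN" -o StrictHostKeyChecking=no \
        -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5 \
        -p 22 "root@$host" exit 0 < /dev/null
    status=$?

    if [ "$status" -eq 0 ]; then
        echo "$host: ok"
    elif [ "$status" -eq 124 ]; then
        # GNU timeout reports 124 when it had to stop the command.
        echo "$host: timed out after ${limit}s"
    else
        echo "$host: failed (exit $status)"
    fi
    return "$status"
}

# Example use:
#   ssh_probe 172.18.0.181
```

If the probe reports a timeout only when run from the manager's terminal session, that would point at terminal or stdin handling rather than at the SSH keys or the network.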

hydeperion commented 4 years ago

@trsenzhang-san, did you run tail -f in the same terminal? Does the same thing happen when you run tail -f in a different terminal?

zishiguo commented 2 years ago

@trsenzhang Has this been solved?

zishiguo commented 2 years ago

Add nohup at the start of the command, for example: nohup masterha_manager --conf=/etc/app1.cnf > /var/log/mha/mha.log 2>&1 &
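
A minimal sketch of the nohup suggestion, wrapped in a reusable function. `start_detached` is a hypothetical helper name, not an MHA command. The sketch additionally redirects stdin from /dev/null so that neither the manager nor any ssh child it spawns can read from the terminal; that extra redirect is an assumption that goes beyond the command quoted above.

```shell
#!/bin/sh
# Hypothetical helper: start a long-running command detached from the terminal.
# Usage: start_detached LOG_FILE COMMAND [ARGS...]
start_detached() {
    log_file=$1
    shift

    # nohup makes the command ignore SIGHUP when the terminal closes.
    # stdin comes from /dev/null, stdout and stderr go to the log file,
    # so nothing stays attached to the terminal.
    nohup "$@" < /dev/null > "$log_file" 2>&1 &

    # Print the PID of the background job so the caller can track it.
    echo "$!"
}

# Example use (paths taken from the command quoted above):
#   start_detached /var/log/mha/mha.log masterha_manager --conf=/etc/app1.cnf
```

Once the manager has been started this way, masterha_check_status --conf=/etc/app1.cnf can be run from any terminal to see whether monitoring is active.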