xcat2 / xcat-extensions

Repository for scripts that handle special use cases

The high availability xCAT management node lost its virtual IP address after reboot #21

Closed: neo954 closed this issue 6 years ago

neo954 commented 6 years ago

This bug was found in xcatha.py at commit 7620439004af8683a6b69bfcfc6125e95310e78d.

After activating one of the high availability xCAT management nodes, xcatha.py does not appear to write any persistent configuration file under /etc/sysconfig/network-scripts. The virtual IP alias (set with `ifconfig eth0:99 ...`) and the hostname (set with `hostname ...`) exist only at runtime, so after an operating system reboot the xCAT management node loses these settings.
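For illustration, a persistent configuration for the alias would be an RHEL-style ifcfg file. The sketch below only renders the file contents from the values used in the invocation below (`eth0:99`, `10.3.1.99`, `255.0.0.0`); the helper names are hypothetical and not part of xcatha.py, and it deliberately prints instead of writing (writing the file needs root). Note that `ONBOOT=yes` would bring the VIP up on whichever node boots, which can conflict with an active/standby HA design, so an HA tool might intentionally avoid persisting it.

```python
# Hypothetical helpers (not part of xcatha.py): render a persistent
# RHEL-style ifcfg file for the VIP alias that is otherwise set only at runtime.

def ifcfg_path(device: str) -> str:
    """Path of the network-scripts file for the given device or alias."""
    return f"/etc/sysconfig/network-scripts/ifcfg-{device}"


def render_alias_ifcfg(device: str, ipaddr: str, netmask: str) -> str:
    """Return the ifcfg file contents for a static alias address."""
    lines = [
        f"DEVICE={device}",   # alias label, e.g. eth0:99
        "BOOTPROTO=none",     # static address, no DHCP
        f"IPADDR={ipaddr}",
        f"NETMASK={netmask}",
        "ONBOOT=yes",         # bring the alias up at boot (see HA caveat)
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Values taken from the xcatha.py invocation in this report.
    print(ifcfg_path("eth0:99"))
    print(render_alias_ifcfg("eth0:99", "10.3.1.99", "255.0.0.0"), end="")
```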

# ./xcatha.py -a -p /media/u/gongjie/ha-test -i eth0:99 -v 10.3.1.99 -m 255.0.0.0  -t sqlite
2018-06-12 01:07:33,183 - INFO - Activating this node as xCAT primary MN
############################################################################################
2018-06-12 01:07:33,183 - INFO - Activate stage
============================================================================================
2018-06-12 01:07:33,213 - INFO - Check virtual ip stage
2018-06-12 01:07:33,214 - INFO - ping -c 1 -w 10 10.3.1.99
PING 10.3.1.99 (10.3.1.99) 56(84) bytes of data.
From 10.3.1.7 icmp_seq=1 Destination Host Unreachable
From 10.3.1.7 icmp_seq=2 Destination Host Unreachable
From 10.3.1.7 icmp_seq=3 Destination Host Unreachable

--- 10.3.1.99 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2004ms
pipe 3
2018-06-12 01:07:36,218 - INFO - virtual ip can be used.
============================================================================================
2018-06-12 01:07:36,218 - INFO - Configure virtual ip as alias ip stage
2018-06-12 01:07:36,220 - INFO - ifconfig eth0:99 10.3.1.99  netmask 255.0.0.0 [Passed]
============================================================================================
2018-06-12 01:07:36,236 - INFO - Configure hostname stage
2018-06-12 01:07:36,237 - INFO - hostname c910f03c01p99 [Passed]
2018-06-12 01:07:36,238 - INFO - Check if xCAT data is in shared data directory
2018-06-12 01:07:36,240 - INFO - There is xCAT data /media/u/gongjie/ha-test/install in shared data /media/u/gongjie/ha-test
============================================================================================
2018-06-12 01:07:36,240 - INFO - Configure shared data directory stage
2018-06-12 01:07:36,264 - INFO - systemctl stop goconserver [Passed]
2018-06-12 01:07:36,276 - INFO - systemctl stop conserver [Passed]
2018-06-12 01:07:36,287 - INFO - systemctl stop ntpd [Passed]
2018-06-12 01:07:36,299 - INFO - systemctl stop dhcpd [Passed]
2018-06-12 01:07:36,312 - INFO - systemctl stop named [Passed]
2018-06-12 01:07:36,324 - INFO - systemctl stop xcatd [Passed]
Failed to stop mariadb.service: Unit mariadb.service not loaded.
2018-06-12 01:07:39,338 - INFO - Retry 1 ... ...systemctl stop mariadb
Failed to stop mariadb.service: Unit mariadb.service not loaded.
2018-06-12 01:07:42,353 - INFO - Retry 2 ... ...systemctl stop mariadb
Failed to stop mariadb.service: Unit mariadb.service not loaded.
2018-06-12 01:07:42,364 - ERROR - systemctl stop mariadb [Failed]
Failed to stop postgresql.service: Unit postgresql.service not loaded.
2018-06-12 01:07:45,378 - INFO - Retry 1 ... ...systemctl stop postgresql
Failed to stop postgresql.service: Unit postgresql.service not loaded.
2018-06-12 01:07:48,392 - INFO - Retry 2 ... ...systemctl stop postgresql
Failed to stop postgresql.service: Unit postgresql.service not loaded.
2018-06-12 01:07:48,404 - ERROR - systemctl stop postgresql [Failed]
2018-06-12 01:07:48,405 - INFO - Creating symlink .../install
2018-06-12 01:07:48,405 - INFO - Creating symlink .../etc/xcat
2018-06-12 01:07:48,405 - INFO - Creating symlink .../root/.xcat
2018-06-12 01:07:48,405 - INFO - Creating symlink .../var/lib/pgsql
2018-06-12 01:07:48,405 - INFO - Creating symlink .../var/lib/mysql
2018-06-12 01:07:48,405 - INFO - Creating symlink .../tftpboot
2018-06-12 01:07:48,411 - INFO - cat /tmp/ha_mn >> /etc/xcat/ha_mn [Passed]
============================================================================================
2018-06-12 01:07:48,411 - INFO - Start all services stage
2018-06-12 01:07:50,696 - INFO - systemctl start xcatd [Passed]
    domain=pok.stglabs.ibm.com
2018-06-12 01:07:51,070 - INFO - lsdef -t site -i domain|grep domain [Passed]
2018-06-12 01:07:51,071 - WARNING - Long hostname is not in "/etc/hosts". "named" service will not be started
Renamed existing dhcp configuration file to  /etc/dhcp/dhcpd.conf.xcatbak

Warning: No dynamic range specified for 10.0.0.0. If hardware discovery is being used, a dynamic range is required.
2018-06-12 01:07:51,595 - INFO - makedhcp -n [Passed]
2018-06-12 01:07:51,792 - INFO - makedhcp -a [Passed]
2018-06-12 01:07:51,838 - INFO - systemctl start ntpd [Passed]
2018-06-12 01:07:51,839 - INFO - This machine is set to primary management node successfully...

bybai commented 6 years ago

After a reboot, the VIP is lost and the user needs to run the "--activate" process again to activate this MN. This is consistent with the current design.
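Under that design, recovery after a reboot means re-running activation. A minimal sketch, assuming the same arguments as the original invocation in this report: it checks `ip -o -4 addr show` for the VIP and re-runs `xcatha.py -a` only when the address is missing. The function names and the `ACTIVATE_CMD` list are hypothetical; adjust the path and options for your site and run it as root.

```python
import subprocess

# Hypothetical: arguments copied from the activation command in this report.
ACTIVATE_CMD = ["./xcatha.py", "-a", "-p", "/media/u/gongjie/ha-test",
                "-i", "eth0:99", "-v", "10.3.1.99", "-m", "255.0.0.0",
                "-t", "sqlite"]


def vip_configured(ip_output: str, vip: str) -> bool:
    """Return True if `ip -o -4 addr show` output lists the given IPv4 address."""
    for line in ip_output.splitlines():
        fields = line.split()
        if "inet" in fields:
            idx = fields.index("inet")
            if idx + 1 < len(fields):
                addr = fields[idx + 1]          # e.g. "10.3.1.99/8"
                if addr.split("/")[0] == vip:
                    return True
    return False


def reactivate_if_needed(vip: str) -> bool:
    """Re-run activation when the VIP is missing; return True if it ran."""
    out = subprocess.run(["ip", "-o", "-4", "addr", "show"],
                         capture_output=True, text=True, check=True).stdout
    if vip_configured(out, vip):
        return False
    subprocess.run(ACTIVATE_CMD, check=True)
    return True
```

For example, `reactivate_if_needed("10.3.1.99")` returns `False` on a node that still holds the VIP and otherwise re-runs the activation stages shown in the log above.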

neo954 commented 6 years ago

I suggest documenting this issue in the xCAT documentation.

neo954 commented 6 years ago

Refer to xcat2/xcat2-task-management#163