xcat2 / xcat-extensions

Repos to store scripts for special user cases

Postgresql cannot be started during activate #29

Closed: robin2008 closed this issue 6 years ago

robin2008 commented 6 years ago
./xcatha.py -a -p /data -i eth0:0 -v 10.3.5.20
2018-06-13 22:49:06,585 - INFO - Activating this node as xCAT primary MN
2018-06-13 22:49:06,585 - INFO - ########## Activate stage ##########
2018-06-13 22:49:06,585 - INFO - ===> Check virtual ip stage <===
2018-06-13 22:49:06,586 - DEBUG - ping -c 1 -w 10 10.3.5.20
PING 10.3.5.20 (10.3.5.20) 56(84) bytes of data.
From 10.3.5.10 icmp_seq=1 Destination Host Unreachable

--- 10.3.5.20 ping statistics ---
3 packets transmitted, 0 received, +1 errors, 100% packet loss, time 2003ms
pipe 3
2018-06-13 22:49:09,591 - INFO - virtual ip can be used.
2018-06-13 22:49:09,592 - INFO - ===> Configure virtual ip as alias ip stage <===
2018-06-13 22:49:09,598 - DEBUG - ifconfig eth0:0 10.3.5.20  netmask 255.255.255.0 [Passed]
2018-06-13 22:49:09,606 - INFO - ===> Configure hostname stage <===
2018-06-13 22:49:09,612 - DEBUG - hostname c910f03c05k20 [Passed]
2018-06-13 22:49:09,613 - INFO - Check if xCAT data is in shared data directory
2018-06-13 22:49:09,613 - DEBUG - There is xCAT data /data/install in shared data /data
2018-06-13 22:49:09,613 - INFO - ===> Configure shared data directory stage <===
2018-06-13 22:49:09,644 - DEBUG - systemctl stop goconserver [Passed]
2018-06-13 22:49:09,660 - DEBUG - systemctl stop conserver [Passed]
2018-06-13 22:49:09,674 - DEBUG - systemctl stop ntpd [Passed]
2018-06-13 22:49:09,690 - DEBUG - systemctl stop dhcpd [Passed]
2018-06-13 22:49:09,706 - DEBUG - systemctl stop named [Passed]
2018-06-13 22:49:09,721 - DEBUG - systemctl stop xcatd [Passed]
Failed to stop mariadb.service: Unit mariadb.service not loaded.
2018-06-13 22:49:12,738 - DEBUG - Retry 1 ... ...systemctl stop mariadb
Failed to stop mariadb.service: Unit mariadb.service not loaded.
2018-06-13 22:49:15,764 - DEBUG - Retry 2 ... ...systemctl stop mariadb
Failed to stop mariadb.service: Unit mariadb.service not loaded.
2018-06-13 22:49:15,795 - ERROR - systemctl stop mariadb [Failed]
2018-06-13 22:49:15,817 - DEBUG - systemctl stop postgresql [Passed]
2018-06-13 22:49:15,818 - INFO - Creating symlink .../install
2018-06-13 22:49:15,819 - INFO - Creating symlink .../etc/xcat
2018-06-13 22:49:15,819 - INFO - Creating symlink .../root/.xcat
2018-06-13 22:49:15,819 - INFO - Creating symlink .../var/lib/pgsql
2018-06-13 22:49:15,820 - INFO - Creating symlink .../var/lib/mysql
2018-06-13 22:49:15,820 - INFO - Creating symlink .../tftpboot
2018-06-13 22:49:15,847 - INFO - ===> Start all services stage <===
Job for xcatd.service failed because a timeout was exceeded. See "systemctl status xcatd.service" and "journalctl -xe" for details.
2018-06-13 22:54:18,972 - DEBUG - Retry 1 ... ...systemctl start xcatd
2018-06-13 22:54:29,325 - DEBUG - systemctl start xcatd [Passed]
2018-06-13 22:54:30,164 - ERROR - lsdef -t site -i domain|grep domain [Failed]
2018-06-13 22:54:30,165 - WARNING - "domain" entry is not in "site" table. "named" service will not be started
2018-06-13 22:54:30,165 - WARNING - "domain" entry is not in "site" table. "dhcpd" service will not be started
2018-06-13 22:54:30,246 - DEBUG - systemctl start ntpd [Passed]
2018-06-13 22:54:30,247 - INFO - This machine is set to primary management node successfully...

Possible cause: postgresql is started too soon after it is stopped, which makes the start fail, and the script does not seem to check the result or retry.

systemctl status postgresql
* postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2018-06-13 22:58:53 EDT; 56s ago
  Process: 29204 ExecStop=/usr/bin/pg_ctl stop -D ${PGDATA} -s -m fast (code=exited, status=1/FAILURE)
  Process: 28714 ExecStart=/usr/bin/pg_ctl start -D ${PGDATA} -s -o -p ${PGPORT} -w -t 300 (code=exited, status=0/SUCCESS)
  Process: 28708 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)

Jun 13 22:54:24 c910f03c05k20 systemd[1]: Starting PostgreSQL database server...
Jun 13 22:54:25 c910f03c05k20 systemd[1]: Started PostgreSQL database server.
Jun 13 22:58:53 c910f03c05k20 systemd[1]: Stopping PostgreSQL database server...
Jun 13 22:58:53 c910f03c05k20 pg_ctl[29204]: pg_ctl: PID file "/var/lib/pgsql/data/postmaster.pid" does not exist
Jun 13 22:58:53 c910f03c05k20 systemd[1]: postgresql.service: control process exited, code=exited status=1
Jun 13 22:58:53 c910f03c05k20 systemd[1]: Stopped PostgreSQL database server.
Jun 13 22:58:53 c910f03c05k20 systemd[1]: Unit postgresql.service entered failed state.
Jun 13 22:58:53 c910f03c05k20 systemd[1]: postgresql.service failed.

Workaround: run the activation again and start postgresql manually in another terminal.
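For illustration, here is a minimal sketch of the kind of check-and-retry that the possible cause above says is missing. It is not the actual xcatha.py code: the function names `run_cmd` and `start_service_with_check`, the retry count, and the delay are all assumptions. It only relies on `systemctl start` and `systemctl is-active --quiet`, and treats a start as successful only when both return 0.

```python
import subprocess
import time


def run_cmd(cmd):
    """Run a command given as a list of arguments and return its exit code."""
    return subprocess.call(cmd)


def start_service_with_check(name, retries=3, delay=3, run=run_cmd, sleep=time.sleep):
    """Start a systemd service and verify that it is really active.

    Hypothetical helper: retries and delay are illustrative defaults,
    not values taken from xcatha.py.
    """
    for attempt in range(1, retries + 1):
        started = run(["systemctl", "start", name]) == 0
        # A zero exit code from "start" is not trusted on its own;
        # "is-active --quiet" exits 0 only if the unit is active.
        if started and run(["systemctl", "is-active", "--quiet", name]) == 0:
            return True
        if attempt < retries:
            # Give a service that was stopped a moment ago time to settle.
            sleep(delay)
    return False
```

A caller could run `start_service_with_check("postgresql")` before starting xcatd and abort the activation if it returns False, rather than reporting success.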