xcat2 / xcat-core

Code repo for xCAT core packages
Eclipse Public License 1.0

Unable to Deploy cloned image of Centos 7.8 #6769

Open alexnovik opened 4 years ago

alexnovik commented 4 years ago

I followed every step in this doc: https://xcat-docs.readthedocs.io/en/stable/advanced/sysclone/sysclone.html

The system clone appears to have been captured successfully from the golden node by running:

[root@servicenew ~]# imgcapture cn34 -t sysclone -o compute_image

[root@servicenew /]# ls /install/sysclone/images/compute_image/
.autorelabel  data/  home/  lib64/  opt/  root/  srv/  usr/
bin/  dev/  install/  media/  proc/  run/  sys/  var/
boot/  etc/  lib/  mnt/  .readahead  sbin/  tmp/  xcatpost/

[root@servicenew /]# lsdef -t osimage compute_image
Object name: compute_image
    imagetype=linux
    osarch=x86_64
    osdistroname=centos7.8-x86_64
    osname=Linux
    osvers=centos7.8
    otherpkgdir=/install/post/otherpkgs/centos7.8/x86_64,/install/post/otherpkgs/centos7.8/x86_64/
    otherpkglist=/opt/xcat/share/xcat/install/rh/sysclone.rhels7.x86_64.otherpkgs.pkglist
    pkgdir=/install/centos7.8/x86_64
    profile=compute
    provmethod=sysclone
    rootimgdir=/install/sysclone/images/compute_image

Now I am trying to deploy compute_image to cn35

[root@servicenew /]# nodeset cn35 osimage=compute_image
cn35: sysclone centos7.8-x86_64
[root@servicenew /]# rsetboot cn35 net
cn35: Network
[root@servicenew /]# rpower cn35 boot
cn35: reset
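
For repeated attempts on different nodes, the three commands above could be wrapped in a small helper. This is only a sketch: the `run` and `deploy_sysclone` function names and the `DRY_RUN` switch are hypothetical and introduced here for illustration, and only the `nodeset`, `rsetboot`, and `rpower` invocations come from this report.

```shell
#!/bin/sh
# Sketch only: wraps the three deployment commands shown in this report.
# DRY_RUN, run, and deploy_sysclone are hypothetical names, not part of xCAT.

run() {
    # Print the command instead of executing it when DRY_RUN is set.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

deploy_sysclone() {
    # $1 = node name, $2 = osimage name
    # Stop at the first command that fails.
    run nodeset "$1" "osimage=$2" &&
    run rsetboot "$1" net &&
    run rpower "$1" boot
}
```

With `DRY_RUN=1` exported, calling `deploy_sysclone cn35 compute_image` only prints the three commands, which is a cheap way to review them before touching a node.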

It gets stuck at this point and goes no further.

[root@servicenew /]# xcatprobe osdeploy -n cn35 -V
The install NIC in current server is eno1 [INFO]
All nodes to be deployed are valid [ OK ]

Start capturing every message during OS provision process....

[cn35] 02:35:04 Via TFTP download xcat/xnba.kpxe
[cn35] 02:35:04 Via TFTP download xcat/xnba.kpxe
[cn35] 02:35:05 Via HTTP get /tftpboot/xcat/xnba/nodes/cn35
[cn35] 02:35:05 Via HTTP get /tftpboot/xcat/genesis.kernel.x86_64
[cn35] 02:35:05 Via HTTP get /tftpboot/xcat/genesis.fs.x86_64.gz

The remote console on this server went through the network boot process, then got stuck with a blinking cursor and text that quickly disappears at the bottom: [screen is terminating]

cxhong commented 4 years ago

We can recreate this on our test system. We will look into what happened.

alexnovik commented 4 years ago

@cxhong any news?

cxhong commented 4 years ago

I can recreate it, but haven't had a chance to debug it yet. Have you used sysclone on rhels7.6 or another OS?

alexnovik commented 4 years ago

Only Centos 7.8


cxhong commented 4 years ago

@alexnovik, can you check which genesis rpms are installed on your MN?

rpm -qa | grep genesis

Also, what do you see in the log?

grep doxcat /var/log/xcat/cluster.log

alexnovik commented 4 years ago

@cxhong

[root@servicenew ~]# rpm -qa | grep genesis
xCAT-genesis-scripts-x86_64-2.16-snap202006161607.noarch
xCAT-genesis-scripts-ppc64-2.16-snap202006161607.noarch
xCAT-genesis-base-ppc64-2.14.5-snap201811160710.noarch
xCAT-genesis-base-x86_64-2.14.5-snap201811190037.noarch

[root@servicenew ~]# grep doxcat /var/log/xcat/cluster.log
[root@servicenew ~]#

cxhong commented 4 years ago

Maybe turn debug on with chdef -t site xcatdebugmode=2, then rinstall the node again. Also, can you check ls -ltr /tftpboot/xcat/genesis*? If you don't see any doxcat messages in the xCAT log, check the log entries for the node:

# grep cn35 /var/log/xcat/cluster.log

alexnovik commented 4 years ago

OK, I am now trying with another node, cn33.

[root@servicenew genesis]# chdef -t site xcatdebugmode=2
1 object definitions have been created or modified.

[root@servicenew genesis]# ls -ltr /tftpboot/xcat/genesis*
-rwxr-xr-x 1 root root 25417368 Nov 16 2018 /tftpboot/xcat/genesis.kernel.ppc64
-rwxr-xr-x 1 root root 6398144 Nov 19 2018 /tftpboot/xcat/genesis.kernel.x86_64
-rw-r--r-- 1 root root 125714718 Jul 8 02:02 /tftpboot/xcat/genesis.fs.ppc64.gz
-rw-r--r-- 1 root root 126749719 Jul 8 02:02 /tftpboot/xcat/genesis.fs.x86_64.gz

[root@servicenew genesis]# grep doxcat /var/log/xcat/cluster.log
Jul 27 12:34:11 cn33 xcat.genesis.doxcat: INFO Getting initial certificate --> 10.128.0.50:3001
Jul 27 12:34:11 cn33 xcat.genesis.doxcat: INFO Running getdestiny --> 10.128.0.50:3001
Jul 27 12:34:11 cn33 xcat.genesis.doxcat: INFO Received destiny=sysclone centos7.8-x86_64
Jul 27 12:34:11 cn33 xcat.genesis.doxcat: INFO The destiny=sysclone, destiny parameters=sysclone centos7.8-x86_64
Jul 27 12:34:11 cn33 xcat.genesis.doxcat: INFO Running dosysclone...
Jul 28 11:41:20 cn33 xcat.genesis.doxcat: INFO Getting initial certificate --> 10.128.0.50:3001
Jul 28 11:41:20 cn33 xcat.genesis.doxcat: INFO Running getdestiny --> 10.128.0.50:3001
Jul 28 11:41:20 cn33 xcat.genesis.doxcat: INFO Received destiny=sysclone centos7.8-x86_64
Jul 28 11:41:20 cn33 xcat.genesis.doxcat: INFO The destiny=sysclone, destiny parameters=sysclone centos7.8-x86_64
Jul 28 11:41:20 cn33 xcat.genesis.doxcat: INFO Running dosysclone...
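
To see at a glance how far a node got, the grep above can be narrowed to the last doxcat message for one node. A minimal sketch, assuming the log line layout shown in this thread (`<date> <node> xcat.genesis.doxcat: <message>`); the function name `last_doxcat_stage` is hypothetical.

```shell
#!/bin/sh
# Sketch only: prints the most recent doxcat message logged by one node.
# Assumes lines look like:
#   Jul 28 11:41:20 cn33 xcat.genesis.doxcat: INFO Running dosysclone...

last_doxcat_stage() {
    # $1 = node name, $2 = path to the log file
    # grep -F matches the fixed string, so dots in the tag are literal.
    grep -F " $1 xcat.genesis.doxcat: " "$2" |
        tail -n 1 |
        sed 's/^.*xcat\.genesis\.doxcat: //'
}
```

For example, `last_doxcat_stage cn33 /var/log/xcat/cluster.log` prints only the final stage reported by cn33 and prints nothing if the node never reached doxcat.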

[root@servicenew genesis]# grep cn33 /var/log/xcat/cluster.log

Jul 27 11:40:08 servicenew xcat[250589]: DEBUG xcatd: dispatch request 'mkdef -t node cn33 groups=x86_64 mgt=ipmi cons=ipmi ip=10.128.1.33 netboot=xnba bmc=10.128.4.84 bmcusername=xxx bmcpassword=xxx installnic=mac primarynic=mac mac=48:df:37:4a:88:64' to plugin 'DBobjectdefs'
Jul 27 11:41:22 servicenew xcat[250660]: DEBUG xcatd: open new process : xcatd SSL: nodeset to cn33 for root@localhost.localdomain
Jul 27 11:41:22 servicenew xcat[250660]: INFO xCAT: Allowing nodeset to cn33 osimage=compute_image for root from localhost.localdomain
Jul 27 11:41:22 servicenew xcat[250661]: DEBUG xcatd: dispatch request 'nodeset cn33 osimage=compute_image' to plugin 'xnba'
Jul 27 11:41:22 servicenew xcat[250661]: DEBUG xnba: [total=0] nodes are cn33
Jul 27 11:41:23 servicenew xcat[250661]: DEBUG dhcp: nodes are cn33
Jul 27 11:41:38 servicenew xcat[250761]: DEBUG xcatd: open new process : xcatd SSL: rsetboot to cn33 for root@localhost.localdomain
Jul 27 11:41:38 servicenew xcat[250761]: INFO xCAT: Allowing rsetboot to cn33 net for root from localhost.localdomain
Jul 27 11:41:38 servicenew xcat[250762]: DEBUG xcatd: dispatch request 'rsetboot cn33 net' to plugin 'ipmi'
Jul 27 11:43:38 servicenew xcat[250864]: INFO xCAT: Allowing chdef -t node cn33 mgt=hpilo for root from localhost.localdomain
Jul 27 11:43:38 servicenew xcat[250865]: DEBUG xcatd: dispatch request 'chdef -t node cn33 mgt=hpilo' to plugin 'DBobjectdefs'
Jul 27 11:43:42 servicenew xcat[250873]: DEBUG xcatd: open new process : xcatd SSL: rsetboot to cn33 for root@localhost.localdomain
Jul 27 11:43:42 servicenew xcat[250873]: INFO xCAT: Allowing rsetboot to cn33 net for root from localhost.localdomain
Jul 27 11:43:42 servicenew xcat[250874]: DEBUG xcatd: dispatch request 'rsetboot cn33 net' to plugin 'AAAusage'
Jul 27 11:45:25 servicenew xcat[250962]: INFO xCAT: Allowing chdef -t node cn33 mgt=ipmi for root from localhost.localdomain
Jul 27 11:45:25 servicenew xcat[250964]: DEBUG xcatd: dispatch request 'chdef -t node cn33 mgt=ipmi' to plugin 'DBobjectdefs'
Jul 27 11:45:29 servicenew xcat[250972]: DEBUG xcatd: open new process : xcatd SSL: rsetboot to cn33 for root@localhost.localdomain
Jul 27 11:45:29 servicenew xcat[250972]: INFO xCAT: Allowing rsetboot to cn33 net for root from localhost.localdomain
Jul 27 11:45:29 servicenew xcat[250973]: DEBUG xcatd: dispatch request 'rsetboot cn33 net' to plugin 'ipmi'
Jul 27 11:47:47 servicenew xcat[251109]: DEBUG xcatd: open new process : xcatd SSL: rsetboot to cn33 for root@localhost.localdomain
Jul 27 11:47:48 servicenew xcat[251109]: INFO xCAT: Allowing rsetboot to cn33 net for root from localhost.localdomain
Jul 27 11:47:48 servicenew xcat[251111]: DEBUG xcatd: dispatch request 'rsetboot cn33 net' to plugin 'ipmi'
Jul 27 11:48:19 servicenew xcat[251141]: DEBUG xcatd: open new process : xcatd SSL: rpower to cn33 for root@localhost.localdomain
Jul 27 11:48:19 servicenew xcat[251141]: INFO xCAT: Allowing rpower to cn33 boot for root from localhost.localdomain
Jul 27 11:48:19 servicenew xcat[251142]: DEBUG xcatd: dispatch request 'rpower cn33 boot' to plugin 'ipmi'
Jul 27 11:48:19 servicenew xcat[251142]: INFO xcat.updatestatus - cn33: changing status=powering-on
Jul 27 11:48:42 servicenew xcat[251200]: INFO xCAT: Allowing lsdef cn33 -i ip,mac -c for root from localhost.localdomain
Jul 27 11:48:42 servicenew xcat[251201]: DEBUG xcatd: dispatch request 'lsdef cn33 -i ip,mac -c' to plugin 'DBobjectdefs'
Jul 27 11:48:45 servicenew xcat[251217]: DEBUG xcatd: open new process : xcatd SSL: nodels to cn33 for root@localhost.localdomain
Jul 27 11:48:45 servicenew xcat[251217]: INFO xCAT: Allowing nodels to cn33 for root from localhost.localdomain
Jul 27 11:48:45 servicenew xcat[251218]: DEBUG xcatd: dispatch request 'nodels cn33 ' to plugin 'tabutils'
Jul 27 12:12:08 servicenew xcat[253146]: INFO xCAT: Allowing lsdef cn33 -i ip,mac -c for root from localhost.localdomain
Jul 27 12:12:08 servicenew xcat[253147]: DEBUG xcatd: dispatch request 'lsdef cn33 -i ip,mac -c' to plugin 'DBobjectdefs'
Jul 27 12:12:08 servicenew xcat[253161]: DEBUG xcatd: open new process : xcatd SSL: nodels to cn33 for root@localhost.localdomain
Jul 27 12:12:08 servicenew xcat[253161]: INFO xCAT: Allowing nodels to cn33 for root from localhost.localdomain
Jul 27 12:12:08 servicenew xcat[253163]: DEBUG xcatd: dispatch request 'nodels cn33 ' to plugin 'tabutils'
Jul 27 12:34:11 cn33 xcat.genesis.doxcat: INFO Getting initial certificate --> 10.128.0.50:3001
Jul 27 12:34:11 servicenew xcat[3048]: DEBUG xcatd: connection from cn33
Jul 27 12:34:11 servicenew xcat[3048]: DEBUG xcatd: open new process : xcatd SSL: getcredentials for cn33
Jul 27 12:34:11 servicenew xcat[3048]: INFO xCAT: Allowing getcredentials x509cert from cn33
Jul 27 12:34:11 servicenew xcat[3050]: INFO credentials: sending x509cert to cn33
Jul 27 12:34:11 servicenew xcat[3048]: DEBUG xcatd: close connection with cn33
Jul 27 12:34:11 cn33 xcat.genesis.doxcat: INFO Running getdestiny --> 10.128.0.50:3001
Jul 27 12:34:11 servicenew xcat[3053]: DEBUG xcatd: connection from cn33@cn33
Jul 27 12:34:11 servicenew xcat[3053]: DEBUG xcatd: open new process : xcatd SSL: getdestiny for cn33@cn33
Jul 27 12:34:11 servicenew xcat[3054]: INFO xcat.updatestatus - cn33: changing status=booting
Jul 27 12:34:11 servicenew xcat[3053]: DEBUG xcatd: close connection with cn33@cn33
Jul 27 12:34:11 cn33 xcat.genesis.doxcat: INFO Received destiny=sysclone centos7.8-x86_64
Jul 27 12:34:11 cn33 xcat.genesis.doxcat: INFO The destiny=sysclone, destiny parameters=sysclone centos7.8-x86_64
Jul 27 12:34:11 cn33 xcat.genesis.doxcat: INFO Running dosysclone...
Jul 28 10:57:56 servicenew xcat[78492]: INFO xCAT: Allowing lsdef cn33 -i ip,mac -c for root from localhost.localdomain
Jul 28 10:57:56 servicenew xcat[78493]: DEBUG xcatd: dispatch request 'lsdef cn33 -i ip,mac -c' to plugin 'DBobjectdefs'
Jul 28 10:57:57 servicenew xcat[78507]: DEBUG xcatd: open new process : xcatd SSL: nodels to cn33 for root@localhost.localdomain
Jul 28 10:57:57 servicenew xcat[78507]: INFO xCAT: Allowing nodels to cn33 for root from localhost.localdomain
Jul 28 10:57:57 servicenew xcat[78508]: DEBUG xcatd: dispatch request 'nodels cn33 ' to plugin 'tabutils'
Jul 28 11:07:00 servicenew xcat[78998]: INFO xCAT: Allowing lsdef cn33 for root from localhost.localdomain
Jul 28 11:07:00 servicenew xcat[78999]: DEBUG xcatd: dispatch request 'lsdef cn33' to plugin 'DBobjectdefs'
Jul 28 11:09:49 servicenew xcat[79139]: INFO xCAT: Allowing lsdef cn33 for root from localhost.localdomain
Jul 28 11:09:49 servicenew xcat[79140]: DEBUG xcatd: dispatch request 'lsdef cn33' to plugin 'DBobjectdefs'
Jul 28 11:09:53 servicenew xcat[79149]: INFO xCAT: Allowing lsdef cn33 for root from localhost.localdomain
Jul 28 11:09:53 servicenew xcat[79150]: DEBUG xcatd: dispatch request 'lsdef cn33' to plugin 'DBobjectdefs'
Jul 28 11:11:33 servicenew xcat[79236]: INFO xCAT: Allowing lsdef cn33 for root from localhost.localdomain
Jul 28 11:11:33 servicenew xcat[79237]: DEBUG xcatd: dispatch request 'lsdef cn33' to plugin 'DBobjectdefs'
Jul 28 11:35:02 servicenew xcat[80376]: INFO xCAT: Allowing lsdef cn33 for root from localhost.localdomain
Jul 28 11:35:02 servicenew xcat[80377]: DEBUG xcatd: dispatch request 'lsdef cn33' to plugin 'DBobjectdefs'
Jul 28 11:38:20 servicenew xcat[80541]: DEBUG xcatd: open new process : xcatd SSL: rpower to cn33 for root@localhost.localdomain
Jul 28 11:38:20 servicenew xcat[80541]: INFO xCAT: Allowing rpower to cn33 boot for root from localhost.localdomain
Jul 28 11:38:20 servicenew xcat[80542]: DEBUG xcatd: dispatch request 'rpower cn33 boot' to plugin 'ipmi'
Jul 28 11:38:20 servicenew xcat[80542]: INFO xcat.updatestatus - cn33: changing status=powering-on
Jul 28 11:41:20 cn33 xcat.genesis.doxcat: INFO Getting initial certificate --> 10.128.0.50:3001
Jul 28 11:41:20 servicenew xcat[80690]: DEBUG xcatd: connection from cn33
Jul 28 11:41:20 servicenew xcat[80690]: DEBUG xcatd: open new process : xcatd SSL: getcredentials for cn33
Jul 28 11:41:20 servicenew xcat[80690]: INFO xCAT: Allowing getcredentials x509cert from cn33
Jul 28 11:41:20 servicenew xcat[80691]: INFO credentials: sending x509cert to cn33
Jul 28 11:41:20 servicenew xcat[80691]: INFO credentials: The time of replacing is at hand for cn33
Jul 28 11:41:20 servicenew xcat[80690]: DEBUG xcatd: close connection with cn33
Jul 28 11:41:20 cn33 xcat.genesis.doxcat: INFO Running getdestiny --> 10.128.0.50:3001
Jul 28 11:41:20 servicenew xcat[80696]: DEBUG xcatd: connection from cn33@cn33
Jul 28 11:41:20 servicenew xcat[80696]: DEBUG xcatd: open new process : xcatd SSL: getdestiny for cn33@cn33
Jul 28 11:41:20 servicenew xcat[80697]: INFO xcat.updatestatus - cn33: changing status=booting
Jul 28 11:41:20 servicenew xcat[80696]: DEBUG xcatd: close connection with cn33@cn33
Jul 28 11:41:20 cn33 xcat.genesis.doxcat: INFO Received destiny=sysclone centos7.8-x86_64
Jul 28 11:41:20 cn33 xcat.genesis.doxcat: INFO The destiny=sysclone, destiny parameters=sysclone centos7.8-x86_64
Jul 28 11:41:20 cn33 xcat.genesis.doxcat: INFO Running dosysclone...
Jul 28 11:48:57 servicenew xcat[81080]: INFO xCAT: Allowing lsdef cn33 for root from localhost.localdomain
Jul 28 11:48:57 servicenew xcat[81081]: DEBUG xcatd: dispatch request 'lsdef cn33' to plugin 'DBobjectdefs'

Now it is stuck at a different message. Rsync seems unable to access the ::scripts module, while the NFS-exported /install directory works fine. [image]

alexnovik commented 4 years ago

I restarted the rsyncd service on the management node, and on the compute node, in the terminal of the unfinished install, I tried this command:

[xCAT Genesis running on cn33 /]# rsync -a servicenew::scripts/ /scripts/
@ERROR: Unknown module 'scripts'
rsync error: error starting client-server protocol (code 5) at main.c(1648) [Receiver=3.1.2]

It also seems that xCAT is not creating any modules for rsyncd and is trying to call a nonexistent module, scripts.

[root@servicenew xcat-dep]# cat /etc/rsyncd.conf
# /etc/rsyncd: configuration file for rsync daemon mode

# See rsyncd.conf man page for more options.

# configuration example:

# uid = nobody
# gid = nobody
# use chroot = yes
# max connections = 4
# pid file = /var/run/rsyncd.pid
# exclude = lost+found/
# transfer logging = yes
# timeout = 900
# ignore nonreadable = yes
# dont compress   = *.gz *.tgz *.zip *.z *.Z *.rpm *.deb *.bz2

# [ftp]
#        path = /home/ftp
#        comment = ftp export area

alexnovik commented 4 years ago

OK, I fixed the rsync issue by adding this to /etc/rsyncd.conf (this probably needs a fix in the code):

[scripts]
        path = /install/sysclone/scripts/
        comment = xcat install scripts
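
The manual edit above could be made repeatable with a small check-then-append helper. This is a sketch of the workaround only, not how xCAT itself manages rsyncd; the function names `has_rsync_module` and `ensure_rsync_module` are hypothetical, and the stanza mirrors the one added by hand.

```shell
#!/bin/sh
# Sketch only: add an rsyncd module stanza if it is not already declared.
# has_rsync_module and ensure_rsync_module are hypothetical helper names.

has_rsync_module() {
    # $1 = module name, $2 = path to rsyncd.conf
    # Matches an uncommented "[name]" header; "# [name]" lines do not count.
    grep -Eq '^[[:space:]]*\['"$1"'\][[:space:]]*$' "$2"
}

ensure_rsync_module() {
    # $1 = module name, $2 = exported path, $3 = path to rsyncd.conf
    if has_rsync_module "$1" "$3"; then
        return 0
    fi
    # Append the same stanza that was added manually in this thread.
    printf '\n[%s]\n        path = %s\n        comment = xcat install scripts\n' \
        "$1" "$2" >> "$3"
}
```

Running `ensure_rsync_module scripts /install/sysclone/scripts/ /etc/rsyncd.conf` twice leaves a single `[scripts]` stanza, so it is safe to call before every deployment attempt.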

Now the installation stops at:

write_variables
cat: /etc/issue: No such file or directory
[88.888978] sda sda1 sda2
sh: no job control in this shell
sh-4.2#:

Now I don't know what to do. /etc/issue does not exist in genimage.

Waiting for your advice @cxhong

cxhong commented 4 years ago

Can you check whether /etc/issue exists on the rhels7.8 system? This is what it looks like on rhels7.6. Maybe you can try editing it.

# cat /etc/issue
\S
Kernel \r on an \m

[root@c910f03c09k09 ~]# rpm -qf /etc/issue
redhat-release-server-7.6-4.el7.ppc64le

alexnovik commented 4 years ago

@cxhong yeah, I know, and I already fixed it in the cloned image. But that line is requesting /etc/issue in the boot image (genimage), not the cloned image, and the file does not exist in genimage.

cxhong commented 3 years ago

@alexnovik sorry for the late response. I had a chance to look at the issue again. The real error is not about /etc/issue; that file is only used here when logging the error messages. The actual error happened a few lines before that:

write_variables
cat: /etc/issue: No such file or directory
[88.888978] sda sda1 sda2
sh: no job control in this shell
sh-4.2#:

When I ran the test, the number of disks had to match the clone image. If the clone image has 4 disks and the node has only one, it causes the error, and the end of the rcons output shows the /etc/issue messages above. You can also check the sysclone script /install/sysclone/scripts/xxxxxxx.master on the MN.
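
A quick way to compare disk counts before deploying is to count the `disk` entries in `lsblk` output on the target node. A minimal sketch, assuming `lsblk` is available where you run it and that you already know how many disks the golden node had; the function names are hypothetical, and the expected count is passed in by hand rather than parsed from the .master script.

```shell
#!/bin/sh
# Sketch only: compare a node's disk count with the count the image expects.
# count_disks and check_disk_count are hypothetical helper names.

count_disks() {
    # Reads "NAME TYPE" lines on stdin, as printed by:
    #   lsblk -dn -o NAME,TYPE
    # and prints how many entries have TYPE "disk".
    awk '$2 == "disk" { n++ } END { print n + 0 }'
}

check_disk_count() {
    # $1 = number of disks the cloned image expects; lsblk output on stdin.
    actual=$(count_disks)
    if [ "$actual" -eq "$1" ]; then
        echo "ok: $actual disk(s)"
    else
        echo "mismatch: image expects $1 disk(s), node has $actual"
        return 1
    fi
}
```

On the target node this would be used as `lsblk -dn -o NAME,TYPE | check_disk_count 4`, where 4 is whatever the golden node had.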