xcat2 / xcat-core

Code repo for xCAT core packages
Eclipse Public License 1.0

download OS failed!!! #2867

Closed TRF123 closed 7 years ago

TRF123 commented 7 years ago

I am trying to deploy CentOS 7 on my compute node using a diskful installation. The compute node appears to communicate with the master node and starts fetching packages, then it suddenly hangs. xcatprobe, run on the master, reports the following for the compute node:

[root@master tmp]#  xcatprobe osdeploy -n testtest
The install NIC in current server is eth1                                 [INFO]
All nodes to be deployed are valid                                        [ OK ]
-------------------------------------------------------------
Start capturing every message during OS provision process....
-------------------------------------------------------------

[testtest] 21:38:04 Use command rpower to reboot node testtest
[testtest] 21:38:16 Node status is changed to powering-on
[testtest] 21:38:19 Receive DHCPDISCOVER via eth1
[testtest] 21:38:19 Send DHCPOFFER on 10.0.0.70 back to 42:01:0a:00:00:...
[testtest] 21:38:19 DHCPREQUEST for 10.0.0.70 (10.0.0.10) from 42:01:0a...
[testtest] 21:38:19 Send DHCPACK on 10.0.0.70 back to 42:01:0a:00:00:46...
[testtest] 21:38:31 Via TFTP download xcat/xnba.kpxe
[testtest] 21:38:32 Receive DHCPDISCOVER via eth1
[testtest] 21:38:32 Via HTTP get /tftpboot/xcat/xnba/nodes/testtest
 kernel:BUG: soft lockup - CPU#0 stuck for 67s! [osdeploy:15868]
^C[testtest] 21:38:32 Send DHCPOFFER on 10.0.0.70 back to 42:01:0a:00:00:...
Get INT or TERM signal from STDIN
======================  Summary  =====================
There is 1 node provision failures
testtest : stop at stage 'download_kernel'  

I cannot identify the cause of this issue. Can I get some help, please?
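
When the probe output is saved to a file, the node and the stage it stopped at can be pulled out of the summary. This is only a minimal sketch, assuming the summary line format shown above (`<node> : stop at stage '<stage>'`); the function name `failed_stages` and the log file name are hypothetical.

```shell
# Hypothetical helper: read xcatprobe osdeploy output on stdin and print
# "<node> <stage>" for every node listed in the failure summary.
# Assumes lines of the form:  testtest : stop at stage 'download_kernel'
failed_stages() {
    sed -n "s/^\([^ ]*\) : stop at stage '\([^']*\)'.*/\1 \2/p"
}

# Usage (not run here): failed_stages < osdeploy.log
```

The stage name is a quick pointer to which service to look at first; here `download_kernel` points at the HTTP/TFTP fetch of the kernel rather than at DHCP.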

whowutwut commented 7 years ago

@TRF123 Interesting, I formatted your message above for better readability using the markdown tags.

Can you provide the output of the following commands: tabdump networks, lsdef testtest, and lsdef -t osimage -o centos7.3-x86_64-install-compute?

Also, can you run xcatprobe xcatmn -i eth1 so we can make sure your management node (MN) is configured correctly?

TRF123 commented 7 years ago

Thank you, Mr. Victor, for your reply and help; I appreciate it. This is the requested output:

[root@master tftpboot]# tabdump networks

netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,mtu,comments,disable

"10_0_0_0-255_255_255_0","10.0.0.0","255.255.255.0","eth1","",,"10.0.0.10",,,,,,,,,,,,,
"192_168_121_0-255_255_255_0","192.168.121.0","255.255.255.0","eth0","192.168.121.1",,"192.168.121.90",,,,,,,,,,,,,

[root@master tftpboot]# lsdef testtest
Object name: testtest
    arch=x86_64
    currchain=boot
    currstate=install centos7.3-x86_64-compute
    groups=vm,all
    ip=10.0.0.70
    mac=42:01:0a:00:00:46
    mgt=kvm
    netboot=xnba
    os=centos7.3
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    profile=compute
    provmethod=centos7.3-x86_64-install-compute
    serialport=0
    serialspeed=115200
    status=powering-off
    statustime=04-14-2017 12:39:28
    vmcpus=4
    vmhost=10.0.0.1
    vmmemory=1024
    vmnicnicmodel=virtio
    vmnics=virb1
    vmstorage=dir:///var/lib/libvirt/images

[root@master tftpboot]# lsdef -t osimage -o centos7.3-x86_64-install-compute
Object name: centos7.3-x86_64-install-compute
    imagetype=linux
    osarch=x86_64
    osdistroname=centos7.3-x86_64
    osname=Linux
    osvers=centos7.3
    otherpkgdir=/install/post/otherpkgs/centos7.3/x86_64
    pkgdir=/install/centos7.3/x86_64
    pkglist=/opt/xcat/share/xcat/install/centos/compute.centos7.pkglist
    profile=compute
    provmethod=install
    template=/opt/xcat/share/xcat/install/centos/compute.centos7.tmpl

whowutwut commented 7 years ago

Can you run the following: xcatprobe xcatmn -i eth1

whowutwut commented 7 years ago

Also, can you do the following:

1) Run find /tftpboot -name "testtest" -print and show me the contents of that file.
2) The VM seems to be booted. Can you go to the host (ssh 10.0.0.1) and run virsh list --all? You should see the testtest VM there. Then also run brctl show to validate that you have a bridge and that it has a vnet for the VM. You can see the VM attributes using virsh dumpxml testtest and make sure that the vnet matches. Since it looks like the node is connecting to xCAT, this should be OK.
3) Run chdef testtest cons=kvm, then makeconservercf testtest, and then open rcons testtest to watch the console. ctrl+e c . is the keystroke to get out of rcons.

TRF123 commented 7 years ago

[root@master tftpboot]# xcatprobe xcatmn -i eht1
[mn]: Checking all xCAT deamons are running... [ OK ]
[mn]: Checking xcatd can receive command request... [ OK ]
[mn]: Checking 'site' table is configured... [ OK ]
[mn]: Checking provision network is configured... [FAIL]
[mn]: There isn't NIC 'eht1' in current server
[mn]: IP of eht1 doesn't belong to any network defined in 'networks' table
=================================== SUMMARY ====================================
[MN]: Checking on MN... [FAIL]
Checking provision network is configured... [FAIL]
There isn't NIC 'eht1' in current server
IP of eht1 doesn't belong to any network defined in 'networks' table

TRF123 commented 7 years ago

This is the content of the file /tftpboot/xcat/xnba/nodes/testtest. I noticed that the IP is set to dhcp. Should I change it to the IP address of my compute node, since I am using static IPs?

#!gpxe
#install centos7.3-x86_64-compute
imgfetch -n kernel http://${next-server}/tftpboot/xcat/osimage/centos7.3-x86_64-install-compute/vmlinuz
imgload kernel
imgargs kernel quiet inst.repo=http://${next-server}:80/install/centos7.3/x86_64 inst.ks=http://${next-server}:80/install/autoinst/testtest ip=dhcp inst.cmdline console=tty0 console=ttyS0,115200 BOOTIF$
imgfetch http://${next-server}/tftpboot/xcat/osimage/centos7.3-x86_64-install-compute/initrd.img
imgexec kernel
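
Since the probe stopped at the download_kernel stage, one way to narrow things down is to request, from the MN itself, the same URLs that this node file points the node at. The sketch below only extracts those URLs; it assumes that `${next-server}` resolves to the MN address on the provisioning network (10.0.0.10 in the networks table above), and the function name `node_file_urls` is hypothetical.

```shell
# Hypothetical helper: read an xNBA node file on stdin and print every
# http:// URL in it, with the ${next-server} placeholder replaced by the
# server address passed as the first argument.
node_file_urls() {
    server=$1
    grep -o 'http://[^ ]*' | sed "s/\${next-server}/$server/"
}

# Usage (not run here): send a HEAD request for each URL and print the
# HTTP status line, so a missing kernel, initrd, or kickstart file shows up.
# node_file_urls 10.0.0.10 < /tftpboot/xcat/xnba/nodes/testtest |
#     while read -r url; do echo "$url"; curl -sI "$url" | head -n 1; done
```

A failing or hanging request here would suggest a problem with the HTTP service or the files on the MN, while clean responses would point back toward the node or its virtual network.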

whowutwut commented 7 years ago

In the above, you spelled eth1 incorrectly... eht1...
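
A typo like this can be caught before the probe runs by checking that the interface name exists. This is a small sketch, assuming a Linux management node where every network interface has an entry under /sys/class/net; the function name `nic_exists` is hypothetical.

```shell
# Hypothetical helper: succeed only if the named network interface exists.
# Assumes Linux, where each interface appears as /sys/class/net/<name>.
nic_exists() {
    [ -e "/sys/class/net/$1" ]
}

# Usage (not run here): only start the probe when the NIC name is valid.
# nic_exists eth1 && xcatprobe xcatmn -i eth1
```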

TRF123 commented 7 years ago

Sorry about that:

[root@master tftpboot]# xcatprobe xcatmn -i eth1
[mn]: Checking all xCAT deamons are running... [ OK ]
[mn]: Checking xcatd can receive command request... [ OK ]
[mn]: Checking 'site' table is configured... [ OK ]
[mn]: Checking provision network is configured... [ OK ]
[mn]: Checking 'passwd' table is configured... [ OK ]
[mn]: Checking important directories(installdir,tftpdir) are configured... [ OK ]
[mn]: Checking SELinux is disabled... [ OK ]
[mn]: Checking HTTP service is configured... [ OK ]
[mn]: Checking TFTP service is configured... [ OK ]
[mn]: Checking DNS service is configured... [ OK ]
[mn]: Checking DHCP service is configured... [ OK ]
[mn]: Checking NTP service is configured... [ OK ]
[mn]: Checking firewall is disabled... [ OK ]
[mn]: Checking minimum disk space for xCAT ['/var' needs 1GB;'/install' needs 10GB;'/tmp' needs 1GB]... [ OK ]
[mn]: Checking xCAT management node IP: <10.0.0.10> is configured to static... [WARN]
[mn]: The value '10.0.0.10' of 'master' in 'site' table isn't a static ip
[mn]: Checking dhcpd.leases file is less than 100M... [ OK ]
=================================== SUMMARY ====================================
[MN]: Checking on MN... [ OK ]
Checking xCAT management node IP: <10.0.0.10> is configured to static... [WARN]
The value '10.0.0.10' of 'master' in 'site' table isn't a static ip
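
With this many checks, the ones that did not pass are easier to spot when the output is filtered. This is a minimal sketch, assuming the probe prints one check per line with a trailing `[ OK ]`, `[WARN]`, or `[FAIL]` tag as in the output above; the function name `probe_problems` is hypothetical.

```shell
# Hypothetical helper: read probe output on stdin and keep only the
# lines tagged [WARN] or [FAIL].
probe_problems() {
    grep -E '\[(WARN|FAIL)\]'
}

# Usage (not run here): xcatprobe xcatmn -i eth1 | probe_problems
```

Note that the untagged detail lines that follow a warning (such as the explanation of the static IP check) are dropped by this filter, so the full output is still worth reading once a problem line turns up.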

whowutwut commented 7 years ago

No, you don't need to change ip=dhcp; xCAT serves the compute node the IP you set in the node definition. If you want the IP to be static, you can use the xCAT postscript hardeths to convert the DHCP-assigned IP address to a static one.

Is the kernel there? ls -ltr /tftpboot/xcat/osimage/centos7.3-x86_64-install-compute/vmlinuz

And are all the files there on the MN? ls -ltr /install/centos7.3/x86_64


```
#!gpxe
#install centos7.3-x86_64-compute
imgfetch -n kernel http://${next-server}/tftpboot/xcat/osimage/centos7.3-x86_64-install-compute/vmlinuz
imgload kernel
imgargs kernel quiet inst.repo=http://${next-server}:80/install/centos7.3/x86_64 inst.ks=http://${next-server}:80/install/autoinst/testtest ip=dhcp inst.cmdline console=tty0 console=ttyS0,115200 BOOTIF$
imgfetch http://${next-server}/tftpboot/xcat/osimage/centos7.3-x86_64-install-compute/initrd.img
imgexec kernel
```

TRF123 commented 7 years ago

[root@master tftpboot]# ls -ltr /tftpboot/xcat/osimage/centos7.3-x86_64-install-compute/vmlinuz
-rw-r--r-- 1 root root 5392080 2017-04-14 12:35 /tftpboot/xcat/osimage/centos7.3-x86_64-install-compute/vmlinuz
[root@master tftpboot]#  ls -ltr /install/centos7.3/x86_64
total 312
-rw-r--r-- 1 root root   1690 2015-12-09 22:35 RPM-GPG-KEY-CentOS-Testing-7
-rw-r--r-- 1 root root   1690 2015-12-09 22:35 RPM-GPG-KEY-CentOS-7
-rw-r--r-- 1 root root  18009 2015-12-09 22:35 GPL
-rw-r--r-- 1 root root    215 2015-12-09 22:35 EULA
-rw-r--r-- 1 root root     14 2016-12-05 13:02 CentOS_BuildTag
-r--r--r-- 1 root root   2883 2016-12-05 13:55 TRANS.TBL
drwxr-xr-x 3 root root   4096 2017-04-13 21:33 EFI
drwxr-xr-x 2 root root   4096 2017-04-13 21:33 LiveOS
drwxrwxr-x 2 root root 253952 2017-04-13 21:35 Packages
drwxr-xr-x 3 root root   4096 2017-04-13 21:35 images
drwxr-xr-x 2 root root   4096 2017-04-13 21:35 isolinux
drwxrwxr-x 2 root root   4096 2017-04-13 21:35 repodata
-rw-r--r-- 1 root root    154 2017-04-13 21:35 local-repository.tmpl