xcat2 / xcat-core

Code repo for xCAT core packages
Eclipse Public License 1.0

nodeset for diskful node throws an error: "Error: Unable to find requested field <kcmdline> from table <bootparams>" on ubuntu20.04. #7147

Open ibmxianliqi opened 2 years ago

ibmxianliqi commented 2 years ago

lsxcatd -a:

    Version 2.16.4 (git commit 6c00ed5b573bb56a12c15efbeb085b506821496e, built Sat Apr 16 00:16:55 EDT 2022)
    This is a Management Node
    dbengine=SQLite

lsdef xcatn02:

    Object name: xcatn02
        arch=x86_64
        groups=ubuntu2004-diskful
        installnic=mac
        ip=30.30.30.2
        mac=00:50:56:24:23:BD
        netboot=xnba
        os=ubuntu20.04.4
        postbootscripts=otherpkgs
        postscripts=syslog,remoteshell,syncfiles
        primarynic=mac
        profile=compute
        provmethod=ubuntu20.04.4-x86_64-install-compute

nodeset xcatn02 osimage=ubuntu20.04.4-x86_64-install-compute:

    xcatn02: [ubuntu2004-xcat-server]: Error: Unable to find requested field <kcmdline> from table <bootparams>, with key
    Error: [ubuntu2004-xcat-server]: Failed to generate xnba configurations for some node(s) on ubuntu2004-xcat-server. Check xCAT log file for more details.

tabdump bootparams:

    node,kernel,initrd,kcmdline,addkcmdline,dhcpstatements,adddhcpstatements,comments,disable

The output above shows the background and context of the issue. I know that xCAT build 2.16.4 is a development build, but I wanted to see whether it could handle basic node provisioning. The result was poor. I would like to know when the official 2.16.4 release will support Ubuntu 20.04. Alternatively, does anyone know how to fix this issue?

Thanks Qi Li

gurevichmark commented 2 years ago

@ibmxianliqi Currently there are no plans to officially support Ubuntu 20, and we are looking for community help and contributions in this area. However, I was able to provision a diskful Ubuntu 20.04 compute node with some workarounds, described here: https://github.com/xcat2/xcat-core/pull/6975#issuecomment-948732489

ibmxianliqi commented 2 years ago

@gurevichmark Got it, thanks for the quick response. I will try the workarounds from #6975.

ibmxianliqi commented 2 years ago

@gurevichmark I tried the workarounds from #6975 and the nodeset issue is gone. However, the node now cannot get a DHCP address for PXE boot.

Can you give me any suggestions on how to fix this? Many thanks!

ibmxianliqi commented 2 years ago

@gurevichmark After I ran nodeset for the node again, PXE boot worked, but the installation now hangs.

Any ideas or suggestions? Many thanks.

gurevichmark commented 2 years ago

@ibmxianliqi Have you tried removing nfs-common and chrony package entries from xCAT-server/share/xcat/install/ubuntu/compute.subiquity.tmpl ?
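A minimal shell sketch of that template edit follows. It assumes each package sits on its own YAML list line, such as `    - nfs-common`, inside the autoinstall packages list. That is an assumption, because this thread does not show the actual layout of compute.subiquity.tmpl. If the layout differs, the patterns match nothing and only the backup copy is created. `strip_pkgs` is a hypothetical helper name, not part of xCAT, and the in-place edit relies on GNU `sed -i`.

```shell
#!/bin/sh
# Hypothetical helper (not part of xCAT): drop the nfs-common and chrony
# entries from a subiquity template, keeping a backup copy first.
# Assumes each package is listed on its own line, e.g. "    - nfs-common".
strip_pkgs() {
    tmpl=$1
    cp -p "$tmpl" "$tmpl.orig" || return 1   # backup of the untouched template
    # GNU sed in-place edit: delete list items that are exactly these packages
    sed -i \
        -e '/^[[:space:]]*-[[:space:]]*nfs-common[[:space:]]*$/d' \
        -e '/^[[:space:]]*-[[:space:]]*chrony[[:space:]]*$/d' \
        "$tmpl"
}
```

For example, run `strip_pkgs /opt/xcat/share/xcat/install/ubuntu/compute.subiquity.tmpl`, the template path used by the osimage in this thread. Then rerun nodeset so the node's install configuration is regenerated from the edited template.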

ibmxianliqi commented 2 years ago

@gurevichmark Thanks, good to know about removing these two packages from the template. After removing them, the diskful node can be provisioned, but its IP address is not set correctly. Can you help? Many thanks.

gurevichmark commented 2 years ago

@ibmxianliqi Can you show the output of:

ibmxianliqi commented 2 years ago

@gurevichmark Here is the information you asked for:

lsdef xcatn03:

    Object name: xcatn03
        arch=x86_64
        currchain=boot
        currstate=boot
        groups=ubuntu2004-diskful
        installnic=mac
        ip=20.20.20.3
        mac=00:50:56:23:9E:88
        netboot=xnba
        nfsserver=20.20.20.20
        os=ubuntu20.04.4
        postbootscripts=otherpkgs
        postscripts=syslog,remoteshell,syncfiles
        primarynic=mac
        profile=compute
        provmethod=ubuntu20.04.4-x86_64-install-compute
        status=booting
        statustime=04-23-2022 09:24:06
        tftpserver=20.20.20.20
        xcatmaster=20.20.20.20

lsdef -t osimage ubuntu20.04.4-x86_64-install-compute:

    Object name: ubuntu20.04.4-x86_64-install-compute
        imagetype=linux
        osarch=x86_64
        osname=Linux
        osvers=ubuntu20.04.4
        otherpkgdir=/install/post/otherpkgs/ubuntu20.04.4/x86_64
        pkgdir=/install/ubuntu20.04.4/x86_64
        pkglist=/opt/xcat/share/xcat/install/ubuntu/compute.ubuntu20.04.x86_64.pkglist
        profile=compute
        provmethod=install
        template=/opt/xcat/share/xcat/install/ubuntu/compute.subiquity.tmpl

makedhcp -q xcatn03:

    xcatn03: ip-address = 20.20.20.3, hardware-address = 00:50:56:23:9e:88

/var/log/xcat/xcat.log:

    [get_install_disk]Check the partition sda1.
    [get_install_disk] Partition sda1 mount success.
    [get_install_disk] The partition sda1 has kernel file.
    [get_install_disk]Check the partition sda2.
    [get_install_disk] The disk sda had OS installed, check next partition.
    [get_install_disk]Check the partition sda3.
    [get_install_disk] The disk sda had OS installed, check next partition.
    [get_install_disk]The disks which have kernel:
    [get_install_disk] sda
    [get_install_disk]The disk sda information:
    [get_install_disk] disk_wwn=
    [get_install_disk] disk_path=/devices/pci0000:00/0000:00:10.0/host32/target32:0:0/32:0:0:0/block/sda
    [get_install_disk] disk_driver=mptspi
    [get_install_disk] Add disk: sda /devices/pci0000:00/0000:00:10.0/host32/target32:0:0/32:0:0:0/block/sda into path thirdchoicedisks
    [get_install_disk]The install_disk is /dev/sda by sorting path and DRIVER.
    Running early_command Installation script...
    Updating storage config...
    Disabling systemd-resolved...
    Sourcing file `/etc/default/grub'
    Sourcing file `/etc/default/grub.d/init-select.cfg'
    Generating grub configuration file ...
    Found linux image: /boot/vmlinuz-5.13.0-30-generic
    Found initrd image: /boot/initrd.img-5.13.0-30-generic
    done
    Running late_command Installation script...
    --2022-04-23 09:44:23-- http://20.20.20.20/install/autoinst/xcatn03.post
    Connecting to 20.20.20.20:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 75939 (74K)
    Saving to: 'xcatn03.post'

     0K .......... .......... .......... .......... .......... 67%  340M 0s
    50K .......... .......... ....                            100%  296M=0s

    2022-04-23 09:44:23 (325 MB/s) - 'xcatn03.post' saved [75939/75939]

    Sat Apr 23 09:44:24 UTC 2022 [info]: xcat.deployment: Executing post.xcat to prepare for firstbooting ...
    Sat Apr 23 09:44:48 UTC 2022 [info]: xcat.deployment: trying to download postscripts from 20.20.20.20...
    Sat Apr 23 09:44:48 UTC 2022 [info]: xcat.deployment: postscripts downloaded successfully
    Sat Apr 23 09:44:48 UTC 2022 [info]: xcat.deployment: trying to get mypostscript from 20.20.20.20...
    systemd 245 (245.4-4ubuntu3.15) +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid
    Sat Apr 23 09:44:48 UTC 2022 [info]: xcat.deployment.postscript: postscript start..: syslog
    grep: /etc/rsyslog.d/remote.conf: No such file or directory
    grep: /etc/rsyslog.d/remote.conf: No such file or directory
    Sat Apr 23 09:44:48 UTC 2022 [info]: xcat.deployment.postscript: postscript end...: syslog return with 0
    Sat Apr 23 09:44:48 UTC 2022 [info]: xcat.deployment.postscript: postscript start..: remoteshell
    ./remoteshell: line 567: kill: (9) - No such process
    ./remoteshell: line 567: kill: (2078) - No such process
    Sat Apr 23 09:44:50 UTC 2022 [info]: xcat.deployment.postscript: postscript end...: remoteshell return with 0
    Sat Apr 23 09:44:50 UTC 2022 [info]: xcat.deployment.postscript: postscript start..: syncfiles
    Did not sync any files.
    Sat Apr 23 09:44:50 UTC 2022 [info]: xcat.deployment.postscript: postscript end...: syncfiles return with 0
    Sourcing file `/etc/default/grub'
    Sourcing file `/etc/default/grub.d/init-select.cfg'
    Generating grub configuration file ...
    Found linux image: /boot/vmlinuz-5.13.0-30-generic
    Found initrd image: /boot/initrd.img-5.13.0-30-generic
    done
    Sourcing file `/etc/default/grub'
    Sourcing file `/etc/default/grub.d/init-select.cfg'
    Generating grub configuration file ...
    Found linux image: /boot/vmlinuz-5.13.0-30-generic
    Found initrd image: /boot/initrd.img-5.13.0-30-generic
    done
    x86_64
    Sat Apr 23 09:44:52 UTC 2022 [info]: xcat.deployment: finished firstboot preparation, sending request to 20.20.20.20:3002 for changing status...
    updateflag.awk: Retrying flag update
    updateflag.awk: Retrying flag update
    updateflag.awk: Retrying flag update
    updateflag.awk: Retrying flag update
    updateflag.awk: Retrying flag update
    updateflag.awk: Retrying flag update
    updateflag.awk: Retrying flag update
    updateflag.awk: Retrying flag update
    updateflag.awk: Retrying flag update
    updateflag.awk: Retrying flag update
    updateflag.awk: Retrying flag update
    updateflag.awk: flag update failed
    Sat Apr 23 17:50:23 CST 2022 [error]: xcat.deployment: the network between the node and 20.20.20.20 is not ready, please check[retry=90]...

Based on xcat.log, this looks like a network issue. At that phase the node's IP is probably not yet configured, which causes these errors. So I added the line "dhclient $INSTALLNIC > /dev/null" to the following code:

    # Read the install MAC from mypostscript.post and strip the single quotes around the value
    MACADDR=$(grep MACADDRESS= /xcatpost/mypostscript.post | awk -F = '{ print $2 }' | sed "s/'//g")
    # Find the interface whose "ip -o link" line contains that MAC (field 2 is "<name>:")
    INSTALLNIC=$(ip -o link | grep -i "$MACADDR" | awk '{ print $2 }' | sed "s/://")
    dhclient $INSTALLNIC > /dev/null

    # the network between the node and MASTER might be not well configured and activated when running the PBS sometimes
    # need to make sure...
    RETRY=0
    while true; do
        #check whether the network access between MN/CN and the node is ready
        ping "$MASTER_IP" -c 1 >/dev/null && break
        dhclient $INSTALLNIC > /dev/null

        RETRY=$((RETRY + 1))
        # ... (the rest of the loop is not part of this excerpt)
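The MAC-to-interface lookup at the top of that snippet can be isolated so it can be tried without real hardware. Below is a sketch of a hypothetical helper, `nic_for_mac`, which is not part of xCAT. It assumes the MAC value from mypostscript.post may be wrapped in single quotes. It reads `ip -o link` output on stdin and prints the first interface whose line contains the MAC, case-insensitively.

```shell
#!/bin/sh
# Hypothetical helper (not part of xCAT): print the first interface name
# whose "ip -o link" line (read from stdin) contains the given MAC.
nic_for_mac() {
    # Remove any single quotes around the value, e.g. '00:50:56:23:9E:88'
    mac=$(printf '%s' "$1" | tr -d "'")
    # Fixed-string, case-insensitive match. Field 2 looks like "ens33:" or
    # "eth0@if5:", so drop the trailing colon and any "@..." suffix.
    grep -iF -- "$mac" |
        awk '{ sub(/:$/, "", $2); sub(/@.*/, "", $2); print $2; exit }'
}
```

A possible use is `ip -o link | nic_for_mac "$MACADDR"`. If nothing matches, the helper prints nothing, so a caller should check for an empty result before passing it to dhclient.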

Now it works fine (ssh xcatn03):

    Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.13.0-30-generic x86_64)

    0 updates can be applied immediately.

    Failed to connect to https://changelogs.ubuntu.com/meta-release-lts. Check your Internet connection or proxy settings

    Your Hardware Enablement Stack (HWE) is supported until April 2025.

    Last login: Sat Apr 23 19:54:58 2022 from 20.20.20.20

So my question is: which component generates the xcatn03.post script when I run nodeset for xcatn03? Could you point me to it?

Thanks Qi Li

ibmxianliqi commented 2 years ago

@gurevichmark I have another issue with rebooting xcatn03: after the reboot, its IP address is again not set up. Do you know which part of the code handles the node's IP setup during the reboot phase?

Thanks Qi Li

ibmxianliqi commented 2 years ago

@gurevichmark I have found the answer to my first question ("which component generates the xcatn03.post script when I run nodeset for xcatn03"). It comes from the file /install/postscripts/xcatinstallpost, so I added the dhclient line to xcatinstallpost as the final fix. Could you please help with my second issue, the xcatn03 reboot case? I'm eager to get your help with it.

Thanks Qi Li

gurevichmark commented 2 years ago

@ibmxianliqi Do you mean that after the initial install you can ping and ssh to xcatn03 using its assigned IP 20.20.20.3, but after rebooting xcatn03 you can no longer ping 20.20.20.3? Do you know what IP xcatn03 gets after the reboot?

ibmxianliqi commented 2 years ago

@gurevichmark Yes. After rebooting xcatn03, its IP is not automatically set to 20.20.20.3. For now, I can set it manually by running the dhclient command, which gives the node 20.20.20.3.

How can I get the IP assigned automatically after a reboot? Please help!

Thanks Qi Li

gurevichmark commented 2 years ago

@ibmxianliqi

ibmxianliqi commented 2 years ago

@gurevichmark

I found that makedhcp -q almost never returns the node's IP and MAC address, and today I ran into this DHCP issue frequently. Below is the output of "makedhcp -q xcatn03" before the initial install. Because of this DHCP issue, I currently can't provision xcatn03.

    root@xcat-server:~# makedhcp -q xcatn01
    root@xcat-server:~# makedhcp -q xcatn02
    xcatn02: ip-address = 20.20.20.2, hardware-address = 00:50:56:24:23:bd
    root@xcat-server:~# makedhcp -q xcatn03
    root@xcat-server:~# makedhcp -q xcatn02
    root@xcat-server:~# makedhcp -q xcatn01
    root@xcat-server:~#

    root@xcat-server:/opt/xcat/share/xcat/install/ubuntu# /etc/init.d/isc-dhcp-server status
    ● isc-dhcp-server.service - ISC DHCP IPv4 server
         Loaded: loaded (/lib/systemd/system/isc-dhcp-server.service; enabled; vendor preset: enabled)
         Active: active (running) since Wed 2022-04-27 09:17:02 CST; 6min ago
           Docs: man:dhcpd(8)
       Main PID: 9738 (dhcpd)
          Tasks: 4 (limit: 4345)
         Memory: 2.0G
         CGroup: /system.slice/isc-dhcp-server.service
                 └─9738 dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf ens33 …

    Apr 27 09:17:02 xcat-server dhcpd[9738]: Sending on Socket/fallback/fallback-net
    Apr 27 09:17:02 xcat-server dhcpd[9738]: Server starting service.
    Apr 27 09:21:37 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
    Apr 27 09:21:39 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
    Apr 27 09:21:43 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
    Apr 27 09:21:51 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
    Apr 27 09:22:57 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
    Apr 27 09:22:59 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
    Apr 27 09:23:05 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
    Apr 27 09:23:12 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
    root@xcat-server:/opt/xcat/share/xcat/install/ubuntu#
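The repeated "no free leases" replies suggest that dhcpd had no host entry for 00:50:56:23:9e:88 and no dynamic range on that network to fall back on. The sketch below lists every MAC refused this way. It assumes log lines in the format shown above and not ellipsized, for example piped from `journalctl -u isc-dhcp-server --no-pager` rather than copied from `systemctl status`. `refused_macs` is a hypothetical name, not an xCAT command.

```shell
#!/bin/sh
# Hypothetical filter (not part of xCAT): read dhcpd log lines on stdin and
# print each MAC whose DHCPDISCOVER was answered with "no free leases".
refused_macs() {
    grep 'no free leases' |
        sed -n 's/.*DHCPDISCOVER from \([0-9A-Fa-f:]*\).*/\1/p' |
        sort -u
}
```

Any MAC this prints is worth checking against `makedhcp -q` for the corresponding node.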

Any suggestions for this issue?

ibmxianliqi commented 2 years ago

@gurevichmark

After removing three attributes from xcatn03 (tftpserver, nfsserver and xcatmaster), I ran the nodeset command for xcatn03 again, and the DHCP issue was gone. Do you know what difference these three attributes make to the node?

lsdef xcatn03:

    Object name: xcatn03
        arch=x86_64
        currchain=boot
        currstate=install ubuntu20.04.4-x86_64-compute
        groups=ubuntu2004-diskful
        installnic=mac
        ip=20.20.20.3
        mac=00:50:56:23:9E:88
        netboot=xnba
        os=ubuntu20.04.4
        postbootscripts=otherpkgs
        postscripts=syslog,remoteshell,syncfiles
        primarynic=mac
        profile=compute
        provmethod=ubuntu20.04.4-x86_64-install-compute
        status=powering-off
        statustime=04-27-2022 09:51:39
    root@xcat-server:~#

/etc/init.d/isc-dhcp-server status:

    ● isc-dhcp-server.service - ISC DHCP IPv4 server
         Loaded: loaded (/lib/systemd/system/isc-dhcp-server.service; enabled; vendor preset: enabled)
         Active: active (running) since Wed 2022-04-27 09:30:44 CST; 20min ago
           Docs: man:dhcpd(8)
       Main PID: 10835 (dhcpd)
          Tasks: 4 (limit: 4345)
         Memory: 130.0M
         CGroup: /system.slice/isc-dhcp-server.service
                 └─10835 dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf ens33…

    Apr 27 09:48:35 xcat-server dhcpd[10835]: DHCPREQUEST for 20.20.20.3 (20.20.20.20) from 00:50:56:23:9e:88 via ens34
    Apr 27 09:48:35 xcat-server dhcpd[10835]: DHCPACK on 20.20.20.3 to 00:50:56:23:9e:88 via ens34
    Apr 27 09:48:42 xcat-server dhcpd[10835]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34
    Apr 27 09:48:42 xcat-server dhcpd[10835]: DHCPOFFER on 20.20.20.3 to 00:50:56:23:9e:88 via ens34
    Apr 27 09:48:42 xcat-server dhcpd[10835]: DHCPREQUEST for 20.20.20.3 (20.20.20.20) from 00:50:56:23:9e:88 via ens34
    Apr 27 09:48:42 xcat-server dhcpd[10835]: DHCPACK on 20.20.20.3 to 00:50:56:23:9e:88 via ens34
    Apr 27 09:48:48 xcat-server dhcpd[10835]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34
    Apr 27 09:48:48 xcat-server dhcpd[10835]: DHCPOFFER on 20.20.20.3 to 00:50:56:23:9e:88 via ens34
    Apr 27 09:48:48 xcat-server dhcpd[10835]: DHCPREQUEST for 20.20.20.3 (20.20.20.20) from 00:50:56:23:9e:88 via ens34
    Apr 27 09:48:48 xcat-server dhcpd[10835]: DHCPACK on 20.20.20.3 to 00:50:56:23:9e:88 via ens34
    root@xcat-server:~#

gurevichmark commented 2 years ago

@ibmxianliqi

ibmxianliqi commented 2 years ago

@gurevichmark

After you removed the 3 attributes from the node definition, did you run the makedhcp -n or makedhcp -a commands?

[Qi Li] Yes, I ran the makedhcp -n command. But sometimes the node still cannot get a DHCP-assigned IP.

Does the command makedhcp -q xcatn03 work after removing the 3 attributes from the node definition?

[Qi Li] It never works for me; makedhcp -q xcatn03 produces no output at all.

Since the node xcatn03 is assigned to the ubuntu2004-diskful group, can you show the definition of that group (lsdef -t group ubuntu2004-diskful)?

[Qi Li] lsdef -t group ubuntu2004-diskful:

    Object name: ubuntu2004-diskful
        members=xcatn02,xcatn03
        provmethod=ubuntu20.04.4-x86_64-install-compute
    root@ubuntu-xcat-server:~#

Can you check which DHCP servers are running on your network (xcatprobe detect_dhcpd -i -m 00:50:56:23:9E:88)?

[Qi Li] Running xcatprobe detect_dhcpd -i ens34 -m 00:50:56:23:9E:88 gives:

    Start to detect DHCP, please wait 10 seconds [INFO]
    ++++++++++++++++++++++++++++++++++ [INFO]
    There are 0 servers replied to dhcp discover. [INFO]
    ++++++++++++++++++++++++++++++++++ [INFO]
    root@ubuntu-xcat-server:~#

/etc/init.d/isc-dhcp-server status:

    ● isc-dhcp-server.service - ISC DHCP IPv4 server
         Loaded: loaded (/lib/systemd/system/isc-dhcp-server.service; enabled; vendor preset: enabled)
         Active: active (running) since Thu 2022-04-28 09:23:58 CST; 16min ago
           Docs: man:dhcpd(8)
       Main PID: 4407 (dhcpd)
          Tasks: 4 (limit: 4575)
         Memory: 13.1M
         CGroup: /system.slice/isc-dhcp-server.service
                 └─4407 dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf ens33 …

    Apr 28 09:38:52 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
    Apr 28 09:38:54 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
    Apr 28 09:38:58 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
    Apr 28 09:39:06 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
    Apr 28 09:39:09 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
    Apr 28 09:39:11 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
    Apr 28 09:39:13 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
    Apr 28 09:39:15 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
    Apr 28 09:39:17 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
    Apr 28 09:39:19 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
    Hint: Some lines were ellipsized, use -l to show in full.
    root@ubuntu-xcat-server:~#

Do you know what IP xcatn03 gets assigned after the reboot?

[Qi Li] My dhcpd runs on the 20.20.20.0/24 network, so xcatn03 should be assigned 20.20.20.3, as defined in its node definition (lsdef xcatn03):

    Object name: xcatn03
        arch=x86_64
        currchain=boot
        currstate=install ubuntu20.04.4-x86_64-compute
        groups=ubuntu2004-diskful
        installnic=mac
        ip=20.20.20.3
        mac=00:50:56:23:9E:88
        netboot=xnba
        nfsserver=20.20.20.20
        os=ubuntu20.04.4
        postbootscripts=otherpkgs
        postscripts=syslog,remoteshell,syncfiles
        primarynic=mac
        profile=compute
        provmethod=ubuntu20.04.4-x86_64-install-compute
        tftpserver=20.20.20.20
        xcatmaster=20.20.20.20
    root@ubuntu-xcat-server:~#

But yesterday the node consistently failed to get a DHCP-assigned IP during both first boot and reboot, and even during the PXE phase.

Can you explain the whole mechanism at the code level, from DHCP discovery to the node receiving its IP from DHCP? I want to dig into the DHCP issue and find the real root cause, which would be very helpful for everyone in the xCAT community.

Thanks Qi Li

ibmxianliqi commented 2 years ago

@gurevichmark

In my xCAT MN environment, the same DHCP server runs on the ens34 NIC. However, two diskful nodes of the same type, xcatn02 and xcatn03, give different results from xcatprobe detect_dhcpd -i ens34.

For xcatn03 (20.20.20.3, MAC 00:50:56:23:9E:88), running xcatprobe detect_dhcpd -i ens34 -m 00:50:56:23:9E:88 gives:

    Start to detect DHCP, please wait 10 seconds [INFO]
    ++++++++++++++++++++++++++++++++++ [INFO]
    There are 0 servers replied to dhcp discover. [INFO]
    ++++++++++++++++++++++++++++++++++ [INFO]
    root@ubuntu-xcat-server:~#

For xcatn02 (20.20.20.2, MAC 00:50:56:24:23:BD), running xcatprobe detect_dhcpd -i ens34 -m 00:50:56:24:23:BD gives:

    Start to detect DHCP, please wait 10 seconds [INFO]
    ++++++++++++++++++++++++++++++++++ [INFO]
    There are 1 servers replied to dhcp discover. [INFO]
    Server:20.20.20.20 assign IP [20.20.20.2]. The next server is [20.20.20.20]! [INFO]
    ++++++++++++++++++++++++++++++++++ [INFO]
    root@ubuntu-xcat-server:~#

Below are the xcatn03 and xcatn02 node definitions:

lsdef xcatn03:

    Object name: xcatn03
        arch=x86_64
        currchain=boot
        currstate=install ubuntu20.04.4-x86_64-compute
        groups=ubuntu2004-diskful
        installnic=mac
        ip=20.20.20.3
        mac=00:50:56:23:9E:88
        netboot=xnba
        nfsserver=20.20.20.20
        os=ubuntu20.04.4
        postbootscripts=otherpkgs
        postscripts=syslog,remoteshell,syncfiles
        primarynic=mac
        profile=compute
        provmethod=ubuntu20.04.4-x86_64-install-compute
        tftpserver=20.20.20.20
        xcatmaster=20.20.20.20
    root@ubuntu-xcat-server:~#

lsdef xcatn02:

    Object name: xcatn02
        arch=x86_64
        currchain=boot
        currstate=boot
        groups=ubuntu2004-diskful
        installnic=mac
        ip=20.20.20.2
        mac=00:50:56:24:23:BD
        netboot=xnba
        nfsserver=20.20.20.20
        os=ubuntu20.04.4
        postbootscripts=otherpkgs
        postscripts=syslog,remoteshell,syncfiles
        primarynic=mac
        profile=compute
        provmethod=ubuntu20.04.4-x86_64-install-compute
        status=booted
        statustime=04-28-2022 00:49:56
        tftpserver=20.20.20.20
        xcatmaster=20.20.20.20
    root@ubuntu-xcat-server:~#

Any suggestions for this difference?

Thanks Qi Li

gurevichmark commented 2 years ago

Once the node definition contains the ip and mac attributes, you should run:

    makedhcp -n
    makedhcp <nodename>

That should add an entry for "nodename" to the /var/lib/dhcp/dhcpd.leases file. The entry should look similar to this:

host f6u13k11 {
  dynamic;
  hardware ethernet 42:8a:0a:06:0d:0b;
  uid 42:8a:0a:06:0d:0b;
  fixed-address 10.6.13.11;
        supersede server.ddns-hostname = "f6u13k11";
        supersede host-name = "f6u13k11";
        supersede server.filename = "/boot/grub2/grub2-f6u13k11";
}

Then, when the compute node boots, the DHCP server on the management node receives a request for an IP address for that MAC. If /var/lib/dhcp/dhcpd.leases contains an entry matching that MAC, the DHCP server should reply with the IP address from that entry. makedhcp -q <node> should return similar information.
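To check the `makedhcp -q <node>` step from a script, one option is to compare its output against the node's ip and mac attributes. The sketch below is minimal. It assumes the output format quoted earlier in this thread (`xcatn03: ip-address = 20.20.20.3, hardware-address = 00:50:56:23:9e:88`). `dhcp_entry_ok` is a hypothetical helper name, not an xCAT command.

```shell
#!/bin/sh
# Hypothetical helper (not part of xCAT): succeed only if the text in $1
# (captured "makedhcp -q <node>" output) reports IP $2 with MAC $3.
# Matching is case-insensitive because lsdef shows the MAC in upper case
# while makedhcp -q shows it in lower case.
dhcp_entry_ok() {
    printf '%s\n' "$1" | grep -iqF "ip-address = $2, hardware-address = $3"
}
```

For example: `dhcp_entry_ok "$(makedhcp -q xcatn03)" 20.20.20.3 00:50:56:23:9E:88 || echo "no DHCP entry for xcatn03"`. Empty output, as seen for xcatn03 in this thread, makes the check fail.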

I suspect that if you look into the /var/lib/dhcp/dhcpd.leases file, there will be an entry for xcatn02 but maybe not for xcatn03?
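That leases-file check can also be scripted. The sketch assumes host blocks laid out like the example entry above: one statement per line, with the closing brace on its own line. `lease_ip_for_mac` is a hypothetical name, and its default path is the /var/lib/dhcp/dhcpd.leases file mentioned above. It prints the fixed-address of the host block whose hardware ethernet matches, or nothing when no such entry exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of xCAT): print the fixed-address of the
# "host" block whose "hardware ethernet" matches MAC $1 (case-insensitive),
# reading a dhcpd.leases-style file ($2, default /var/lib/dhcp/dhcpd.leases).
lease_ip_for_mac() {
    mac=$1
    file=${2:-/var/lib/dhcp/dhcpd.leases}
    awk -v mac="$mac" '
        BEGIN { mac = tolower(mac) }
        /^[ \t]*host[ \t]/  { hw = ""; ip = "" }       # start of a host block
        /hardware ethernet/ { v = $3; sub(/;$/, "", v); hw = tolower(v) }
        /fixed-address/     { v = $2; sub(/;$/, "", v); ip = v }
        /^[ \t]*[}]/ {                                 # end of any block
            if (hw == mac && ip != "") { print ip; exit }
            hw = ""; ip = ""
        }
    ' "$file"
}
```

Lease blocks without a fixed-address are skipped, so `lease_ip_for_mac 00:50:56:23:9E:88` printing nothing would be consistent with xcatn03 having no host entry.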

ibmxianliqi commented 2 years ago

@gurevichmark

Thanks for the very helpful explanation of the DHCP mechanism. However, I found that "makedhcp -n; makedhcp nodename" is not reliable on my Ubuntu 20.04 management node. It can leave no DHCP entry for the node in the /var/lib/dhcp/dhcpd.leases file, and then the node cannot get its assigned IP during PXE boot. I also found that running makedhcp may hang, as shown below:

/etc/init.d/xcatd status:

    Apr 29 09:21:46 ubuntu-xcat-server in.tftpd[4916]: tftp: client does not accept options
    Apr 29 09:21:46 ubuntu-xcat-server in.tftpd[4917]: RRQ from 20.20.20.3 filename xcat/xnba.kpxe
    Apr 29 09:24:37 ubuntu-xcat-server in.tftpd[5792]: RRQ from 20.20.20.3 filename xcat/xnba.kpxe
    Apr 29 09:24:37 ubuntu-xcat-server in.tftpd[5792]: tftp: client does not accept options
    Apr 29 09:24:37 ubuntu-xcat-server in.tftpd[5793]: RRQ from 20.20.20.3 filename xcat/xnba.kpxe
    Apr 29 09:25:58 ubuntu-xcat-server in.tftpd[6160]: RRQ from 20.20.20.3 filename xcat/xnba.kpxe
    Apr 29 09:25:58 ubuntu-xcat-server in.tftpd[6160]: tftp: client does not accept options
    Apr 29 09:25:58 ubuntu-xcat-server in.tftpd[6161]: RRQ from 20.20.20.3 filename xcat/xnba.kpxe
    Apr 29 09:35:33 ubuntu-xcat-server omshell[6998]: Bad descriptor 6.
    Apr 29 09:35:33 ubuntu-xcat-server omshell[6998]: Object 557553551b40 io
    root@ubuntu-xcat-server:~#

If you have encountered this issue, please let me know how to fix it or whether there is a workaround.

Thanks Qi Li

gurevichmark commented 2 years ago

Maybe this problem is related to Ubuntu 20? Since xCAT is not supported on Ubuntu 20, I have not encountered this issue.

You can try turning debug on and running xCAT in the foreground to see whether any additional useful information is displayed:

You can later turn off debug with chdef -t site clustersite xcatdebugmode=0

ibmxianliqi commented 2 years ago

@gurevichmark Thanks for the detailed debugging instructions. I will try them later; I'm currently busy with other, higher-priority work.

Thanks Qi Li

zhhmzz commented 2 months ago

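Hello, I am trying to deploy Ubuntu 20.04.6 LTS with xCAT. The system I am using is RHEL 7.9. Every time, the installation gets stuck at the language selection screen, and I don't know why. I also don't know how to configure the image's configuration file (/opt/xcat/share/xcat/install/ubuntu/compute.subiquity.tmpl). Please help me, thank you.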

zhhmzz commented 2 months ago

One more question: my other xCAT server (RHEL 7.9) reports an error when deploying Ubuntu 20.04.6, and I don't know how to solve it. Here is the error:

    [root@node21 ~]# nodeset node63 osimage=ubuntu20.04.6-x86_64-install-compute
    node63: [node21]: Error: Unable to find requested field from table , with key
    Error: [node21]: Failed to generate xnba configurations for some node(s) on node21. Check xCAT log file for more details.