Open ibmxianliqi opened 2 years ago
@ibmxianliqi Currently there are no plans to officially support Ubuntu20; we are looking for community help and contributions in this area. I was able to provision a diskful Ubuntu20.04 compute node, however, with some workarounds described here: https://github.com/xcat2/xcat-core/pull/6975#issuecomment-948732489
@gurevichmark Got it, thanks for your quick response. I will try the workarounds from #6975.
@gurevichmark I have tried the workarounds from #6975. The nodeset issue is gone, but now the node cannot get a DHCP address during PXE boot.
Can you give me any suggestions on how to fix this? Many thanks!
@gurevichmark After I ran nodeset for the node again, PXE boot works, but the installation now hangs as shown below:
Any ideas or suggestions? Many thanks.
@ibmxianliqi Have you tried removing the `nfs-common` and `chrony` package entries from `xCAT-server/share/xcat/install/ubuntu/compute.subiquity.tmpl`?
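A minimal sketch of that edit, for anyone who prefers to script it. It assumes the template lists each package as its own YAML list item (for example `- nfs-common` on a line by itself); that layout is an assumption, not something confirmed in this thread, so check the file first and edit by hand if it differs. The function name `remove_pkg_entries` is hypothetical, and `sed -i` here is the GNU form.

```bash
#!/usr/bin/env bash
# remove_pkg_entries TEMPLATE PKG...
# Deletes lines that consist only of a YAML list item naming PKG.
# ASSUMPTION: packages appear one per line as "- <name>" in the template.
remove_pkg_entries() {
    local tmpl=$1 pkg
    shift
    cp -p "$tmpl" "$tmpl.orig"    # keep a backup of the unmodified template
    for pkg in "$@"; do
        sed -i "/^[[:space:]]*-[[:space:]]*${pkg}[[:space:]]*\$/d" "$tmpl"
    done
}
```

On a management node this would be called as `remove_pkg_entries /opt/xcat/share/xcat/install/ubuntu/compute.subiquity.tmpl nfs-common chrony`, using the installed template path shown in the osimage definition later in this thread.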
@gurevichmark Good to know. After removing these two packages from the template, the diskful node can be provisioned, but its IP address is not set correctly. See the following picture: can you help me? Many thanks.
@ibmxianliqi Can you show the output of:
- `lsdef xcatn03`
- `lsdef -t osimage ubuntu20.04.4-x86_64-install-compute`
- `makedhcp -q xcatn03`
- On the `xcatn03` node, what is in `/var/log/xcat/xcat.log`?

@gurevichmark Here is the output you asked for:
lsdef xcatn03:
Object name: xcatn03
    arch=x86_64
    currchain=boot
    currstate=boot
    groups=ubuntu2004-diskful
    installnic=mac
    ip=20.20.20.3
    mac=00:50:56:23:9E:88
    netboot=xnba
    nfsserver=20.20.20.20
    os=ubuntu20.04.4
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    primarynic=mac
    profile=compute
    provmethod=ubuntu20.04.4-x86_64-install-compute
    status=booting
    statustime=04-23-2022 09:24:06
    tftpserver=20.20.20.20
    xcatmaster=20.20.20.20

lsdef -t osimage ubuntu20.04.4-x86_64-install-compute:
Object name: ubuntu20.04.4-x86_64-install-compute
    imagetype=linux
    osarch=x86_64
    osname=Linux
    osvers=ubuntu20.04.4
    otherpkgdir=/install/post/otherpkgs/ubuntu20.04.4/x86_64
    pkgdir=/install/ubuntu20.04.4/x86_64
    pkglist=/opt/xcat/share/xcat/install/ubuntu/compute.ubuntu20.04.x86_64.pkglist
    profile=compute
    provmethod=install
    template=/opt/xcat/share/xcat/install/ubuntu/compute.subiquity.tmpl

makedhcp -q xcatn03:
xcatn03: ip-address = 20.20.20.3, hardware-address = 00:50:56:23:9e:88

/var/log/xcat/xcat.log:
[get_install_disk]Check the partition sda1.
[get_install_disk] Partition sda1 mount success.
[get_install_disk] The partition sda1 has kernel file.
[get_install_disk]Check the partition sda2.
[get_install_disk] The disk sda had OS installed, check next partition.
[get_install_disk]Check the partition sda3.
[get_install_disk] The disk sda had OS installed, check next partition.
[get_install_disk]The disks which have kernel:
[get_install_disk] sda
[get_install_disk]The disk sda information:
[get_install_disk] disk_wwn=
[get_install_disk] disk_path=/devices/pci0000:00/0000:00:10.0/host32/target32:0:0/32:0:0:0/block/sda
[get_install_disk] disk_driver=mptspi
[get_install_disk] Add disk: sda /devices/pci0000:00/0000:00:10.0/host32/target32:0:0/32:0:0:0/block/sda into path thirdchoicedisks
[get_install_disk]The install_disk is /dev/sda by sorting path and DRIVER.
Running early_command Installation script...
Updating storage config...
Disabling systemd-resolved...
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.13.0-30-generic
Found initrd image: /boot/initrd.img-5.13.0-30-generic
done
Running late_command Installation script...
--2022-04-23 09:44:23-- http://20.20.20.20/install/autoinst/xcatn03.post
Connecting to 20.20.20.20:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75939 (74K)
Saving to: 'xcatn03.post'
0K .......... .......... .......... .......... .......... 67% 340M 0s
50K .......... .......... .... 100% 296M=0s
2022-04-23 09:44:23 (325 MB/s) - 'xcatn03.post' saved [75939/75939]
Sat Apr 23 09:44:24 UTC 2022 [info]: xcat.deployment: Executing post.xcat to prepare for firstbooting ...
Sat Apr 23 09:44:48 UTC 2022 [info]: xcat.deployment: trying to download postscripts from 20.20.20.20...
Sat Apr 23 09:44:48 UTC 2022 [info]: xcat.deployment: postscripts downloaded successfully
Sat Apr 23 09:44:48 UTC 2022 [info]: xcat.deployment: trying to get mypostscript from 20.20.20.20...
systemd 245 (245.4-4ubuntu3.15)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid
Sat Apr 23 09:44:48 UTC 2022 [info]: xcat.deployment.postscript: postscript start..: syslog
grep: /etc/rsyslog.d/remote.conf: No such file or directory
grep: /etc/rsyslog.d/remote.conf: No such file or directory
Sat Apr 23 09:44:48 UTC 2022 [info]: xcat.deployment.postscript: postscript end...: syslog return with 0
Sat Apr 23 09:44:48 UTC 2022 [info]: xcat.deployment.postscript: postscript start..: remoteshell
./remoteshell: line 567: kill: (9) - No such process
./remoteshell: line 567: kill: (2078) - No such process
Sat Apr 23 09:44:50 UTC 2022 [info]: xcat.deployment.postscript: postscript end...: remoteshell return with 0
Sat Apr 23 09:44:50 UTC 2022 [info]: xcat.deployment.postscript: postscript start..: syncfiles
Did not sync any files.
Sat Apr 23 09:44:50 UTC 2022 [info]: xcat.deployment.postscript: postscript end...: syncfiles return with 0
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.13.0-30-generic
Found initrd image: /boot/initrd.img-5.13.0-30-generic
done
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.13.0-30-generic
Found initrd image: /boot/initrd.img-5.13.0-30-generic
done
x86_64
Sat Apr 23 09:44:52 UTC 2022 [info]: xcat.deployment: finished firstboot preparation, sending request to 20.20.20.20:3002 for changing status...
updateflag.awk: Retrying flag update
updateflag.awk: Retrying flag update
updateflag.awk: Retrying flag update
updateflag.awk: Retrying flag update
updateflag.awk: Retrying flag update
updateflag.awk: Retrying flag update
updateflag.awk: Retrying flag update
updateflag.awk: Retrying flag update
updateflag.awk: Retrying flag update
updateflag.awk: Retrying flag update
updateflag.awk: Retrying flag update
updateflag.awk: flag update failed
Sat Apr 23 17:50:23 CST 2022 [error]: xcat.deployment: the network between the node and 20.20.20.20 is not ready, please check[retry=90]...
Based on xcat.log, this looks like a network issue: at that phase the node's IP may not be configured yet, which causes these errors. So I added the line "dhclient $INSTALLNIC > /dev/null" to the following code:
MACADDR=$(grep MACADDRESS= /xcatpost/mypostscript.post | awk -F = '{ print $2 }' | sed "s/\'//g")
INSTALLNIC=$(ip -o link | grep -i "$MACADDR" | awk '{ print $2 }' | sed "s/://")
dhclient $INSTALLNIC > /dev/null

# the network between the node and MASTER might be not well configured and activated when running the PBS sometimes
# need to make sure...
RETRY=0
while true; do
    #check whether the network access between MN/CN and the node is ready
    ping $MASTER_IP -c 1 >/dev/null && break
    dhclient $INSTALLNIC > /dev/null

    RETRY=$[ $RETRY + 1 ]

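The excerpt above stops partway through the loop. As a rough standalone illustration of the same idea (ping the master, and re-run the DHCP client between attempts until a retry limit is hit), here is a sketch. It is not the code from xcatinstallpost: the function name `wait_for_master` and the `PING_CMD`/`DHCP_CMD` override variables are hypothetical, added only so the logic can be exercised without a real network. The default limit of 90 retries mirrors the `[retry=90]` in the log above.

```bash
#!/usr/bin/env bash
# wait_for_master MASTER_IP NIC [MAX_RETRY] [DELAY_SECONDS]
# Returns 0 once MASTER_IP answers a ping, 1 after MAX_RETRY failed attempts.
# PING_CMD and DHCP_CMD are hypothetical hooks for testing; they default to
# the real ping and dhclient commands.
wait_for_master() {
    local master_ip=$1 nic=$2 max_retry=${3:-90} delay=${4:-2}
    local retry=0
    while ! ${PING_CMD:-ping} -c 1 "$master_ip" >/dev/null 2>&1; do
        retry=$((retry + 1))
        if [ "$retry" -ge "$max_retry" ]; then
            echo "network between the node and $master_ip is not ready [retry=$retry]" >&2
            return 1
        fi
        # Ask for a lease again before the next ping attempt.
        ${DHCP_CMD:-dhclient} "$nic" >/dev/null 2>&1 || true
        sleep "$delay"
    done
    return 0
}
```

For example, `wait_for_master 20.20.20.20 ens33` would retry up to 90 times with a 2 second pause between attempts.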
Now it works fine, as shown below:
ssh xcatn03
Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.13.0-30-generic x86_64)
Support: https://ubuntu.com/advantage
System information as of Sat 23 Apr 2022 08:30:28 PM CST
System load: 0.0                  Processes: 214
Usage of /: 16.9% of 36.66GB      Users logged in: 1
Memory usage: 6%                  IPv4 address for ens33: 20.20.20.3
Swap usage: 0%
0 updates can be applied immediately.
Failed to connect to https://changelogs.ubuntu.com/meta-release-lts. Check your Internet connection or proxy settings
Your Hardware Enablement Stack (HWE) is supported until April 2025.
Last login: Sat Apr 23 19:54:58 2022 from 20.20.20.20
So my question is: do you know which component generates the xcatn03.post script when I run nodeset for xcatn03? Please point me to it.
Thanks Qi Li
@gurevichmark Another issue concerns rebooting xcatn03: after the node reboots, its IP address is again not set. Do you know which part of the code handles the node's IP setup during the reboot phase?
Thanks Qi Li
@gurevichmark I now have the answer to my first question, "which component generates the xcatn03.post script when I run nodeset for xcatn03": it comes from the file /install/postscripts/xcatinstallpost, so I added the dhclient line to xcatinstallpost as the final fix. Please help me with my second issue, the xcatn03 reboot case. I'm eager to get your help on this.
Thanks Qi Li
@ibmxianliqi
Do you mean, after the initial install, you can ping and ssh to the `xcatn03` using its assigned ip `20.20.20.3`. But if you reboot `xcatn03` you can no longer ping `20.20.20.3`? Do you know what IP `xcatn03` gets after the reboot?
@gurevichmark Yes. After rebooting xcatn03, it is not automatically assigned 20.20.20.3. So far, I can only get the IP 20.20.20.3 by running the dhclient command manually, as shown below:
So I need to know how to make the IP get assigned automatically after a reboot. Please help!
Thanks Qi Li
@ibmxianliqi
- Can you show the output of `makedhcp -q xcatn03` before the initial install, after the initial install and then after the reboot?
- Does `xcatn03` get any IP assigned after the reboot? Or are you getting to it through the `rcons`?

@gurevichmark
I found that makedhcp -q almost never returns the node's IP and MAC address, and today I am hitting this DHCP issue frequently! The output of "makedhcp -q xcatn03" before the initial install is below; I can't provision xcatn03 right now because of the DHCP issue.
root@xcat-server:~# makedhcp -q xcatn01
root@xcat-server:~# makedhcp -q xcatn02
xcatn02: ip-address = 20.20.20.2, hardware-address = 00:50:56:24:23:bd
root@xcat-server:~# makedhcp -q xcatn03
root@xcat-server:~# makedhcp -q xcatn02
root@xcat-server:~# makedhcp -q xcatn01
root@xcat-server:~#

root@xcat-server:/opt/xcat/share/xcat/install/ubuntu# /etc/init.d/isc-dhcp-server status
● isc-dhcp-server.service - ISC DHCP IPv4 server
     Loaded: loaded (/lib/systemd/system/isc-dhcp-server.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-04-27 09:17:02 CST; 6min ago
       Docs: man:dhcpd(8)
   Main PID: 9738 (dhcpd)
      Tasks: 4 (limit: 4345)
     Memory: 2.0G
     CGroup: /system.slice/isc-dhcp-server.service
             └─9738 dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf ens33 …

Apr 27 09:17:02 xcat-server dhcpd[9738]: Sending on Socket/fallback/fallback-net
Apr 27 09:17:02 xcat-server dhcpd[9738]: Server starting service.
Apr 27 09:21:37 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
Apr 27 09:21:39 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
Apr 27 09:21:43 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
Apr 27 09:21:51 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
Apr 27 09:22:57 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
Apr 27 09:22:59 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
Apr 27 09:23:05 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
Apr 27 09:23:12 xcat-server dhcpd[9738]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no free leases
root@xcat-server:/opt/xcat/share/xcat/install/ubuntu#
Any suggestions for this issue?
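When the dhcpd log is long, it can help to summarize which MAC addresses are being refused. The sketch below reads dhcpd log lines on standard input and prints each MAC that triggered a "no free leases" DHCPDISCOVER, with a count. The function name `no_free_lease_macs` is hypothetical, and it expects full, non-ellipsized log lines in the format shown above.

```bash
#!/usr/bin/env bash
# no_free_lease_macs: reads dhcpd log lines on stdin and prints
# "<mac> <count>" for every MAC whose DHCPDISCOVER was answered with
# "no free leases". Lines in any other format are ignored.
no_free_lease_macs() {
    awk '/DHCPDISCOVER from/ && /no free leases/ {
             for (i = 1; i < NF; i++) {
                 if ($i == "from") { count[$(i + 1)]++; break }
             }
         }
         END { for (mac in count) print mac, count[mac] }' | sort
}
```

A possible invocation is `journalctl -u isc-dhcp-server --no-pager | no_free_lease_macs`, assuming the service logs to the journal under the unit name shown in the status output above.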
@gurevichmark
After removing three attributes from xcatn03 (tftpserver, nfsserver and xcatmaster), I ran the nodeset command for xcatn03 again, and the DHCP issue went away. Do you know what difference it makes whether the node has these three attributes or not?
lsdef xcatn03:
Object name: xcatn03
    arch=x86_64
    currchain=boot
    currstate=install ubuntu20.04.4-x86_64-compute
    groups=ubuntu2004-diskful
    installnic=mac
    ip=20.20.20.3
    mac=00:50:56:23:9E:88
    netboot=xnba
    os=ubuntu20.04.4
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    primarynic=mac
    profile=compute
    provmethod=ubuntu20.04.4-x86_64-install-compute
    status=powering-off
    statustime=04-27-2022 09:51:39

root@xcat-server:~# /etc/init.d/isc-dhcp-server status
● isc-dhcp-server.service - ISC DHCP IPv4 server
     Loaded: loaded (/lib/systemd/system/isc-dhcp-server.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-04-27 09:30:44 CST; 20min ago
       Docs: man:dhcpd(8)
   Main PID: 10835 (dhcpd)
      Tasks: 4 (limit: 4345)
     Memory: 130.0M
     CGroup: /system.slice/isc-dhcp-server.service
             └─10835 dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf ens33…

Apr 27 09:48:35 xcat-server dhcpd[10835]: DHCPREQUEST for 20.20.20.3 (20.20.20.20) from 00:50:56:23:9e:88 via ens34
Apr 27 09:48:35 xcat-server dhcpd[10835]: DHCPACK on 20.20.20.3 to 00:50:56:23:9e:88 via ens34
Apr 27 09:48:42 xcat-server dhcpd[10835]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34
Apr 27 09:48:42 xcat-server dhcpd[10835]: DHCPOFFER on 20.20.20.3 to 00:50:56:23:9e:88 via ens34
Apr 27 09:48:42 xcat-server dhcpd[10835]: DHCPREQUEST for 20.20.20.3 (20.20.20.20) from 00:50:56:23:9e:88 via ens34
Apr 27 09:48:42 xcat-server dhcpd[10835]: DHCPACK on 20.20.20.3 to 00:50:56:23:9e:88 via ens34
Apr 27 09:48:48 xcat-server dhcpd[10835]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34
Apr 27 09:48:48 xcat-server dhcpd[10835]: DHCPOFFER on 20.20.20.3 to 00:50:56:23:9e:88 via ens34
Apr 27 09:48:48 xcat-server dhcpd[10835]: DHCPREQUEST for 20.20.20.3 (20.20.20.20) from 00:50:56:23:9e:88 via ens34
Apr 27 09:48:48 xcat-server dhcpd[10835]: DHCPACK on 20.20.20.3 to 00:50:56:23:9e:88 via ens34
root@xcat-server:~#
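@ibmxianliqi
- After you removed the 3 attributes from the node definition, did you run `makedhcp -n` or `makedhcp -a` commands?
- Does the command `makedhcp -q xcatn03` work after removing the 3 attributes from the node definition?
- Since the node `xcatn03` is assigned to `ubuntu2004-diskful` group, can you show the definition of that group - `lsdef -t group ubuntu2004-diskful`
- Can you check what DHCP servers are running on your network? - `xcatprobe detect_dhcpd -i <interface facing compute node> -m 00:50:56:23:9E:88`
- Do you know what IP `xcatn03` gets assigned after the reboot?

@gurevichmark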
After you removed the 3 attributes from the node definition, did you run makedhcp -n or makedhcp -a commands? [Qi Li] Yes, I ran the makedhcp -n command. But sometimes the node still CANNOT get a DHCP-assigned IP.
Does the command makedhcp -q xcatn03 work after removing the 3 attributes from the node definition? [Qi Li] It never works for me; makedhcp -q xcatn03 produces no output at all.
Since the node xcatn03 is assigned to ubuntu2004-diskful group, can you show the definition of that group - lsdef -t group ubuntu2004-diskful [Qi Li]
lsdef -t group ubuntu2004-diskful
Object name: ubuntu2004-diskful
    members=xcatn02,xcatn03
    provmethod=ubuntu20.04.4-x86_64-install-compute
root@ubuntu-xcat-server:~#
Can you check what DHCP servers are running on your network? - `xcatprobe detect_dhcpd -i <interface facing compute node> -m 00:50:56:23:9E:88` [Qi Li]
/etc/init.d/isc-dhcp-server status
● isc-dhcp-server.service - ISC DHCP IPv4 server
     Loaded: loaded (/lib/systemd/system/isc-dhcp-server.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2022-04-28 09:23:58 CST; 16min ago
       Docs: man:dhcpd(8)
   Main PID: 4407 (dhcpd)
      Tasks: 4 (limit: 4575)
     Memory: 13.1M
     CGroup: /system.slice/isc-dhcp-server.service
             └─4407 dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf ens33 …

Apr 28 09:38:52 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
Apr 28 09:38:54 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
Apr 28 09:38:58 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
Apr 28 09:39:06 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
Apr 28 09:39:09 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
Apr 28 09:39:11 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
Apr 28 09:39:13 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
Apr 28 09:39:15 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
Apr 28 09:39:17 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
Apr 28 09:39:19 ubuntu-xcat-server dhcpd[4407]: DHCPDISCOVER from 00:50:56:23:9e:88 via ens34: network ens34: no… leases
Hint: Some lines were ellipsized, use -l to show in full.
root@ubuntu-xcat-server:~#
Do you know what IP xcatn03 gets assigned after the reboot? [Qi Li] My dhcpd runs on the 20.20.20.0/24 network, so xcatn03 should be assigned 20.20.20.3, as set in its node definition below:
lsdef xcatn03
Object name: xcatn03
    arch=x86_64
    currchain=boot
    currstate=install ubuntu20.04.4-x86_64-compute
    groups=ubuntu2004-diskful
    installnic=mac
    ip=20.20.20.3
    mac=00:50:56:23:9E:88
    netboot=xnba
    nfsserver=20.20.20.20
    os=ubuntu20.04.4
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    primarynic=mac
    profile=compute
    provmethod=ubuntu20.04.4-x86_64-install-compute
    tftpserver=20.20.20.20
    xcatmaster=20.20.20.20
root@ubuntu-xcat-server:~#
But yesterday the node could NOT get a DHCP-assigned IP on either first boot or reboot, and not even during the PXE phase:
So can you explain the whole mechanism, from DHCP discovery to the node obtaining its IP, at the code level? I want to dig into the DHCP issue and find the real root cause, which would be helpful for everyone in the xCAT community.
Thanks Qi Li
@gurevichmark
In my xCAT MN environment, the same DHCP server runs on the ens34 NIC, but two diskful nodes of the same type, xcatn02 and xcatn03, get different output from xcatprobe detect_dhcpd -i ens34.

xcatn03(20.20.20.3) -> 00:50:56:23:9E:88
xcatprobe detect_dhcpd -i ens34 -m 00:50:56:23:9E:88
Start to detect DHCP, please wait 10 seconds [INFO]
++++++++++++++++++++++++++++++++++ [INFO]
There are 0 servers replied to dhcp discover. [INFO]
++++++++++++++++++++++++++++++++++ [INFO]
root@ubuntu-xcat-server:~#

xcatn02(20.20.20.2) -> 00:50:56:24:23:BD
xcatprobe detect_dhcpd -i ens34 -m 00:50:56:24:23:BD
Start to detect DHCP, please wait 10 seconds [INFO]
++++++++++++++++++++++++++++++++++ [INFO]
There are 1 servers replied to dhcp discover. [INFO]
Server:20.20.20.20 assign IP [20.20.20.2]. The next server is [20.20.20.20]! [INFO]
++++++++++++++++++++++++++++++++++ [INFO]
root@ubuntu-xcat-server:~#
Below are the xcatn03 and xcatn02 node definitions:

lsdef xcatn03
Object name: xcatn03
    arch=x86_64
    currchain=boot
    currstate=install ubuntu20.04.4-x86_64-compute
    groups=ubuntu2004-diskful
    installnic=mac
    ip=20.20.20.3
    mac=00:50:56:23:9E:88
    netboot=xnba
    nfsserver=20.20.20.20
    os=ubuntu20.04.4
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    primarynic=mac
    profile=compute
    provmethod=ubuntu20.04.4-x86_64-install-compute
    tftpserver=20.20.20.20
    xcatmaster=20.20.20.20
root@ubuntu-xcat-server:~#

lsdef xcatn02
Object name: xcatn02
    arch=x86_64
    currchain=boot
    currstate=boot
    groups=ubuntu2004-diskful
    installnic=mac
    ip=20.20.20.2
    mac=00:50:56:24:23:BD
    netboot=xnba
    nfsserver=20.20.20.20
    os=ubuntu20.04.4
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    primarynic=mac
    profile=compute
    provmethod=ubuntu20.04.4-x86_64-install-compute
    status=booted
    statustime=04-28-2022 00:49:56
    tftpserver=20.20.20.20
    xcatmaster=20.20.20.20
root@ubuntu-xcat-server:~#
Any suggestions for this difference?
Thanks Qi Li
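Comparing two node definitions by eye is error-prone, so one option is to save each `lsdef` output to a file and diff the attributes. The sketch below is only an illustration: `normalize_def` and `diff_defs` are hypothetical helper names, and the input files are assumed to hold `lsdef <node>` output with one `attr=value` per line.

```bash
#!/usr/bin/env bash
# normalize_def FILE: keep only attr=value lines, strip indentation, sort.
normalize_def() {
    sed -n 's/^[[:space:]]*\([^=[:space:]]*=.*\)$/\1/p' "$1" | sort
}

# diff_defs FILE_A FILE_B: show attributes that differ between two saved
# lsdef outputs. Exit status follows diff: 0 if identical, 1 if different.
diff_defs() {
    local a b rc
    a=$(mktemp)
    b=$(mktemp)
    normalize_def "$1" > "$a"
    normalize_def "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

For example, after `lsdef xcatn02 > n02.def` and `lsdef xcatn03 > n03.def`, running `diff_defs n02.def n03.def` lists only the attributes that differ. Node-specific values such as `ip` and `mac` will always show up and can be ignored.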
Once the node definition contains `ip` and `mac` attributes, you should run
makedhcp -n
makedhcp <nodename>
That should add an entry to the `/var/lib/dhcp/dhcpd.leases` file for the "nodename".
The entry should look similar to this:
host f6u13k11 {
    dynamic;
    hardware ethernet 42:8a:0a:06:0d:0b;
    uid 42:8a:0a:06:0d:0b;
    fixed-address 10.6.13.11;
    supersede server.ddns-hostname = "f6u13k11";
    supersede host-name = "f6u13k11";
    supersede server.filename = "/boot/grub2/grub2-f6u13k11";
}
Then, when the compute node boots, the DHCP server on the management node will get a request for an IP address for the node's MAC. If your `/var/lib/dhcp/dhcpd.leases` file contains an entry matching that MAC, the DHCP server should reply with the IP address for that entry. `makedhcp -q <node>` should return similar information.
I suspect if you look into the `/var/lib/dhcp/dhcpd.leases` file, there will be an entry for `xcatn02`, but maybe not for `xcatn03`?
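To check this quickly, the host entries in the leases file can be listed with a small awk filter. This is a sketch that assumes entries have the shape shown above (a `host <name> {` line, then `hardware ethernet` and `fixed-address` lines, then a closing `}` on its own line); the function name `list_host_entries` is hypothetical.

```bash
#!/usr/bin/env bash
# list_host_entries LEASES_FILE
# Prints "<name> <mac> <ip>" for each host block in a dhcpd.leases style file.
# Plain lease blocks (which do not start with "host") are skipped.
list_host_entries() {
    awk '
        $1 == "host"                         { name = $2; mac = ""; ip = "" }
        $1 == "hardware" && $2 == "ethernet" { mac = $3; sub(/;$/, "", mac) }
        $1 == "fixed-address"                { ip = $2;  sub(/;$/, "", ip) }
        $1 == "}" && name != ""              { print name, mac, ip; name = "" }
    ' "$1"
}
```

Running `list_host_entries /var/lib/dhcp/dhcpd.leases | grep -i xcatn03` would then show whether an entry for that node exists and which MAC and IP it carries.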
@gurevichmark
Thanks for the very helpful explanation of the DHCP mechanism. However, I found that "makedhcp -n; makedhcp nodename" is not reliable on my Ubuntu 20.04 management node: it often leaves no DHCP entry for the node in the /var/lib/dhcp/dhcpd.leases file, so at PXE boot the node cannot get its assigned IP. I also found the following problem when running makedhcp:
/etc/init.d/xcatd status
Apr 29 09:21:46 ubuntu-xcat-server in.tftpd[4916]: tftp: client does not accept options
Apr 29 09:21:46 ubuntu-xcat-server in.tftpd[4917]: RRQ from 20.20.20.3 filename xcat/xnba.kpxe
Apr 29 09:24:37 ubuntu-xcat-server in.tftpd[5792]: RRQ from 20.20.20.3 filename xcat/xnba.kpxe
Apr 29 09:24:37 ubuntu-xcat-server in.tftpd[5792]: tftp: client does not accept options
Apr 29 09:24:37 ubuntu-xcat-server in.tftpd[5793]: RRQ from 20.20.20.3 filename xcat/xnba.kpxe
Apr 29 09:25:58 ubuntu-xcat-server in.tftpd[6160]: RRQ from 20.20.20.3 filename xcat/xnba.kpxe
Apr 29 09:25:58 ubuntu-xcat-server in.tftpd[6160]: tftp: client does not accept options
Apr 29 09:25:58 ubuntu-xcat-server in.tftpd[6161]: RRQ from 20.20.20.3 filename xcat/xnba.kpxe
Apr 29 09:35:33 ubuntu-xcat-server omshell[6998]: Bad descriptor 6.
Apr 29 09:35:33 ubuntu-xcat-server omshell[6998]: Object 557553551b40 io
root@ubuntu-xcat-server:~#
If you have encountered the issue above, please let me know how to fix it or work around it.
Thanks Qi Li
Maybe this problem is related to Ubuntu20 ? Since xCAT is not supported on Ubuntu20, I have not encountered this issue.
You can try turning debug on and running xCAT on the foreground to see if any additional, useful information is displayed:
- Stop `xcatd` - `systemctl stop xcatd`
- Start `xcatd` on foreground - `xcatd -f`
- Turn on debug - `chdef -t site clustersite xcatdebugmode=2`
- Run `makedhcp xcatn03`

You can later turn off debug with `chdef -t site clustersite xcatdebugmode=0`
@gurevichmark Thanks for the detailed debugging instructions. I will try them later, as I am currently busy with other, more urgent work.
Thanks Qi Li
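Hello, I am trying to deploy Ubuntu 20.04.6 LTS with xCAT. The system I am using is RHEL 7.9. Every time, the installation gets stuck at the language selection screen, and I don't know why. I also don't know how to configure the image's template file (/opt/xcat/share/xcat/install/ubuntu/compute.subiquity.tmpl). Please help me, thank you.
One more question: my other xCAT server (RHEL 7.9) reports an error when deploying Ubuntu 20.04.6, and I don't know how to resolve it. The error output is below.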
[root@node21 ~]# nodeset node63 osimage=ubuntu20.04.6-x86_64-install-compute
node63: [node21]: Error: Unable to find requested field
lsxcatd -a
Version 2.16.4 (git commit 6c00ed5b573bb56a12c15efbeb085b506821496e, built Sat Apr 16 00:16:55 EDT 2022)
This is a Management Node
dbengine=SQLite

lsdef xcatn02
Object name: xcatn02
    arch=x86_64
    groups=ubuntu2004-diskful
    installnic=mac
    ip=30.30.30.2
    mac=00:50:56:24:23:BD
    netboot=xnba
    os=ubuntu20.04.4
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    primarynic=mac
    profile=compute
    provmethod=ubuntu20.04.4-x86_64-install-compute

nodeset xcatn02 osimage=ubuntu20.04.4-x86_64-install-compute
xcatn02: [ubuntu2004-xcat-server]: Error: Unable to find requested field from table , with key
Error: [ubuntu2004-xcat-server]: Failed to generate xnba configurations for some node(s) on ubuntu2004-xcat-server. Check xCAT log file for more details.
tabdump bootparams
node,kernel,initrd,kcmdline,addkcmdline,dhcpstatements,adddhcpstatements,comments,disable
The content above shows the issue's background and context. Although I know that xCAT build 2.16.4 is a development build, I wanted to try whether it can do basic node provisioning. The result is not good, so I would like to know when the official 2.16.4 release will support Ubuntu 20.04, or whether anyone knows how to fix this issue.
Thanks Qi Li