Closed kcgthb closed 4 years ago
For the CN definition, did you set tftpserver to the SN?
chdef -t node -o $$CN servicenode=$$SN monserver=$$SN nfsserver=$$SN tftpserver=$$SN xcatmaster=$$SN
Also, for the service group, what are the setup* attributes set to?
Hi @cxhong,
The service group has all the setup* attributes set:
# lsdef -t group service | grep setup
setupconserver=2
setupdhcp=1
setupipforward=1
setupnameserver=2
setupnfs=1
setupntp=1
setuptftp=1
The CN definition only has servicenode and xcatmaster defined, as the other attributes are supposed to inherit those values if they're not defined:
# lsdef -c sh03-12n18 -i servicenode,monserver,nfsserver,tftpserver,xcatmaster
sh03-12n18: monserver=
sh03-12n18: nfsserver=
sh03-12n18: servicenode=sh03-sn02
sh03-12n18: tftpserver=
sh03-12n18: xcatmaster=sh03-sn02
The noderes(5) manpage says:
xcatmaster
The hostname of the xCAT service node (as known by this node). This acts as the
default value for nfsserver and tftpserver, if they are not set.
So it's my understanding that tftpserver doesn't need to be specified for the CN if xcatmaster is set.
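As an illustration of the default rule quoted from noderes(5), here is a minimal sketch that computes the tftpserver a node would end up with from `lsdef -c` output. The helper name `effective_tftpserver` is hypothetical, and the logic only mirrors the manpage wording; it is not taken from the xCAT source.

```shell
# Sketch only: effective_tftpserver is a hypothetical helper, not an xCAT command.
# It reads "node: attr=value" lines (the `lsdef -c` format shown above) on stdin
# and applies the documented default: an empty tftpserver falls back to xcatmaster.
effective_tftpserver() {
    awk -F': ' '
        { split($2, kv, "="); attr[kv[1]] = kv[2] }
        END {
            if (attr["tftpserver"] != "") print attr["tftpserver"]
            else print attr["xcatmaster"]
        }'
}

# Possible usage on the management node:
#   lsdef -c sh03-12n18 -i tftpserver,xcatmaster | effective_tftpserver
```

The sketch assumes a single node per invocation and values that do not themselves contain an `=` sign.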
I tried to manually set tftpserver, but it doesn't change the behavior:
# chdef sh03-12n18 tftpserver=sh03-sn02
1 object definitions have been created or modified.
# lsdef sh03-12n18 -i tftpserver
Object name: sh03-12n18
tftpserver=sh03-sn02
# nodeset sh03-12n18 shell -V
sh03-12n18: [sh02-hn01]: shell
sh03-12n18: [sh03-sn02]: shell
# xdsh sh02-hn01,sh03-sn02 cat /tftpboot/xcat/xnba/nodes/sh03-12n18 | xdshbak -c
HOSTS -------------------------------------------------------------------------
sh02-hn01
-------------------------------------------------------------------------------
#!gpxe
#shell
imgfetch -n kernel http://${next-server}:80/tftpboot/xcat/genesis.kernel.x86_64
imgload kernel
imgargs kernel quiet console=tty0 console=ttyS0,115200 xcatd=sh03-sn02:3001 destiny=shell BOOTIF=01-${netX/machyp}
imgfetch http://${next-server}:80/tftpboot/xcat/genesis.fs.x86_64.gz
imgexec kernel
HOSTS -------------------------------------------------------------------------
sh03-sn02
-------------------------------------------------------------------------------
#!gpxe
#shell
imgfetch -n kernel http://${next-server}:80/tftpboot/xcat/nbk.x86_64
imgload kernel
imgargs kernel quiet console=tty0 console=ttyS0,115200 xcatd=sh03-sn02:3001 BOOTIF=01-${netX/machyp}
imgfetch http://${next-server}:80/tftpboot/xcat/nkfs.x86_64.gz
imgexec kernel
Thanks!
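The two generated scripts above differ mainly in which kernel and initrd they fetch. A small sketch like the following could flag a node script that points at the legacy nbk kernel. The helper names `boot_kernel_of` and `uses_legacy_kernel` are hypothetical, and the parsing assumes the `imgfetch -n kernel URL` line format seen in the output above.

```shell
# Sketch only: both helpers are hypothetical, not part of xCAT.

# Print the kernel file name requested by an xNBA node script.
# $1: path to a node file, e.g. /tftpboot/xcat/xnba/nodes/<node>
boot_kernel_of() {
    awk '$1 == "imgfetch" && $2 == "-n" && $3 == "kernel" {
             n = split($4, parts, "/"); print parts[n]; exit
         }' "$1"
}

# Return 0 when the script fetches the legacy nbk kernel, 1 otherwise.
uses_legacy_kernel() {
    case "$(boot_kernel_of "$1")" in
        nbk.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Possible usage on a service node:
#   uses_legacy_kernel /tftpboot/xcat/xnba/nodes/sh03-12n18 && echo "legacy kernel selected"
```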
The sharedtftp attribute must be 0 in your case, right? So the files under /tftpboot will not be shared between the MN and the SN. Did you do this:
To make /install and /tftpboot directories local on each Service Node, set site table attributes and “sync” /install and /tftpboot directory contents from Management Node to Service Nodes:
chdef -t site clustersite sharedtftp=0
chdef -t site clustersite installloc=
rsync -auv --exclude 'autoinst' /install r1n01:/
rsync -auv --exclude 'autoinst' /install r2n01:/
rsync -auv --exclude 'autoinst' /tftpboot r1n01:/
rsync -auv --exclude 'autoinst' /tftpboot r2n01:/
On the SN, it looks like there is no /tftpboot/xcat/genesis.kernel*; maybe other files are missing too. Can you check?
The sharedtftp attribute must be 0 in your case, right?
Correct, yes:
# tabdump site | grep sharedtftp
"sharedtftp","0",,
So the files under /tftpboot will not be shared between the MN and the SN. Did you do this:
Yes, when the service nodes are deployed, the /install and /tftpboot directories are synced from the MN to the SNs.
On the SN, it looks like there is no /tftpboot/xcat/genesis.kernel*; maybe other files are missing too. Can you check?
Aaah that was it! For some reason the genesis.kernel was missing from that service node. Copying it back fixed the problem, thank you!
Now, shouldn't an error message be generated if one runs nodeset node shell and the genesis kernel doesn't exist on the SN? It looks like right now, the genesis kernel is silently replaced by nbk, even if that one doesn't exist either, and the CN ultimately fails to boot.
An error message would be much better for diagnosing that kind of problem, I think.
The documentation you referenced says:
rsync -auv --exclude 'autoinst' /tftpboot r1n01:/
Yet, there is no autoinst directory in /tftpboot on the MN, is there? Shouldn't the /tftpboot/xcat/nodes definitions be excluded instead in disjointdhcps environments?
Thanks!
If there are no /tftpboot/xcat/genesis.kernel files, the legacy entries will be created (I don't know why):
if (-r "$tftpdir/xcat/genesis.kernel.$arch") {
} else { #'legacy' environment
$bphash->{$_}->[0]->{kernel} = "xcat/nbk.$arch";
$bphash->{$_}->[0]->{initrd} = "xcat/nkfs.$arch.gz";
$bphash->{$_}->[0]->{kcmdline} = $kcmdline . "xcatd=$master:$xcatdport";
}
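The readability test in the snippet above can also be run by hand as a pre-flight check before nodeset. Below is a minimal sketch, assuming the two Genesis file names seen in the generated scripts earlier in this thread; `check_genesis_files` is a hypothetical helper, not an xCAT command, and it reports an error instead of falling back to the legacy names.

```shell
# Sketch only: check_genesis_files is a hypothetical helper, not part of xCAT.
# It checks that the Genesis kernel and initrd are readable under <tftpdir>/xcat
# and returns non-zero (with a message on stderr) if either one is missing.
check_genesis_files() {
    tftpdir=$1
    arch=$2
    missing=0
    for f in "genesis.kernel.$arch" "genesis.fs.$arch.gz"; do
        if [ ! -r "$tftpdir/xcat/$f" ]; then
            echo "ERROR: $tftpdir/xcat/$f is missing or unreadable" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage, run locally on each service node:
#   check_genesis_files /tftpboot x86_64 || echo "fix /tftpboot before running nodeset"
```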
Right, currently I don't see /tftpboot/autoinst; maybe we should remove --exclude from this rsync command:
rsync -auv --exclude 'autoinst' /tftpboot r1n01:/
If the cluster is hierarchical, we may only need to create tftpboot/xcat/node on the xcatmaster. It looks like xCAT creates it on both the MN and the SN.
@kcgthb Are you still seeing this problem?
@gurevichmark Nope: after manually copying the genesis.kernel to /tftpboot on the SNs, the issue doesn't happen anymore.
I think there still should be:
- [...] genesis.kernel has to be copied from the MN to the SNs,
- [...] nodeset shell and the genesis.kernel doesn't exist on the relevant SN, rather than silently falling back to the nbk kernel,
- [...] nbk kernel when it doesn't exist on the MN either.
Thanks!
Hi!
We're running xCAT 2.15.1, with site.disjointdhcps=1 in hierarchical mode.
sh03-12n18 is a compute node defined with a service node (sh03-sn02):
When running nodeset sh03-12n18 shell, the correct configuration is generated on the management node (sh02-hn01), but the generated config on the service node is wrong.
Here's a demo. First we reset the tftpboot config files:
Now, after nodeset shell, the generated configuration is different on the SN and on the MN:
When the node PXE boots, it gets the DHCP lease from the SN first, and since /tftpboot/xcat/nbk.x86_64 doesn't exist, boot fails and the node doesn't go into Genesis.
Shouldn't the tftpboot configuration on the SN be identical to the one on the MN?
Thanks!