Open dombrowa opened 3 years ago
Did you define master in the site table?
Can you run xcatprobe xcatmn -i <provision network interface>?
Yes, master is defined:
[root@netsres-xcat ~]# tabdump site | grep -i master
"master","172.16.16.1",,
Output from xcatprobe:
[root@netsres-xcat ~]# xcatprobe xcatmn -i eth1
[mn]: Checking all xCAT daemons are running... [ OK ]
[mn]: Checking xcatd can receive command request... [ OK ]
[mn]: Checking 'site' table is configured... [ OK ]
[mn]: Checking provision network is configured... [ OK ]
[mn]: Checking 'passwd' table is configured... [ OK ]
[mn]: Checking important directories(installdir,tftpdir) are configured... [ OK ]
[mn]: Checking SELinux is disabled... [ OK ]
[mn]: Checking HTTP service is configured... [ OK ]
[mn]: Checking TFTP service is configured... [ OK ]
[mn]: Checking DNS service is configured... [ OK ]
[mn]: Checking DHCP service is configured... [ OK ]
[mn]: Checking NTP service is configured... [ OK ]
[mn]: Checking rsyslog service is configured... [ OK ]
[mn]: Checking firewall is disabled... [ OK ]
[mn]: Checking minimum disk space for xCAT ['/var' needs 1GB;'/install' needs 10GB;'/tmp' needs 1GB]... [ OK ]
[mn]: Checking Linux ulimits configuration... [ OK ]
[mn]: Checking network kernel parameter configuration... [ OK ]
[mn]: Checking xCAT daemon attributes configuration... [ OK ]
[mn]: Checking xCAT log is stored in /var/log/xcat/cluster.log... [WARN]
[mn]: Failed to store MN logs to /var/log/xcat/cluster.log
[mn]: Checking xCAT management node IP: <172.16.16.1> is configured to static... [ OK ]
[mn]: Checking dhcpd.leases file is less than 100M... [ OK ]
=================================== SUMMARY ====================================
[MN]: Checking on MN... [ OK ]
Checking xCAT log is stored in /var/log/xcat/cluster.log... [WARN]
Failed to store MN logs to /var/log/xcat/cluster.log
I would like to add that every earlier RHEL release still installs fine; only RHEL 8.2.0 fails. I noticed that mypostscript.tmpl does not appear to be used when I run
rinstall <node> osimage=rhels8.2.0-x86_64-install-netsres
(precreatemypostscripts is not enabled). When installing RHEL 7.7, the post tasks show a section in mypostscript starting with
AUDITNOSYSLOG='0'
export AUDITNOSYSLOG
XCATCONFDIR='/etc/xcat'
export XCATCONFDIR
TFTPDIR='/tftpboot'
export TFTPDIR
PPCMAXP='64'
export PPCMAXP
...
ending with
export SNMPPRIV
SNMPAUTH=''
export SNMPAUTH
# postscripts-start-here
# defaults-postscripts-start-here
syslog
remoteshell
syncfiles
# defaults-postscripts-end-here
# osimage-postscripts-start-here
custom/rhels7.7-x86_64-install-netsres/compute.postinstall
# osimage-postscripts-end-here
# node-postscripts-start-here
confignetwork
setroute
# node-postscripts-end-here
# postscripts-end-here
# postbootscripts-start-here
# osimage-postbootscripts-start-here
custom/rhels7.7-x86_64-install-netsres/compute.postboot
# osimage-postbootscripts-end-here
# node-postbootscripts-start-here
syncfiles
console-rev.sh
net-peer-disable.sh
# node-postbootscripts-end-here
# postbootscripts-end-here
This section is not included when installing RHEL 8.2, which explains why no postscripts are run, SSH is not configured, and no variables are known.
Here is logic to determine the MASTER_IP
#the logic to determine the $ENV{XCATMASTER} confirm to the following priority(from high to low):
## 1, the "xcatmaster" attribute of the node
## 2, the ip address of the mn/sn facing the compute node
## 3, the site.master
Check the node definition: is xcatmaster defined?
Or run the command nslookup <nodename> to make sure the IP address can be resolved.
Maybe you can show me the output of lsdef <nodename> and tabdump networks.
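The three-level priority quoted above can be sketched as a small shell function. This is only an illustration of the ordering, not xCAT's actual implementation; the function name resolve_master and its three positional parameters are made up for this example.

```shell
# Sketch of the XCATMASTER priority described above (not xCAT's own code).
# Hypothetical parameters:
#   $1 = the node's "xcatmaster" attribute (may be empty)
#   $2 = IP of the MN/SN interface facing the compute node (may be empty)
#   $3 = site.master
resolve_master() {
    if [ -n "$1" ]; then
        echo "$1"
    elif [ -n "$2" ]; then
        echo "$2"
    else
        echo "$3"
    fi
}

# With xcatmaster unset and no facing IP supplied, site.master is used.
resolve_master "" "" "172.16.16.1"
```

An empty xcatmaster attribute is therefore not an error in itself under this ordering; the lower-priority sources are expected to fill in the value.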
None of my nodes has xcatmaster defined:
netsres01: xcatmaster=
netsres02: xcatmaster=
netsres03: xcatmaster=
netsres04: xcatmaster=
netsres05: xcatmaster=
netsres06: xcatmaster=
netsres07: xcatmaster=
netsres08: xcatmaster=
netsres09: xcatmaster=
netsres10: xcatmaster=
netsres11: xcatmaster=
netsres12: xcatmaster=
netsres13: xcatmaster=
netsres14: xcatmaster=
netsres15: xcatmaster=
netsres16: xcatmaster=
netsres42: xcatmaster=
netsres42-vm1: xcatmaster=
netsres43: xcatmaster=
netsres44: xcatmaster=
netsres48: xcatmaster=
netsres49: xcatmaster=
netsres50: xcatmaster=
netsres51: xcatmaster=
netsres52: xcatmaster=
netsres54: xcatmaster=
netsres55: xcatmaster=
netsres56: xcatmaster=
netsres57: xcatmaster=
netsres58: xcatmaster=
netsres59: xcatmaster=
netsres60: xcatmaster=
netsres61: xcatmaster=
netsres62: xcatmaster=
netsres63: xcatmaster=
netsres74: xcatmaster=
netsres75: xcatmaster=
netsres76: xcatmaster=
netsres77: xcatmaster=
netsres78: xcatmaster=
netsres79: xcatmaster=
netsres80: xcatmaster=
netsres81: xcatmaster=
netsres82: xcatmaster=
netsres83: xcatmaster=
netsres84: xcatmaster=
netsres85: xcatmaster=
netsres86: xcatmaster=
Must I define this attribute?
What do you mean by "the ip address of the mn/sn facing the compute node"? I have one interface, 172.16.16.0/20, for all nodes. The xCAT master is 172.16.16.1 and each node has a route to it.
[root@netsres-xcat ~]# tabdump site | grep master
"master","172.16.16.1",,
nodedef:
[root@netsres-xcat ~]# lsdef netsres46
Object name: netsres46
addkcmdline=inst.sshd kernel.watchdog_thresh=30
arch=x86_64
cons=ipmi
currchain=boot
currstate=install rhels8.2.0-x86_64-netsres
groups=all,vm
ip=172.16.17.46
mac=52:54:00:4b:2e:38
mgt=kvm
netboot=xnba
nicdevices.br_blue=ens4
nicdevices.br_green=ens3
nichostnamesuffixes.br_blue=-blu
nichostnamesuffixes.br_green=-gre
nicips.ens3=172.16.17.46
nicips.br_blue=9.2.156.70
nicips.br_green=172.16.17.46
nicnetworks.br_blue=blue
nicnetworks.br_green=green
nicnetworks.enp1s0f0=green
nictypes.br_blue=bridge
nictypes.ens3=ethernet
nictypes.ens4=ethernet
nictypes.enp1s0f0=ethernet
nictypes.br_green=bridge
os=rhels8.2.0
postbootscripts=syncfiles,console-rev.sh,net-peer-disable.sh
postscripts=syslog,remoteshell,syncfiles,confignetwork,setroute
power=ipmi
profile=netsres
provmethod=rhels8.2.0-x86_64-install-netsres
routenames=pubrt,greenrt
serialport=0
serialspeed=115200
status=installing
statustime=09-28-2020 16:14:07
updatestatus=failed
updatestatustime=09-26-2020 20:23:09
Networks table:
[root@netsres-xcat ~]# tabdump networks
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,mtu,comments,disable
"blue","9.2.156.64","255.255.255.192","eth2","9.2.156.65",,"9.2.156.71","9.2.250.86",,,,,,,,,,"1500",,
"green","172.16.16.0","255.255.240.0","eth1","<xcatmaster>",,"172.16.16.1",,,,"172.16.28.1-172.16.31.254",,,,,,,"1500",,
"nickel","172.16.80.0","255.255.240.0","lo0","172.16.80.1",,,,,,,,,,,,,"9000",,
"purple","172.16.32.0","255.255.240.0","eth3","172.16.32.1",,"172.16.32.1",,,,"172.16.44.1-172.16.47.254",,,,,,,"1500",,
"red","172.16.0.0","255.255.240.0","eth0","172.16.0.1",,"172.16.0.1",,,,"172.16.12.1-172.16.15.254",,,,,,,"1500",,
"silver","172.16.96.0","255.255.240.0","lo0","172.16.96.1",,,,,,,,,,,,,"9000",,
"zinc","172.16.48.0","255.255.240.0","lo0","172.16.48.1",,,,,,,,,,,,,"9000",,
"cadmium","172.16.176.0","255.255.240.0","lo0","172.16.176.1",,,,,,,,,,,,,"9000",,
"copper","172.16.144.0","255.255.240.0","lo0","172.16.144.1",,,,,,,,,,,,,"9000",,
"chromium","172.16.160.0","255.255.240.0","lo0","172.16.160.1",,,,,,,,,,,,,"9000",,
"titanium","172.16.192.0","255.255.240.0","lo0","172.16.192.1",,,,,,,,,,,,,"9000",,
"tungsten","172.16.208.0","255.255.240.0","lo0","172.16.208.1",,,,,,,,,,,,,"9000",,
"tantalum","172.16.224.0","255.255.240.0","lo0","172.16.224.1",,,,,,,,,,,,,"9000",,
"gold","172.16.240.0","255.255.240.0","lo0","172.16.240.1",,,,,,,,,,,,,"9000",,
"platinum","172.17.16.0","255.255.240.0","lo0","172.17.16.1",,,,,,,,,,,,,"9000",,
"mercury","172.17.32.0","255.255.240.0","lo0","172.17.32.1",,,,,,,,,,,,,"9000",,
"iridium","172.17.0.0","255.255.240.0","lo0","172.17.0.1",,,,,,,,,,,,,"9000",,
"iron","172.16.64.0","255.255.240.0","lo0","172.16.64.1",,,,,,,,,,,,,"9000",,
"cobalt","172.16.112.0","255.255.240.0","lo0","172.16.112.1",,,,,,,,,,,,,"9000",,
"manganese","172.16.128.0","255.255.240.0","lo0","172.16.128.1",,,,,,,,,,,,,"9000",,
"554","9.2.154.128","255.255.255.192","eth2","9.2.154.130",,"9.2.154.140","9.2.250.86",,,,,,,,,,"1500",,
"192_168_122_0-255_255_255_0","192.168.122.0","255.255.255.0","virbr0","<xcatmaster>",,"<xcatmaster>",,,,,,,,,,,"1500",,
Why would RHEL 7.7 install properly, with all variables and postboot/postscripts included in /xcatpost/mypostscript after first boot, but not RHEL 8.x?
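One way to sanity-check "the IP address of the MN facing the compute node" is to confirm that the node's IP falls inside the network whose gateway is the MN. The sketch below uses the green network values and the netsres46 IP from the output above; the helper names ip_to_int and ip_in_net are invented for this example and are not xCAT commands.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed when address $1 lies in network $2 with netmask $3.
ip_in_net() {
    local ip net mask
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# netsres46 (172.16.17.46) against "green" (172.16.16.0 / 255.255.240.0)
ip_in_net 172.16.17.46 172.16.16.0 255.255.240.0 && echo "netsres46 is on green"
```

The MN address 172.16.16.1 passes the same check, which is consistent with the single-interface setup described in this comment.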
Can you show me the output of lsdef -t osimage rhels8.2.0-x86_64-install-netsres?
[root@netsres-xcat ~]# lsdef -t osimage rhels8.2.0-x86_64-install-netsres
Object name: rhels8.2.0-x86_64-install-netsres
imagetype=linux
osarch=x86_64
osdistroname=rhels8.2.0-x86_64
osname=Linux
osvers=rhels8.2.0
otherpkglist=/install/custom/rhels8.2.0-x86_64-install-netsres/pkglist-other
pkgdir=/install/rhels8.2.0/x86_64
pkglist=/install/custom/rhels8.2.0-x86_64-install-netsres/pkglist
postbootscripts=custom/rhels8.2.0-x86_64-install-netsres/compute.postboot
postscripts=custom/rhels8.2.0-x86_64-install-netsres/compute.postinstall
profile=netsres
provmethod=install
synclists=/install/custom/rhels8.2.0-x86_64-install-netsres/synclist
template=/install/custom/rhels8.2.0-x86_64-install-netsres/compute.rhels8.tmpl
Everything looks fine to me. If post.xcat.ng doesn't have MASTER_IP set, then /opt/xcat/share/xcat/install/scripts/pre.rhels8 should not have it either.
Can you check the /install/autoinstall/<nodename> file? It is created after the rinstall command, and MASTER_IP should already be there.
Also, after issuing the rinstall command, run xcatprobe osdeploy -n <nodename>.
The file /install/autoins/
[root@netsres-xcat ~]# grep MASTER_IP /install/autoinst/netsres46|head
export MASTER_IP="172.16.16.1"
msgutil_r "$MASTER_IP" "info" "============deployment starting============" "/var/log/xcat/xcat.log" "$log_label"
msgutil_r "$MASTER_IP" "info" "Running Anaconda Pre-Installation script..." "/var/log/xcat/xcat.log" "$log_label"
msgutil_r "$MASTER_IP" "info" "Detecting install disk..." "/var/log/xcat/xcat.log" "$log_label"
msgutil_r "$MASTER_IP" "info" "Found $instdisk, generate partition file..." "/var/log/xcat/xcat.log" "$log_label"
msgutil_r "$MASTER_IP" "info" "Generate the repository for the installation" "/var/log/xcat/xcat.log" "$log_label"
I am capturing the output of xcatprobe osdeploy -n.
I notice that the curl command below does not find the mypostscript file.
[root@netsres-xcat ~]# lsdef -t site -i precreatemypostscripts
Object name: clustersite
precreatemypostscripts=0
curl --fail --retry 20 --max-time 60 "http://$MASTER_IP:${HTTPPORT}$TFTPDIR/mypostscripts/mypostscript.$NODE" -o "/xcatpost/\
mypostscript.$NODE" 2> /tmp/download.log
Error shows as:
[root@netsres46 ~]# cat /tmp/download.log
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (22) The requested URL returned error: 404 Not Found
After the first reboot, mypostscript.post shows that the section with the exported variables and the postscripts section, which both appear on other nodes when installing RHEL 7.7:
# postscripts-start-here
......
# postscripts-end-here
are both missing. Therefore no postscripts were executed
This is what I see on the node
[root@netsres46 ~]# cat /xcatpost/mypostscript.post
. /xcatpost/xcatlib.sh
# global value to store the running status of the postbootscripts,the value is non-zero if one postbootscript failed
return_value=0
# subroutine used to run postscripts
# $1 argument is the script type
# rest argument is the script name and arguments
run_ps () {
local ret_local=0
mkdir -p "/var/log/xcat"
# On some Linux distro, the rsyslogd daemon write log files with permision
# other than root:root. And in some case, the directory /var/log/xcat was
# created by xCAT, and had root:root ownership. In this way, rsyslogd
# did not have enough permission to write to log files under this directory.
# As a dirty hack, change the ownership of directory /var/log/xcat to the
# same ownership of directory /var/log.
chown root:root "/var/log/xcat"
local logfile="/var/log/xcat/xcat.log"
local scriptype=$1
shift;
if [ -z "$scriptype" ]; then
scriptype="postscript"
fi
log_label="xcat.deployment."$scriptype
if [ -f $1 ]; then
msgutil_r "$MASTER_IP" "info" "Running $scriptype: $1" "$logfile" "$log_label"
if [ "$XCATDEBUGMODE" = "1" ] || [ "$XCATDEBUGMODE" = "2" ]; then
local compt=$(file $1)
local reg="shell script"
if [[ "$compt" =~ $reg ]]; then
bash -x ./$@ 2>&1
ret_local=$?
else
./$@ 2>&1 | logger -t xcat -p debug
ret_local=${PIPESTATUS[0]}
fi
else
./$@ 2>&1
ret_local=${PIPESTATUS[0]}
fi
if [ "$ret_local" -ne "0" ]; then
return_value=$ret_local
fi
msgutil_r "$MASTER_IP" "info" "$scriptype $1 return with $ret_local" "$logfile" "$log_label"
else
msgutil_r "$MASTER_IP" "error" "$scriptype $1 does NOT exist." "$logfile" "$log_label"
return_value=-1
fi
return 0
}
# subroutine end
echo xcat.deployment [xcatinstallpost] mypostscript.post MASTER_IP=$MASTER_IP XCATDEBUGMODE=0 MASTER=$MASTER >> /root/post.xcat.log
[ -f /opt/xcat/xcatinfo ] && grep 'POSTSCRIPTS_RC=1' /opt/xcat/xcatinfo >/dev/null 2>&1 && return_value=1
env > /root/env.mypostscript.post
set -x
if [ "$return_value" -eq "0" ]; then
if [ "$XCATDEBUGMODE" = "1" ] || [ "$XCATDEBUGMODE" = "2" ]; then
msgutil_r "$MASTER_IP" "debug" "node booted, reporting status..." "/var/log/xcat/xcat.log" "$log_label"
fi
updateflag.awk $MASTER 3002 "installstatus booted"
rc=$?
echo "xcat.deployment [xcatinstallpost] mypostscript.post updateflag.awk $MASTER 3002 \"installstatus booted\" return with $rc" >> /root/post.xcat.log
msgutil_r $MASTER_IP "info" "provision completed.($NODE)" "/var/log/xcat/xcat.log" "$log_label"
else
if [ "$XCATDEBUGMODE" = "1" ] || [ "$XCATDEBUGMODE" = "2" ]; then
msgutil_r "$MASTER_IP" "debug" "node boot failed, reporting status..." "/var/log/xcat/xcat.log" "$log_label"
fi
updateflag.awk $MASTER 3002 "installstatus failed"
rc=$?
echo "xcat.deployment [xcatinstallpost] mypostscript.post updateflag.awk $MASTER 3002 \"installstatus failed\" return with $rc" >> /root/post.xcat.log
msgutil_r $MASTER_IP "error" "provision completed with error.($NODE)" "/var/log/xcat/xcat.log" "$log_label"
fi
@dombrowa, sorry, that was a typo: the file /install/autoinst/<nodename> is created by the nodeset/rinstall command. It contains the deployment flow for this node. MASTER_IP was available there. For the postscripts defined in the osimage, I think they should have /install in front of custom, right?
postbootscripts=custom/rhels8.2.0-x86_64-install-netsres/compute.postboot
postscripts=custom/rhels8.2.0-x86_64-install-netsres/compute.postinstall
If precreatemypostscripts is set to yes/1, it will regenerate /tftpboot/mypostscripts/mypostscript.<nodename>. Normally we do not set it, so this curl download will fail:
curl --fail --retry 20 --max-time 60 "http://$MASTER_IP:${HTTPPORT}$TFTPDIR/mypostscripts/mypostscript.$NODE" -o "/xcatpost/\
mypostscript.$NODE" 2> /tmp/download.log
but the script will go on and use /xcatpost/getpostscript.awk to download the postscripts.
So, check the file /install/autoinst/<nodename> to see if MASTER is unset somewhere, and run xcatprobe osdeploy -n <nodename> after the rinstall command.
Never mind the typo autoinst[all]. It was obvious as I have other xCAT Management nodes to compare to.
As to the curl and awk download: I have added various taps into the code and have observed:
[root@netsres-xcat work]# find /opt/xcat/ -iname "*.pm" -or -iname '*.pl' -exec grep mypostscript {} \;
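A side note on the find invocation above: without grouping, the implicit "and" binds tighter than "or", so -exec is attached only to the second name test and files matching the first pattern are never passed to grep. The sketch below shows the difference on a throwaway directory; the file names a.pm and b.pl are made up for the demonstration.

```shell
# Throwaway directory with one .pm and one .pl file, both containing the word.
d=$(mktemp -d)
trap 'rm -rf "$d"' EXIT
echo 'mypostscript' > "$d/a.pm"
echo 'mypostscript' > "$d/b.pl"

echo "ungrouped:"
# -exec belongs only to the '*.pl' test, so a.pm is skipped.
find "$d" -iname '*.pm' -o -iname '*.pl' -exec grep -l mypostscript {} \;

echo "grouped:"
# Parentheses make both name tests feed -exec.
find "$d" \( -iname '*.pm' -o -iname '*.pl' \) -exec grep -l mypostscript {} \;
```

With the grouped form, matches in Perl modules such as Postage.pm would also be reported.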
The autoinst file does contain the MASTER_IP when I run rinstall RHEL8
[root@netsres-xcat ~]# grep MASTER /install/autoinst/netsres46
export MASTER_IP="172.16.16.1"
The problem remains that even with MASTER_IP, MASTER etc. set the mypostscript.post is missing the export statments and the section to run postboot/postscripts.
The pipeline below always creates /xcatpost/mypostscript, no matter what getpostscript.awk returns, because of the '>' redirection:
/xcatpost/getpostscript.awk | egrep '<data>' | sed -e 's/<[^>]*>//g' | egrep -v '^ *$' | sed -e 's/^ *//' | sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&amp;/\&/g' -e 's/&quot;/"/g' -e "s/&apos;/'/g" >/xcatpost/mypostscript
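The point about '>' can be seen in isolation. In this sketch, true stands in for a getpostscript.awk run that yields no <data> lines (an assumption made for illustration), and the output goes to a temporary file rather than /xcatpost/mypostscript.

```shell
# Pick a temp path and make sure nothing exists there yet.
out=$(mktemp)
trap 'rm -f "$out"' EXIT
rm -f "$out"

# The producer emits nothing, the filter matches nothing,
# yet the redirection still creates the target file.
true | grep '<data>' > "$out"

[ -e "$out" ] && echo "file exists"
[ -s "$out" ] || echo "file is empty"
```

Testing with -s (exists and is non-empty) rather than -e or -f is one way to tell a real download from an empty placeholder.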
Instead, post.xcat.ng greps for MASTER= in /xcatpost/mypostscript.netsres46, which it never finds in any of the 10 iterations. I added some code that logs this behavior; the output is below. curl failed, and the awk download is retried 10 times:
xcat.deployment [post.xcat.ng] curl --fail --retry 20 --max-time 60 "http://172.16.16.1:80/tftpboot/mypostscripts/mypostscript.netsres46" -o "/xcatpost/mypostscript.netsres46" 2> /tmp/download.log return with 22
xcat.deployment [post.xcat.ng] precreated mypostscript not downloaded, see /tmp/download.log
xcat.deployment [post.xcat.ng] no pre-generated mypostscript.<nodename>, trying to get it with getpostscript.awk...
xcat.deployment [post.xcat.ng] /xcatpost/getpostscript.awk .. return with 0
xcat.deployment [post.xcat.ng] [1/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [2/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [3/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [4/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [5/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [6/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [7/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [8/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [9/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] Missing MASTER in /xcatpost/mypostscript
xcat.deployment [post.xcat.ng] Missing tag "postscripts-start-here" in /xcatpost/mypostscript
xcat.deployment [post.xcat.ng] Missing tag "postscripts-end-here" in /xcatpost/mypostscript
xcat.deployment [post.xcat.ng] Missing tag "postbootscript-start-here" in /xcatpost/mypostscript
xcat.deployment [post.xcat.ng] Missing tag "postbootscript-end-here" in /xcatpost/mypostscript
xcat.deployment [post.xcat.ng] generate mypostscript.post file successfully
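The checks logged above can be reproduced by hand on a node. The following is a minimal sketch, assuming the marker names from the RHEL 7.7 mypostscript shown earlier in this thread; check_mypostscript is a hypothetical helper, not an xCAT script.

```shell
# Hypothetical helper: list the markers a mypostscript file should contain
# but does not. Returns non-zero if anything is missing.
check_mypostscript() {
    local file missing tag
    file="$1"
    missing=0
    # The thread states that post.xcat.ng greps for "MASTER=".
    grep -q 'MASTER=' "$file" || { echo "missing MASTER"; missing=1; }
    for tag in postscripts-start-here postscripts-end-here \
               postbootscripts-start-here postbootscripts-end-here; do
        grep -q "^# $tag\$" "$file" || { echo "missing tag $tag"; missing=1; }
    done
    return "$missing"
}

# Demo on an empty temporary file, which mimics the failing RHEL 8 case.
f=$(mktemp)
check_mypostscript "$f"
rm -f "$f"
```

On a node, the function would be pointed at /xcatpost/mypostscript or /xcatpost/mypostscript.<nodename> instead of a temporary file.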
I will run another install using the full path for the post* attributes:
postbootscripts=/install/postscripts/custom/rhels8.2.0-x86_64-install-netsres/compute.postboot
postscripts=/install/postscripts/custom/rhels8.2.0-x86_64-install-netsres/compute.postinstall
/xcatpost/getpostscript.awk will call /opt/xcat/lib/xcat/plugins/getpostscript.pm, which then calls makescript in the file /opt/xcat/lib/perl/xCAT/Postage.pm. Can you check for makescript error messages in /var/log/xcat/*log?
In the site table, there is no precreatemypostscripts attribute, right?
Do you have a /install/postscripts/mypostscript.tmpl file on your system? If you do, can you get rid of it and set the precreatemypostscripts attribute to 0 in the site table?
MN=Management Node (in my case the MASTER or xCAT server, all as one node)
/opt/xcat/lib/xcat/plugins/getpostscript.pm does not exist on the MN or on any of my cluster nodes; on the MN the file is at /opt/xcat/lib/perl/xCAT_plugin/getpostscript.pm.
/install/postscripts/mypostscript.tmpl exists on my MN as /opt/xcat/share/xcat/mypostscript/mypostscript.tmpl and
[root@netsres-xcat ~]# lsdef -t site -i precreatemypostscripts
Object name: clustersite
precreatemypostscripts=0
which should satisfy your requirements. This has not been changed.
With these settings the osdeploy log shows:
[root@netsres-xcat Downloads]# xcatprobe osdeploy -n netsres46 2>&1| tee ~/work/netsres46.osdeploy.2
....
[netsres46] 12:49:02 Via HTTP get /install/postscripts/xcatserver
[netsres46] 12:49:02 Via HTTP get /tftpboot/mypostscripts/mypostscript....
[netsres46] 12:51:32 Via HTTP get /tftpboot/xcat/xnba/nodes/netsres46
[netsres46] 13:02:43 Via HTTP get //install/rhels8.2.0/x86_64/AppStream...
[netsres46] 13:02:43 Via HTTP get //install/rhels8.2.0/x86_64/AppStream...
[netsres46] 13:02:43 Via HTTP get //install/rhels8.2.0/x86_64/AppStream...
[netsres46] 13:02:43 Via HTTP get //install/rhels8.2.0/x86_64/AppStream...
[netsres46] 13:02:43 Via HTTP get //install/rhels8.2.0/x86_64/AppStream...
[netsres46] 13:02:43 Via HTTP get //install/rhels8.2.0/x86_64/AppStream...
[netsres46] 13:02:44 Via HTTP get //install/rhels8.2.0/x86_64/BaseOS/re...
[netsres46] 13:02:44 Via HTTP get //install/rhels8.2.0/x86_64/BaseOS/re...
[netsres46] 13:02:44 Via HTTP get //install/rhels8.2.0/x86_64/BaseOS/re...
[netsres46] 13:02:44 Via HTTP get //install/rhels8.2.0/x86_64/BaseOS/re...
[netsres46] 13:02:44 Via HTTP get //install/rhels8.2.0/x86_64/BaseOS/re...
60 minutes have expired, stop monitoring [INFO]
====================== Summary =====================
There is 1 node provision failures
netsres46 : stop at stage 'start_to_install_os_package' [FAIL]
and syslog shows:
...
Sep 30 16:32:23 netsres46 xcat.deployment Generate the repository for the installation
Sep 30 12:37:53 netsres46 xcat.deployment [post.xcat.ng] Executing post.xcat to prepare for firstbooting ...
Sep 30 12:38:33 netsres46 xcat.deployment [post.xcat.ng] trying to download postscripts from 172.16.16.1...
Sep 30 12:38:35 netsres46 xcat.deployment [post.xcat.ng] postscripts downloaded successfully
Sep 30 12:38:35 netsres46 xcat.deployment [post.xcat.ng] trying to get mypostscript from 172.16.16.1...
Sep 30 12:38:35 netsres46 xcat.deployment [post.xcat.ng] failed to download precreated mypostscript
Sep 30 12:40:53 netsres46 xcat.deployment [post.xcat.ng] finished firstboot preparation, sending request to 172.16.16.1:3002 for changing status...
Sep 30 12:41:57 netsres46 xcat.deployment [xcatinstallpost] Running /xcatpost/mypostscript.post
Sep 30 12:41:57 netsres46 xcat provision completed.(netsres46)
Sep 30 12:41:57 netsres46 xcat.deployment [xcatinstallpost] /xcatpost/mypostscript.post return
Sep 30 12:41:57 netsres46 xcat.deployment [xcatinstallpost] =============deployment ending====================
Sorry, it is /opt/xcat/lib/perl/xCAT_plugin/getpostscript.pm and /opt/xcat/lib/perl/xCAT/Postage.pm.
What's in /install/postscripts/mypostscript.tmpl? This file is created if precreatemypostscripts=1. What's the timestamp?
Can you remove the file /install/postscripts/mypostscript.tmpl and then run rinstall again?
The timestamps from the xcatprobe command and from syslog are different, and syslog showed deployment ending, but osdeploy is stuck on the installation of packages?
I see that getpostscript.awk submits <command>getpostscript</command>, upon which (I assume) the xCAT server runs
/opt/xcat/lib/perl/xCAT_plugin/getpostscript.pm
As to the syntax for postscripts: When I run with full path in osimage table for post*scripts, I see this error
netsres46: Wed Sep 30 14:22:39 EDT 2020 postscript /install/postscripts/custom/rhels8.2.0-x86_64-install-netsres/compute.postboot does NOT exist.
Since this message is prefixed with <nodename>, I believe the full path is incorrect and the path should remain relative, i.e.
custom/rhels8.2.0-x86_64-install-netsres/compute.postboot
relative to /xcatpost on the node.
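The "does NOT exist" message fits the run_ps function shown earlier, which tests the script name with -f and runs it as ./name. A minimal sketch of that behavior follows, assuming the scripts are run from /xcatpost on the node. Here a temporary directory stands in for /xcatpost, and postscript_exists and example-image are made-up names.

```shell
# Succeed when script name $2 exists as seen from directory $1.
postscript_exists() {
    ( cd "$1" 2>/dev/null && [ -f "$2" ] )
}

# A temp directory plays the role of /xcatpost on the node.
d=$(mktemp -d)
trap 'rm -rf "$d"' EXIT
mkdir -p "$d/custom/example-image"
touch "$d/custom/example-image/compute.postboot"

postscript_exists "$d" custom/example-image/compute.postboot \
    && echo "relative name: found"
postscript_exists "$d" /install/postscripts/custom/example-image/compute.postboot \
    || echo "absolute MN path: not found"
```

The absolute path only makes sense on the management node, which is why the relative form is the one that resolves on the compute node.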
3. After setting precreatemypostscripts=1 and then running rinstall, this file appears:
/tftpboot/mypostscripts/mypostscript.netsres46
Here are its contents:
mypostscript.netsres46.gz
but not this anymore:
[root@netsres-xcat ~]# ls /install/postscripts/mypostscript.tmpl
ls: cannot access /install/postscripts/mypostscript.tmpl: No such file or directory
When I switch back to precreatemypostscripts=0, the file /tftpboot/mypostscripts/mypostscript.netsres46 disappears.
So it is not clear what creates the file /install/postscripts/mypostscript.tmpl, and when.
I will have to check the timestamps. Both should come from the MN and so be correct and in sync, even if the node has a time offset due to incorrect NTP, right?
Both MASTER and MASTER_IP are defined in mypostscript.netsres46.gz.
From the previous post, I think MASTER is also available in /install/autoinst/netsres46. Are there some postscripts that unset the ENV?
@dombrowa , what's the OS for xCATmn?
The xCAT MN is a VM running:
[root@netsres-xcat ~]# cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.9 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.9"
PRETTY_NAME="OpenShift Enterprise"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.9:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"
FYI, I have encountered two items that cause similar behavior:
- Ubuntu install: if the installed image has a non-gawk awk, then it turns out like this.
- If the site table contains xcatsslversion or xcatsslciphers that disable newer ciphers by mistake, this happens.
Note that I delete the use of 'nice' as randbytes, as it is a bad idea.
In this case, running 'getpostscript.awk' is the most direct way of seeing what is going awry.
Thanks @jjohnson42, I had the same problem with xCAT 2.16.1 when deploying CentOS 8.2 nodes, no variables were defined in /xcatpost/mypostscript including $MASTER_IP, and simply removing the xcatsslversion definition from site table fixed the problem.
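To check for the site table entries mentioned above, the tabdump site output can be filtered for the two attribute names. This is a sketch only: find_ssl_overrides is a made-up helper, and the sample rows, including the value TLSv1, are invented so the example runs without an xCAT installation.

```shell
# Filter 'tabdump site' style rows on stdin for the two attributes named above.
find_ssl_overrides() {
    grep -E '^"xcatssl(version|ciphers)"'
}

# On a management node this would be: tabdump site | find_ssl_overrides
# The rows below are made-up sample data, not taken from this thread.
printf '%s\n' '"master","172.16.16.1",,' '"xcatsslversion","TLSv1",,' \
    | find_ssl_overrides
```

No output means neither attribute is set, in which case this particular cause can be ruled out.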
The xcatinstall postscript is not filling in the variables MASTER_IP and MASTER when provisioning RHEL 8, which starts with post.xcat.ng instead of post.xcat (as used for RHEL 7). MASTER_IP is required, for example, by the logger inside msgutil_r; MASTER is required by the updateflag.awk script.
I added this code to find out if the value is available and could be added: Yes.
Running with the above code (and more) added will trigger this output
This code added to xcatinstallpost shows that the value for MASTER is missing as well:
Running with the above code (and more) added will trigger this output