urielrosen1981 opened this issue 2 years ago
@urielrosen1981 Have you tried installing this image over a non-tagged VLAN?
Hi,
Thanks for your reply. I just tested installing another node over a non-tagged VLAN, and it was successful. Can you help me debug and solve this issue?
xNBA is based on iPXE. xCAT does not provide any special handling to enable iPXE VLAN features.
I think there are two problems here:
1.) xCAT does not provide support for tagged VLANs in xNBA.
2.) There is an open issue in iPXE related to tagged VLANs: https://github.com/ipxe/ipxe/issues/369
I think using an untagged VLAN is probably the easiest solution. Is that an option for your use case?
Hi,
Sorry, but our network design currently uses only VLAN tagging. Is there any workaround you know of that I can use? Alternatively, do you have any estimate of when this bug will be fixed?
do you have any estimate when this bug will be fixed ?
xNBA vlan tagging is not a priority for the xCAT core team, so any improvements related to this request will need to be driven by community members such as yourself.
is there any workaround you know I can use
Possible workarounds you could attempt:
1.) You could try to manually modify the xNBA file that contains the boot commands, located at /tftpboot/xcat/xnba/nodes/ai-slurm-g9 on your management node, to add the necessary iPXE commands to create the tagged VLAN. A simple example is described here: https://ipxe.org/scripting. I think the vcreate command is what you need.
2.) You could try to replace the current version of xnba-undi installed on your management node with the older 1.0.3 version available here: https://xcat.org/files/xcat/repos/yum/2.16/xcat-dep/xnba-undi-1.0.3-131028.noarch.rpm to see whether it behaves the same way. I think this experiment is worth trying, but I am not sure it will solve your problem.
3.) You may need to combine 1 and 2 to add the call to vcreate and get a version of xNBA that is not impacted by https://github.com/ipxe/ipxe/issues/369.
4.) You could try to build a custom version of xNBA that includes a custom script to configure the vlan for your environment.
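A minimal sketch of workaround 1, assuming the xNBA binary was built with the VLAN commands and that the target is VLAN 36 on net0 (values taken from this thread): add_vlan_to_bootfile is a hypothetical helper that inserts vcreate, followed by dhcp on the new tagged device (net0-36), right after the script header of a node boot file. Running dhcp on the tagged device is an assumption about what your network needs, not something xCAT generates.

```shell
#!/bin/sh
# Sketch: insert iPXE VLAN commands into an xNBA node boot file.
# Assumptions: line 1 of the file is the script header (#!gpxe or #!ipxe)
# and the xNBA binary was built with the VLAN commands (VLAN_CMD).
add_vlan_to_bootfile() {
    file=$1          # e.g. /tftpboot/xcat/xnba/nodes/ai-slurm-g9
    tag=$2           # VLAN ID, e.g. 36
    nic=${3:-net0}   # parent iPXE network device

    # Leave the file alone if a vcreate for this tag is already there.
    if grep -q "^vcreate --tag $tag " "$file"; then
        return 0
    fi

    tmp="$file.tmp.$$"
    # Keep the header, create the tagged device (e.g. net0-36), configure it
    # with DHCP, then copy the remaining boot commands unchanged.
    awk -v tag="$tag" -v nic="$nic" '
        NR == 1 { print; print "vcreate --tag " tag " " nic; print "dhcp " nic "-" tag; next }
        { print }
    ' "$file" > "$tmp" || return 1

    # Write back through the original file so its permissions are preserved.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, after nodeset: add_vlan_to_bootfile /tftpboot/xcat/xnba/nodes/ai-slurm-g9 36 net0. Running it a second time changes nothing, so it is safe to repeat.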
If you can report the results of your investigation here, we can try to continue to assist with suggestions.
Instructions for step 4) can be found here: https://github.com/xcat2/xcat-dep/blob/master/xnba/README
Thanks for your suggestions. I just tried steps 1 and 2, but I have some questions. I tried to modify the /tftpboot/xcat/xnba/nodes/ai-slurm-g9 file, but I see that after each "rinstall", when I try to deploy the image, the file is rewritten to the default file. Is this behavior normal, and should I just add my custom commands each time after I run "rinstall"? Anyway, I attached the contents of the file below. So far, creating my tagged VLAN didn't work, so please tell me if I have done this correctly or if I need to modify the file. I need to create VLAN 36 and use DHCP to start the installation. Another issue: after reaching net 1, I lose the display and cannot see anything on the console monitor. I wanted to debug this by entering the xNBA shell using "Ctrl+B", but that didn't work either. Waiting for your input, thanks.
vcreate --tag 36 net0
autoboot net0-36
imgfetch -n kernel http://${next-server}:80/tftpboot/xcat/osimage/rocky8-x86_64-install-compute/vmlinuz
imgload kernel
imgargs kernel quiet inst.repo=http://10.26.36.80:80/install/rocky8/x86_64 inst.ks=http://10.26.36.80:80/install/autoinst/ai-slurm-g9 ip=ens1f0:dhcp inst.sshd inst.loglevel=debug inst.syslog=10.26.36.80 BOOTIF=01-${netX/mac:hexhyp}
imgfetch -n initrd http://${next-server}:80/tftpboot/xcat/osimage/rocky8-x86_64-install-compute/initrd.img
imgexec kernel
I tried to modify the /tftpboot/xcat/xnba/nodes/ai-slurm-g9 file, but I see that after each "rinstall", when I try to deploy the image, the file is rewritten to the default file. Is this behavior normal, and should I just add my custom commands each time after I run "rinstall"?
Every time rinstall or nodeset is run, the tftpboot files are regenerated; this is normal behavior. rinstall is a convenience command that combines a few other commands into a single operation. For the test you are attempting, I would recommend using nodeset instead of rinstall so you can modify the boot file after the nodeset but before the install starts. Some more information here: https://xcat-docs.readthedocs.io/en/stable/guides/admin-guides/manage_clusters/ppc64le/diskful/deploy_os.html?highlight=nodeset
The process should be something like:
nodeset ai-slurm-g9 osimage=rocky8-x86_64-install-compute
Modify /tftpboot/xcat/xnba/nodes/ai-slurm-g9
rsetboot ai-slurm-g9 net
rpower ai-slurm-g9 reset
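The steps above could be wrapped in a small script. This is only a sketch: redeploy_node and run are hypothetical helpers, and the pause is where the boot file gets edited by hand (for the UEFI node later in this thread, the file is the .uefi variant, ai-slurm-g9.uefi). Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the nodeset -> edit -> rsetboot -> rpower sequence.
# With DRY_RUN=1 the commands are only printed, which is handy for checking.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

redeploy_node() {
    node=$1
    osimage=$2
    bootfile=/tftpboot/xcat/xnba/nodes/$node

    # 1. Regenerate the boot files; this overwrites earlier manual edits.
    run nodeset "$node" "osimage=$osimage" || return 1

    # 2. Give the operator a chance to edit the boot file before booting.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "# edit $bootfile"
    else
        printf 'Edit %s now, then press Enter to continue: ' "$bootfile"
        read -r _
    fi

    # 3. Boot from the network once, then reset the node.
    run rsetboot "$node" net || return 1
    run rpower "$node" reset
}
```

For example: redeploy_node ai-slurm-g9 rocky8-x86_64-install-compute.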
please tell me if I have done this correctly or I need to modify the file
I don't have any existing experience trying to boot iPXE/xNBA over tagged VLAN, so I don't have any specific advice on the actual commands. I was re-reading this issue: https://github.com/ipxe/ipxe/issues/369 and I noticed another problem:
In your original screenshot above, there is a "Features" line that shows which iPXE features have been compiled into xNBA/iPXE. xNBA does not have the VLAN feature listed, so it most likely does not support the vcreate command.
You will need to rebuild xNBA using the instructions @gurevichmark pointed to above, but enable the VLAN feature using VLAN_CMD as described in the iPXE issue. However, you will also need to patch the code to avoid the problem described in the iPXE issue.
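A sketch of enabling VLAN_CMD in the source tree before packaging it, assuming GNU sed (as on CentOS 7) and that src/config/general.h contains a commented-out //#define VLAN_CMD line as in upstream iPXE. The helper name enable_vlan_cmd is hypothetical, and the separate patch for the iPXE issue is not covered here.

```shell
#!/bin/sh
# Sketch: make sure VLAN_CMD is defined in an iPXE/xNBA source tree.
# Assumptions: GNU sed (for -i) and the upstream iPXE layout, where
# src/config/general.h has a commented-out "//#define VLAN_CMD" line.
enable_vlan_cmd() {
    hdr=$1/src/config/general.h   # $1 is the tree root, e.g. xnba-1.21.1

    # Already enabled: nothing to do.
    if grep -q '^#define[[:space:]][[:space:]]*VLAN_CMD' "$hdr"; then
        return 0
    fi

    if grep -q '^//[[:space:]]*#define[[:space:]][[:space:]]*VLAN_CMD' "$hdr"; then
        # Uncomment the existing line, keeping its trailing comment.
        sed -i 's|^//[[:space:]]*\(#define[[:space:]][[:space:]]*VLAN_CMD\)|\1|' "$hdr"
    else
        # Fall back to appending the definition.
        printf '#define VLAN_CMD\t/* VLAN commands */\n' >> "$hdr"
    fi
}
```

For example: enable_vlan_cmd xnba-1.21.1, then build and package the tree as before.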
Thanks for your reply,
I tried rebuilding iPXE with the VLAN commands but ran into some difficulty and would appreciate your advice. First, here are the steps I took and the error I encountered.
git clone https://git.ipxe.org/ipxe.git
mv ipxe xnba-1.21.1
In the xnba-1.21.1/src/config/general.h file, I added this line for VLAN support:
#define VLAN_CMD /* VLAN commands */
Ran cd src; make
Copied the 5 patch files and the xnba-1.21.1.tar.bz2 file to /root/rpmbuild/SOURCES/, then ran rpmbuild -ba xnba-undi.spec to rebuild the xNBA RPM and got the error below.
-rw-r--r-- root/root 15247 2022-05-08 12:03 xnba-1.21.1/src/util/zbin.c
-rwxr-xr-x root/root 38040 2022-05-08 12:52 xnba-1.21.1/src/util/zbin
-rw-r--r-- root/root 0 2022-05-08 12:51 xnba-1.21.1/src/.echocheck
Do you know why I got this error? Perhaps there is a working RPM with VLAN support that you can provide for me to download?
@urielrosen1981 What OS are you running the rpmbuild -ba xnba-undi.spec command on?
One thing you can try is to add this line to xnba-undi.spec somewhere before the %define lines there:
%global _default_patch_fuzz 3
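The same spec change as a hypothetical helper, add_patch_fuzz, which inserts the line once, directly above the first %define line; if the spec has no %define line, the file is left unchanged. This is a sketch, not part of the xcat-dep tooling.

```shell
#!/bin/sh
# Sketch: add "%global _default_patch_fuzz 3" above the first %define line
# of an RPM spec file. Does nothing if the line is already present.
add_patch_fuzz() {
    spec=$1
    line='%global _default_patch_fuzz 3'

    grep -qF "$line" "$spec" && return 0

    tmp="$spec.tmp.$$"
    awk -v add="$line" '
        !done && /^%define/ { print add; done = 1 }
        { print }
    ' "$spec" > "$tmp" || return 1

    # Write back through the original file so its permissions are preserved.
    cat "$tmp" > "$spec" && rm -f "$tmp"
}
```

For example: add_patch_fuzz xnba-undi.spec, then rerun rpmbuild -ba xnba-undi.spec.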
Thanks, now it ran for a couple of minutes before failing. The error is below.
(.text16.data+0x76): undefined reference to `_data16_memsz'
bin-x86_64-efi/blib.a(pxe_entry.o): In function `pxenv':
(.text16.data+0x82): undefined reference to `_data16_memsz'
bin-x86_64-efi/blib.a(pxe_entry.o): In function `pxenv':
(.text16.data+0x86): undefined reference to `_text16_memsz'
make: *** [bin-x86_64-efi/snponly.efi.tmp] Error 1
rm bin-x86_64-efi/version.snponly.efi.o
error: Bad exit status from /var/tmp/rpm-tmp.gHOO37 (%build)
RPM build errors: Bad exit status from /var/tmp/rpm-tmp.gHOO37 (%build)
@urielrosen1981
What OS are you running on?
Have you tried without your changes to the xnba-1.21.1/src/config/general.h file?
CentOS Linux release 7.9.2009 (Core)
Did not modify xnba-1.21.1/src/config/general.h file.
Oh, I thought earlier you posted:
in xnba-1.21.1/src/config/general.h file I added this line for VLAN support:
#define VLAN_CMD /* VLAN commands */
Yes, you are right, I forgot about that. Anyway, I was able to build the RPM and install it, but now I get the error below. Do you have any idea what is wrong now?
@urielrosen1981
Verify that your management server has a /tftpboot/xcat/xnba.efi file matching the time you built it with the rpmbuild command, and that the file has "read for all" permissions.
The file has read-for-all permissions:
ls -ltr /tftpboot/xcat/xnba.efi
-rw-r--r-- 1 root root 139200 Oct 28 2013 /tftpboot/xcat/xnba.efi
I'm not sure I understand what you mean by "file matching the time you built it with rpmbuild command".
Could you please explain how to check this?
The rpmbuild -ba xnba-undi.spec command should have generated a new xnba-undi-1.21.1-1.noarch RPM.
That RPM should contain updated files:
[root@c910f04x40 ~]# rpm -qll xnba-undi-1.21.1-1.noarch
/tftpboot/xcat/xnba.efi
/tftpboot/xcat/xnba.kpxe
[root@c910f04x40 ~]#
Uninstalling your existing xnba-undi and then installing the new one should have replaced those 2 files under /tftpboot/xcat.
It looks like your files are from 2013, so maybe you have not installed the xnba-undi RPM generated by rpmbuild?
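One way to check this without relying on timestamps is to compare file contents. A sketch, assuming the new RPM has been unpacked into an empty directory: compare_xnba_files is a hypothetical helper that runs cmp on xnba.efi and xnba.kpxe, the two files listed above, and reports which installed copies differ from the freshly built ones.

```shell
#!/bin/sh
# Sketch: compare the xNBA files unpacked from a newly built RPM with the
# copies installed under /tftpboot/xcat. cmp -s only sets the exit status:
# identical files print "same", anything else (including missing) "DIFFERS".
compare_xnba_files() {
    new_root=$1                 # directory the new RPM was unpacked into
    installed_root=${2:-/}      # normally the real root filesystem
    installed_root=${installed_root%/}
    status=0
    for f in tftpboot/xcat/xnba.efi tftpboot/xcat/xnba.kpxe; do
        if cmp -s "$new_root/$f" "$installed_root/$f"; then
            echo "same    /$f"
        else
            echo "DIFFERS /$f"
            status=1
        fi
    done
    return "$status"
}
```

To unpack the RPM without installing it, running rpm2cpio xnba-undi-1.21.1-1.noarch.rpm | cpio -idm inside an empty directory should work; then pass that directory as the first argument. Any DIFFERS line means the installed file is not the one you built.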
I think you are correct; the install didn't finish correctly. How do you suggest installing the new RPM over the existing one on my system?
Try rpm -U on the xnba-undi RPM file generated by the rpmbuild command.
rpm -U xnba-undi-1.21.1-1.noarch.rpm
package xnba-undi-1.21.1-1.noarch is already installed
file /tftpboot/xcat/xnba.efi from install of xnba-undi-1.21.1-1.noarch conflicts with file from package xnba-undi-1.21.1-1.noarch
file /tftpboot/xcat/xnba.kpxe from install of xnba-undi-1.21.1-1.noarch conflicts with file from package xnba-undi-1.21.1-1.noarch
I get this conflict. I tried to move these files aside, but that didn't help.
Try to remove the existing RPM with rpm -e and then install the new one.
I cannot, because xCAT-2.16.3-snap202111100958.x86_64 depends on xnba-undi-1.21.1-1.noarch.rpm.
If I also remove xCAT-2.16.3-snap202111100958.x86_64, won't this harm the entire xCAT installation?
How about rpm -U --replacefiles --replacepkgs xnba-undi-1.21.1-1.noarch.rpm?
If that fails, you can bump the version number in xnba-undi.spec up to 1.21.2 and rebuild the RPM with the rpmbuild command again. That should generate xnba-undi-1.21.2-1.noarch.rpm, which you can then install with rpm -U xnba-undi-1.21.2-1.noarch.rpm.
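The version bump could look like this, assuming xnba-undi.spec sets its version with a Version: tag line (bump_spec_version is a hypothetical helper). If the spec derives the Source0 tarball name from %{version}, the tarball would also need to be renamed to match.

```shell
#!/bin/sh
# Sketch: bump the Version tag of an RPM spec so the rebuilt package
# (e.g. 1.21.2) is newer than the installed one (1.21.1).
# Assumption: the spec sets its version with a "Version:" tag line.
bump_spec_version() {
    spec=$1
    newver=$2

    # Refuse to touch a spec that has no Version: tag.
    grep -q '^Version:' "$spec" || return 1

    tmp="$spec.tmp.$$"
    # Replace only the first Version: line; copy everything else unchanged.
    awk -v v="$newver" '
        !done && /^Version:/ { print "Version: " v; done = 1; next }
        { print }
    ' "$spec" > "$tmp" || return 1
    cat "$tmp" > "$spec" && rm -f "$tmp"
}
```

For example: bump_spec_version xnba-undi.spec 1.21.2, then rerun rpmbuild -ba xnba-undi.spec.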
Thanks, this worked, so I was able to run the vcreate command, but I got the error mentioned in the iPXE GitHub issue you pointed me to. I asked them how to get the commit that solves this error, unless you know how to do this and can share it with me.
I was able to find and install the correct iPXE version with VLAN support (https://github.com/ipxe/ipxe/commit/eecb75ba). Thanks a lot! I am now facing a few more issues:
1. I am only able to continue the boot from the xNBA prompt. When I put the same commands that worked in the shell into the node script, it doesn't work:
cat /tftpboot/xcat/xnba/nodes/ai-slurm-g9.uefi
vcreate --tag 36 -p 0 net0
autoboot
imgfetch -n kernel http://${next-server}:80/tftpboot/xcat/osimage/rocky8-x86_64-install-compute/vmlinuz
imgload kernel
imgargs kernel quiet inst.repo=http://10.26.36.80:80/install/rocky8/x86_64 inst.ks=http://10.26.36.80:80/install/autoinst/ai-slurm-g9 ip=ens1f0:dhcp inst.sshd inst.loglevel=debug inst.syslog=10.26.36.80 BOOTIF=01-${netX/mac:hexhyp} initrd=initrd
imgfetch -n initrd http://${next-server}:80/tftpboot/xcat/osimage/rocky8-x86_64-install-compute/initrd.img
imgexec kernel
what am I doing wrong here?
Any suggestions?
@urielrosen1981
Can you try changing #!gpxe to #!ipxe in your /tftpboot/xcat/xnba/nodes/ai-slurm-g9.uefi file before running rpower?
Does the console stop at "Started cancel waiting..."? And hitting Enter a few times does not advance the display?
Hi,
Changing to #!ipxe didn't make a difference here. Is there a way to debug this outside of the xNBA prompt? Regarding the screenshot, it doesn't advance past this with Enter. I am guessing the last line is the reason it hangs, but I'm not sure.
Perhaps you can try posting to the xcat-user mailing list to see if anyone in the community has had success with tagged VLANs on x86?
Thanks for your suggestion , I sent the details to the mailing list, hope to find a solution. Thanks again for all your kind help.
Hello,
I have been unsuccessful installing a new osimage over a tagged VLAN that was configured in the BIOS of the server. PXE boot starts, and then I get a message that there are no more network devices. I attached the node definitions below. Please advise if you see a missing or wrong setting.
lsdef ai-slurm-g9
Object name: ai-slurm-g9
    arch=x86_64
    bmc=10.26.19.209
    bmcpassword=admin
    bmcport=0
    bmcusername=*****
    cmdmapping=115200
    cons=ipmi
    consoleenabled=1
    consoleondemand=hard
    currchain=boot
    currstate=install rocky8-x86_64-compute
    getmac=1
    installnic=eth0
    ip=10.26.36.109
    mac=04:3f:72:db:77:44
    mgt=ipmi
    netboot=xnba
    nfsserver=10.26.36.80
    nicdevices.bond0-port1=bond0
    nicdevices.bond0-port2=bond0
    nicips.bond0-port1=10.26.36.109
    nictypes.bond0-port1=vlan
    nictypes.ens7f0=ethernet
    nictypes.bond0-port2=vlan
    nictypes.ens1f0=ethernet
    os=rocky8
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    primarynic=eth0
    profile=compute
    provmethod=rocky8.5
    serialflow=115200
    serialspeed=1
    status=powering-on
    statustime=05-01-2022 10:29:15
    tftpserver=10.26.36.80
    updatestatus=synced
    updatestatustime=12-23-2020 16:21:29