xcat2 / xcat-core

Code repo for xCAT core packages
Eclipse Public License 1.0

Error during centos 8 installation #6894

Open marseaplage opened 3 years ago

marseaplage commented 3 years ago

Hello guys.

Please help me figure out why the xCAT deployment fails on this blade server. I am provisioning a CentOS 8.2 image with xCAT. On another server it installs successfully, but on this one I get the message below, even though both servers show the errors related to the floppy disk and sha256_mb.

[screenshots: 1, 2]

Any ideas would be really appreciated.

soportemodemat commented 3 years ago

Please help me with this topic. Do you have any ideas?

besawn commented 3 years ago

Please provide the following information and perhaps another CentOS user with a similar hardware configuration will be able to assist you:

lsxcatd -v
xcatprobe xcatmn
lsdef -t osimage <OSIMAGE_YOU_ARE_INSTALLING>
lsdef <NODE_THAT_IS_SUCCESSFUL>
lsdef <NODE_THAT_IS_FAILING>
xcatprobe osdeploy -n <NODE_THAT_YOU_ARE_INSTALLING>
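If it helps, the commands above can be gathered into one report with a small wrapper. This is only a sketch: `run_and_label` and `collect_diag` are hypothetical helper names, not part of xCAT, and the three arguments stand in for the placeholders in the list above.

```shell
#!/bin/sh
# Sketch: run each requested xCAT command under a labelled header so the
# combined output can be attached to the issue as a single text file.
# run_and_label and collect_diag are illustrative names, not xCAT tools.

# Print a header naming the command, then run it with stderr merged in.
# A failing command is noted in the report instead of stopping the script.
run_and_label() {
    printf '===== %s =====\n' "$*"
    "$@" 2>&1 || printf '(command exited with status %s)\n' "$?"
}

# $1 = osimage name, $2 = node that installs fine, $3 = node that fails
collect_diag() {
    run_and_label lsxcatd -v
    run_and_label xcatprobe xcatmn
    run_and_label lsdef -t osimage "$1"
    run_and_label lsdef "$2"
    run_and_label lsdef "$3"
    run_and_label xcatprobe osdeploy -n "$3"
}
```

It would be called on the management node as collect_diag <OSIMAGE_YOU_ARE_INSTALLING> <NODE_THAT_IS_SUCCESSFUL> <NODE_THAT_IS_FAILING> > xcat-diag.txt, which gives plain text that is easier to search than screenshots.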

@soportemodemat If you are not working directly with @marseaplage, consider opening a second issue as there could be vast differences between your environments with different root causes.

soportemodemat commented 3 years ago

Hello @besawn, thank you for your reply. Yes, we are working together on this case.

This server is an HP ProLiant BL460c G7. Here is the output of all the commands you listed:

lsxcatd -v

[screenshot: lsxcatd]

xcatprobe xcatmn

[screenshot: xcatmode]

lsdef -t osimage

[screenshot: lsdef1]

lsdef

[screenshot: lsdef_qnd2-8]

lsdef

[screenshot: lsdef_qnd1-2]

Exact procedure you are using to install the node with captured output.

makehosts
makenetworks
makedhcp -n
makedns -n
rsetboot quinde-1-2 net
rpower quinde-1-2 reset

[screenshots: proced1, proced2]

xcatprobe osdeploy -n

[screenshot: osdeploy]

cxhong commented 3 years ago

From the first screenshot, I think I see a firmware bug. Maybe you can compare the firmware levels of the two nodes?
rflash <nodename> -c

soportemodemat commented 3 years ago

@cxhong When I run the command you suggested, I get this error for both nodes:

Error: Invalid or unsupported command

[screenshot: image]

gurevichmark commented 3 years ago

This command can also be used to check firmware levels: rinv <nodename> firm

soportemodemat commented 3 years ago

> This command can also be used to check firmware levels: rinv <nodename> firm

Hello @gurevichmark, with that command I get this result. How can it help me solve this problem?

[screenshot: firmw]

gurevichmark commented 3 years ago

Turn off debug trace mode so that the output is easier to read:

chdef -t site clustersite xcatdebugmode=0

Then run rinv <nodename> firm against the working and the non-working node to see whether the firmware levels differ.
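To make the comparison less error-prone than reading two listings side by side, the two outputs can be diffed. The following is a rough sketch and makes one assumption: that each output line starts with a "<nodename>: " prefix, which is stripped before comparing. `strip_node_prefix` and `compare_firmware` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: diff the firmware inventory of two nodes.
# Assumption: every rinv output line is prefixed with "<nodename>: ".

# Drop everything up to and including the first ": " on each line.
strip_node_prefix() {
    sed 's/^[^:]*: //'
}

# $1 and $2 are node names. Prints a unified diff of the two inventories
# and returns diff's exit status (0 = identical, 1 = different).
compare_firmware() {
    a=$(mktemp)
    b=$(mktemp)
    rinv "$1" firm | strip_node_prefix | sort > "$a"
    rinv "$2" firm | strip_node_prefix | sort > "$b"
    diff -u "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

For example, compare_firmware <working_node> <failing_node> prints only the lines that differ between the two nodes; empty output means the reported firmware levels match.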

soportemodemat commented 3 years ago

> Turn off debug trace mode, so it is easier to read the output with chdef -t site clustersite xcatdebugmode=0 Then run the rinv <nodename> firm command against working and non-working node to see if there are differences in firmware levels.

Hi @gurevichmark, thank you for your reply. There is a difference in firmware levels, as you suggested. What should I do now? Are you sure that a firmware upgrade is the solution?

[screenshot: firmw2]

gurevichmark commented 3 years ago

A difference in firmware level could explain why one node boots and the other does not. You can try upgrading the firmware on quinde-1-2 to the same version as on quinde-2-8 and see if that makes a difference.

soportemodemat commented 3 years ago

> Difference in firmware level could explain why one node is booting and the other one does not. You can try to upgrade the firmware on quinde-1-2 to the same version as on quinde-2-8 and see if that makes a difference.

Hi @gurevichmark, I was able to upgrade the firmware to version 1.94 (iLO 3), which is the latest version for that iLO, and I still have the same problem. Any other ideas? Just to mention: when I install CentOS 8 directly on that server, it installs without any problem, but with the xCAT image it does not work on that server.

gurevichmark commented 3 years ago

@soportemodemat Are the two nodes not the same model? quinde-2-8 seems to have a different firmware level.

Have you tried installing a diskful or diskless vanilla CentOS 8 on quinde-1-2 with xCAT? Maybe the problem is the "custom" part of the custom_centos8-x86_64-install-compute osimage?

soportemodemat commented 3 years ago

> @soportemodemat Are the two nodes not the same models ? The quinde-2-8 seems to have different firmware level.
>
> Have you tried installing diskful or diskless vanilla Centos8 on quinde-1-2 with xCAT ? Maybe the problem is the "custom" part of the custom_centos8-x86_64-install-compute os image?

Indeed, they are different servers. No, I haven't tried that, because I need that CentOS version for use with OpenHPC in diskful mode. That is why I want to install it with xCAT, as I did on the other server without any error.

gurevichmark commented 3 years ago

@soportemodemat I would recommend installing a diskful vanilla CentOS 8 on quinde-1-2 with xCAT. That could tell you whether something is wrong with the server or with your custom CentOS 8 image definition.

You can also run reventlog quinde-1-2 to see whether the BMC has logged any hardware or firmware problems.

soportemodemat commented 3 years ago

I still have the same problem. It only happens with xCAT on that server; when I install from CD-ROM, there is no problem during installation:

[screenshot: dracutError]

Any ideas on how to solve this?

soportemodemat commented 3 years ago

> @soportemodemat I would recommend installing diskful vanilla Centos8 on quinde-1-2 with xCAT. That could tell you if there is something wrong with the server or with your custom Centos8 image definition.
>
> You can also run reventlog quinde-1-2 to see if any hardware or firmware problems logged by the BMC.

Hi guys, could you help me with instructions for building an xCAT image from these files:

http://mirror.centos.org/centos/8/BaseOS/x86_64/kickstart/

I have already downloaded this folder to the xCAT master. I want to do this because I have the same error as the one reported here: https://community.theforeman.org/t/cant-kickstart-centos-8/15566/17 and the people there say it can be solved with those files.

gurevichmark commented 3 years ago

@soportemodemat

soportemodemat commented 3 years ago

Hi @gurevichmark,

I really appreciate your reply. About the link: the error I have is related to the iSCSI dracut init failure, and that entry is here:

[screenshot: foroDracuta]

Now I am trying to install a diskless xCAT image of CentOS 8.2, as you recommended, but with that image I get this error about IPv6:

[screenshot: ipv6error]

I think the error I get with the diskful image is related to the kickstart in /tftpboot/xcat/xnba/nodes/quinde-1-10, as in this case: https://sourceforge.net/p/xcat/mailman/xcat-user/thread/5665DB1B.2060202%40lbl.gov/

soportemodemat commented 3 years ago

Hi guys,

The error shown in the image below, which I get when I deploy a CentOS 8.2 stateful image with xCAT 2.16, is very similar to this bug in CentOS 7. The solution there is to blacklist the multipath kernel module with rd.driver.blacklist=dm-multipath, but I do not know where to specify that for the xCAT image. Could you give me any ideas?

[screenshot: dracutError]

gurevichmark commented 3 years ago

You can try setting the addkcmdline attribute for the node or the osimage. Something like:

chdef <node> addkcmdline=rd.driver.blacklist=dm-multipath
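Put together with the install procedure used earlier in this thread, the sequence might look like the sketch below. It assumes nodeset has to be rerun so that the new kernel argument reaches the node's generated boot configuration; `run` and `apply_kcmdline` are hypothetical helper names, and setting DRY_RUN only prints the commands.

```shell
#!/bin/sh
# Sketch: set an extra kernel argument on a node and reprovision it.
# run and apply_kcmdline are illustrative names, not xCAT commands.

# Echo the command; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# $1 = node, $2 = osimage name, $3 = kernel argument to add
apply_kcmdline() {
    run chdef "$1" addkcmdline="$3" &&
    run nodeset "$1" osimage="$2" &&   # assumed: regenerates the boot config
    run rsetboot "$1" net &&
    run rpower "$1" reset
}
```

A dry run such as DRY_RUN=1 apply_kcmdline quinde-1-2 custom_centos8-x86_64-install-compute rd.driver.blacklist=dm-multipath prints the four commands without touching the node; lsdef quinde-1-2 -i addkcmdline can then confirm the attribute after a real run.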

soportemodemat commented 3 years ago

Hello guys, I still have the same problem. But I discovered that after installing these RPMs, in this order, on a normal CentOS 8 installation from DVD, the system recognises the server's network cards:

rpm -Uvh linux-firmware-20200619-101.git3890db36.el8_3.noarch.rpm
rpm -Uvh kexec-tools-2.0.20-34.el8_3.1.x86_64.rpm
rpm -ivh kernel-4.18.0-193.el8.x86_64.rpm
rpm -Uvh kernel-core-4.18.0-193.el8.x86_64.rpm
rpm -ivh kernel-core-4.18.0-240.10.1.el8_3.x86_64.rpm
rpm -ivh kmod-be2net-12.0.0.0-6.el8_3.elrepo.x86_64.rpm

This is because of the case documented here: http://blog.dovid.net/how-to-get-the-broadcom-network-drivers-working-with-on-a-hp-bl460c-gen8-with-centos8/.

I have tried installing these RPMs by injecting them into the image and by using pre-installation and post-installation scripts, but nothing works. I was wondering whether there is a way to add this already-installed CentOS 8 node to the xCAT cluster without reinstalling it via PXE. Do you know how to achieve that?

Thank you in advance for your help