xcat2 / xcat-core

Code repo for xCAT core packages
Eclipse Public License 1.0

install hangs - pane is dead error while installing new image centos 7.6 #6600

Closed urielrosen closed 2 years ago

urielrosen commented 4 years ago

Hi,

After adding a new CentOS 7.6 image to my deployment images (I already have a working CentOS 7.3 image), the install hangs with the error "pane is dead". Below are the definitions of my image and my node. Please advise.

lsdef -t osimage centos7.6-x86_64-install-compute
Object name: centos7.6-x86_64-install-compute
    imagetype=linux
    osarch=x86_64
    osdistroname=centos7.6-x86_64
    osname=Linux
    osvers=centos7.6
    otherpkgdir=/install/custom/install/centos/7.6/otherpkgs/
    otherpkglist=/install/custom/install/centos/7.6/centos7.otherpkgs.pkglist
    pkgdir=/install/centos7.6/x86_64
    pkglist=/install/custom/install/centos/7.6/centos7.6-x86_64-install-compute.pkglist
    profile=compute
    provmethod=install
    template=/opt/xcat/share/xcat/install/centos/compute.centos7.tmpl

cnXXX
    arch=x86_64
    bmc=####
    bmcpassword=#####
    bmcusername=####
    cons=ipmi
    currchain=boot
    currstate=install centos7.6-x86_64-compute
    groups=DELL_C6420
    initrd=xcat/osimage/centos7.6/initrd.img
    installnic=p4p1
    interface=p4p1
    ip=####
    kcmdline=quiet inst.repo=http://######:80/install/centos7.6/x86_64 inst.ks=http://######:80/install/autoinst/cn050 ip=eth0:dhcp
    kernel=xcat/osimage/centos7.6/vmlinuz
    mac=50:6b:4b:d4:32:ac
    mgt=ipmi
    mtm=DELL:PowerEdge C6420
    netboot=xnba
    nfsserver=######
    os=centos7.6
    postscripts=postscripts=syncfiles,net_DELL_C6420,centos7.3/wexac.post,remoteshell,sssd,NTP,wexac_7.post,install_MLNX_OFED_LINUX-4.1-1.0.2.0,install_gpfs_client.sh,centos7.3/centos7.lsf,fstab,Dell_supportassist.sh
    primarynic=p4p1
    profile=compute
    provmethod=centos7.6-x86_64-install-compute
    serial=1CFVNR2
    serialport=0
    serialspeed=115200
    status=installing
    statustime=03-01-2020 11:37:22
    tftpserver=#######
    updatestatus=failed
    updatestatustime=10-10-2018 16:40:26

samveen commented 4 years ago

@urielrosen are you able to connect to the console of the node, or is this what you see in the console log on the xcat server?

urielrosen commented 4 years ago

Hi,

I see this in the console of the server (using the wcons command):

[cid:image001.jpg@01D5F2CA.86334440]

It seems I am missing something in the settings of the new image. Can you tell me all the required steps and commands to set it up?

Thanks, Uriel.

samveen commented 4 years ago

@urielrosen Can you upload the image to the GitHub issue directly instead of replying by mail? GitHub does not seem to add images to the issue when you reply by email (I don't see the image you're referring to in the ticket).

urielrosen commented 4 years ago

Hi,

The image is attached.

Can you list all the mandatory steps for adding a new image, so I can see whether I missed anything critical?

image

samveen commented 4 years ago

@urielrosen As you can see at the bottom of the screen, tmux has 5 panes running. Press Ctrl-b followed by 2 to switch to the shell, then look at the kickstart file and examine the logs. That should give you all the details you need. I would love to help debug, but remote debugging isn't possible (I'd need access to the console to be able to check).
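
A minimal sketch of that log check, assuming the installer shell provides grep and that Anaconda keeps its logs under /tmp (anaconda.log, packaging.log, program.log, storage.log), as CentOS 7 installers usually do. The function name scan_installer_logs is made up for illustration and is not part of xCAT or Anaconda.

```shell
# Hypothetical helper: print error-looking lines from the Anaconda logs.
# Assumption: the logs live under /tmp inside the installer environment.
scan_installer_logs() {
    log_dir="${1:-/tmp}"
    for name in anaconda.log packaging.log program.log storage.log; do
        log_file="$log_dir/$name"
        # Skip logs that this installer run did not create.
        [ -f "$log_file" ] || continue
        echo "== $log_file"
        # -i: ignore case, -n: show line numbers, -E: extended regex.
        grep -inE 'error|traceback|failed' "$log_file" || echo "(no matches)"
    done
}
```

Calling scan_installer_logs with no argument in the shell window scans /tmp. The kickstart file that the installer fetched is typically at /run/install/ks.cfg, which is worth comparing against the template the osimage points to.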

urielrosen commented 4 years ago

Hi,

The thing is, if I change the osimage to the working image and install the same server, it works. When I switch to the new image, this happens. Can you provide more detailed instructions for adding a new image, with examples?

Thanks, Uriel.
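
Since one image works and the other does not on the same node, comparing the two osimage definitions attribute by attribute is a quick way to narrow things down. The sketch below assumes each definition was saved to a file from lsdef -t osimage output with one attribute per line. The function name diff_osimage_attrs is hypothetical.

```shell
# Hypothetical helper: list attributes that differ between two saved
# "lsdef -t osimage <name>" outputs (one attr=value pair per line).
diff_osimage_attrs() {
    awk -F= '
        /=/ {
            sub(/^[ \t]+/, "")                  # drop leading indentation
            key = $1
            val = substr($0, length(key) + 2)   # everything after the first "="
            if (FNR == NR) { old[key] = val; next }
            new[key] = val
        }
        END {
            for (k in old) {
                if (!(k in new)) { print k ": only in first" }
                else if (old[k] != new[k]) { print k ": " old[k] " -> " new[k] }
            }
            for (k in new) {
                if (!(k in old)) { print k ": only in second" }
            }
        }
    ' "$1" "$2" | sort
}
```

For example, save the working definition with lsdef -t osimage <working-image> > old.txt and the new one with lsdef -t osimage centos7.6-x86_64-install-compute > new.txt, then run diff_osimage_attrs old.txt new.txt. Differences in template, pkglist and pkgdir are the first ones to look at.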

cxhong commented 4 years ago

Can you check the passwd table? It needs an entry with the key "system" that has a password set.

# tabdump passwd
#key,username,password,cryptmethod,authdomain,comments,disable
"system","root","cluster",,,,

Try this command to see whether there is a configuration issue on the xCAT management node that needs attention:

xcatprobe xcatmn -i <interfacename>
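
The passwd check can be scripted along these lines. This is a sketch that assumes the tabdump passwd column order shown above (key, username, password, ...). The function name has_system_password is hypothetical.

```shell
# Hypothetical helper: read "tabdump passwd" output on stdin and succeed
# only if a row with key "system" has a non-empty password column.
has_system_password() {
    awk -F, '
        /^#/ { next }    # skip the header line
        $1 == "\"system\"" && $3 != "" && $3 != "\"\"" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage on the management node would be tabdump passwd | has_system_password || echo "no system password set". If the row is missing, it can usually be added with chtab key=system passwd.username=root passwd.password=<password>.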

urielrosen commented 4 years ago

Hi,

I was able to solve this by changing the template reference (template=/install/custom/install/centos/7.6/centos7.tmpl), but now I am getting errors about missing packages. I used the CentOS 7.6 minimal image. Do you know whether I need to download the RPMs separately, or should I use CentOS-7-x86_64-DVD-1810.iso instead?

image

cxhong commented 4 years ago

So those packages weren't found in the CentOS 7.6 minimal image? You can remove them from the package list, but I think you will need them later. You should download the ISO file, then run copycds to create the osimage.
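
As a rough sketch of the sequence from ISO to install, the helper below prints the xCAT commands in order and only executes them when DRY_RUN=0 is set. The run and add_osimage names are hypothetical. The ISO path, image name and node name are whatever applies to your site, and the optional template argument is only needed when the default template does not fit.

```shell
# Hypothetical wrapper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# Usage: add_osimage ISO IMAGE NODE [TEMPLATE]
add_osimage() {
    iso="$1"; image="$2"; node="$3"; template="${4:-}"
    run copycds "$iso"                 # copy the media under /install, create osimage definitions
    run lsdef -t osimage "$image"      # confirm the definition exists and review its attributes
    if [ -n "$template" ]; then
        # Point the image at a custom kickstart template.
        run chdef -t osimage -o "$image" "template=$template"
    fi
    run nodeset "$node" "osimage=$image"   # generate the kickstart and boot configuration
    run rsetboot "$node" net               # network boot on the next start
    run rpower "$node" boot                # power cycle into the installer
}
```

Calling add_osimage with the DVD ISO, centos7.6-x86_64-install-compute and a node name prints the command list for review. Setting DRY_RUN=0 runs the same commands for real on the management node.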

samveen commented 4 years ago

@urielrosen Great that you got the template issue solved. As for the packages, you will need to create local repository mirrors beyond the OS image repository created by copycds. The two main repositories I usually mirror locally for CentOS are the CentOS updates repository and Fedora's Extra Packages for Enterprise Linux (EPEL). To make their packages available at install time, add the other repository paths to the osimage under the pkgdir attribute. An example follows:

centos7.4-x86_64-install-compute:
    objtype=osimage
    imagetype=linux
    osarch=x86_64
    osdistroname=centos7.4-x86_64
    osname=Linux
    osvers=centos7.4
    otherpkgdir=/install/post/otherpkgs/centos7.4/x86_64
    otherpkglist=/install/custom/install/kernel-lt.pkglist,/install/custom/install/puppet4.pkglist
    partitionfile=s:/install/custom/install/partition.sh
    pkgdir=/install/centos7.4/x86_64,/install/mirror/centos/7/os/x86_64,/install/mirror/centos/7/updates/x86_64,/install/kernel-lt/x86_64,/install/mirror/puppetlabs/el/7/products/x86_64,/install/mirror/puppetlabs/el/7/dependencies/x86_64,/install/mirror/puppetlabs/el/7/PC1/x86_64/
    pkglist=/opt/xcat/share/xcat/install/centos/compute.centos7.pkglist
    postscripts=infra-custom/disable_centos_repos,otherpkgs,infra-custom/append_cstate_boot_opts,infra-custom/set_kernel-lt_default_boot_option,infra-custom/configure-puppet-4.10
    profile=compute
    provmethod=install
    template=/opt/xcat/share/xcat/install/centos/compute.centos7.tmpl

As you can see above, my setup had extra repos (pkgdir) and otherpkglist entries for the kernel-lt and puppet4 packages added to the osimage, and appropriate postscripts in the postscripts attribute to configure them at install time. The first boot was into kernel-lt, and Puppet took over to bring the nodes up to their required state (the puppetmaster I worked with was amazing).
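
Before pointing an osimage at several repositories, it can help to confirm that every pkgdir entry exists and carries yum metadata. A minimal sketch, assuming each entry is a local directory with repodata/repomd.xml in it. The function name check_pkgdirs is hypothetical.

```shell
# Hypothetical helper: given a comma-separated pkgdir value, report entries
# that are missing or that lack yum metadata (repodata/repomd.xml).
check_pkgdirs() (
    # The body runs in a subshell so the IFS change does not leak to the caller.
    IFS=','
    status=0
    for dir in $1; do
        [ -n "$dir" ] || continue
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"; status=1
        elif [ ! -f "$dir/repodata/repomd.xml" ]; then
            echo "no repodata: $dir"; status=1
        else
            echo "ok: $dir"
        fi
    done
    exit "$status"
)
```

The current value can be read with lsdef -t osimage <image> -i pkgdir and passed to check_pkgdirs. Once a mirror checks out, it can be appended with chdef -t osimage -o <image> -p pkgdir=<path>.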

One point to note is that you should probably move all common postscripts into the osimage rather than apply them on node or group level (for example the sssd configuration script is tied to the OS). You can reach out to me in case you get stuck somewhere, and I can see if I can help you figure out the issue.

gurevichmark commented 4 years ago

@samveen Are you still seeing this problem?

samveen commented 4 years ago

@gurevichmark Ah. This was not an issue I faced. I was just trying to help Uriel with debugging by giving him an idea of what I had faced early on in my xCAT clusters.

gurevichmark commented 4 years ago

Sorry @samveen, my question was for @urielrosen. Are you still seeing this problem?

besawn commented 2 years ago

Closing due to inactivity.