openhpc / ohpc

OpenHPC Integration, Packaging, and Test Repo
http://openhpc.community
Apache License 2.0

Empty /boot/efi with stateful provisioning of OpenHPC 2.4 on openSUSE Leap 15.3 #1400

Open devilkingsatan666 opened 2 years ago

devilkingsatan666 commented 2 years ago

Dear OpenHPC team, I installed the base OS on my master machine from the offline installation media in EFI mode and followed the recipe to prepare for stateful provisioning. At the step that installs grub2-efi and grub2-efi-modules, my package manager reported that there is no package named "grub2-efi-modules". From what I found online, that package appears to exist only on Red Hat-based distributions, so I skipped it, which may be what causes the bug. Does the recipe need a fix here? Thanks for reading. I appreciate the great effort made by the community.

Best regards, Wu-Cheng Chiang

devilkingsatan666 commented 2 years ago

Some files and command output that may be helpful for debugging:

tree /boot on ptsl10 (one of my compute nodes):

/boot
|-- System.map-5.3.18-59.37-default
|-- boot.readme
|-- config-5.3.18-59.37-default
|-- efi
|-- grub2
|   |-- grub.cfg
|   |-- grubenv
|   `-- themes
|       `-- openSUSE
|           |-- COPYING.CC-BY-SA-3.0
|           |-- DejaVuSans-Bold14.pf2
|           |-- DejaVuSans10.pf2
|           |-- DejaVuSans12.pf2
|           |-- README
|           |-- ascii.pf2
|           |-- highlight_c.png
|           |-- logo.png
|           |-- slider_c.png
|           |-- slider_n.png
|           |-- slider_s.png
|           `-- theme.txt
|-- initrd -> initrd-5.3.18-59.37-default
|-- initrd-5.3.18-59.37-default
|-- symvers-5.3.18-59.37-default.gz
|-- sysctl.conf-5.3.18-59.37-default
|-- vmlinux-5.3.18-59.37-default.gz
|-- vmlinuz -> vmlinuz-5.3.18-59.37-default
`-- vmlinuz-5.3.18-59.37-default

4 directories, 24 files

/etc/warewulf/filesystem/efi.cmds:

# EFI / GPT Example

# Parted specific commands
select /dev/sda
mklabel gpt
mkpart ESP fat32 1MiB 513MiB
mkpart primary linux-swap 513MiB 128GiB
mkpart primary ext4 128GiB 100%
name 1 ESP
name 2 swap
name 3 root
set 1 boot on

# mkfs NUMBER FS-TYPE [ARGS...]
mkfs 1 vfat -n ESP
mkfs 2 swap
mkfs 3 ext4 -L root

# fstab NUMBER fs_file fs_vfstype fs_mntops fs_freq fs_passno
fstab 3 / ext4 defaults 0 0
fstab 1 /boot/efi vfat defaults 0 0
fstab 2 swap swap defaults 0 0
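
Comparing this file with the FS value that wwsh provision print reports for the node, the FS string appears to be the non-comment, non-blank lines of the file joined with commas. A minimal sketch of that flattening, handy for checking a .cmds file against what a node is configured with. The helper name cmds_to_fs is hypothetical, and this only mirrors the relationship visible in the two listings; it is not necessarily how Warewulf builds the value internally.

```shell
# cmds_to_fs FILE
# Print the non-comment, non-blank lines of FILE joined with commas.
# (Hypothetical helper; illustrates the file -> FS relationship only.)
cmds_to_fs() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | paste -sd, -
}

# Example use on the master (path taken from the listing above):
#   cmds_to_fs /etc/warewulf/filesystem/efi.cmds
```

The result can be compared by eye, or with diff, against the FS line printed for the node.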

zypper --root $CHROOT search -i grub2:

S  | Name                       | Summary                                               | Type
---+----------------------------+-------------------------------------------------------+--------
i+ | grub2                      | Bootloader with support for Linux, Multiboot and more | package
i+ | grub2-branding-openSUSE    | openSUSE Leap 15.3 branding for GRUB2                 | package
i  | grub2-i386-pc              | Bootloader with support for Linux, Multiboot and more | package
i  | grub2-systemd-sleep-plugin | Grub2's systemd-sleep plugin                          | package
i+ | grub2-x86_64-efi           | Bootloader with support for Linux, Multiboot and more | package
i+ | ruby2.5-rubygem-cfa_grub2  | Models for GRUB2 configuration files                  | package
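
The table above shows grub2-x86_64-efi installed in the chroot. As far as I know, that is the package in which openSUSE ships the x86_64 EFI GRUB modules, so a separate grub2-efi-modules package may simply not be needed there; treat that as an assumption to verify. A small sketch for pulling the installed package names out of this table format so the check can be scripted. The function name installed_pkg_names is hypothetical.

```shell
# installed_pkg_names
# Read "zypper search -i" table output on stdin and print the Name column
# of every row whose status column starts with "i" (installed).
installed_pkg_names() {
    awk -F'|' '$1 ~ /^i/ { gsub(/[[:space:]]/, "", $2); print $2 }'
}

# Example use on the master:
#   zypper --root $CHROOT search -i grub2 | installed_pkg_names | grep -x grub2-x86_64-efi
```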

wwsh provision print ptsl10 (one of my compute nodes):

#### ptsl10.PTSL-XCBC #########################################################
ptsl10.PTSL-XCBC: MASTER           = UNDEF
ptsl10.PTSL-XCBC: BOOTSTRAP        = 5.3.18-59.37-default
ptsl10.PTSL-XCBC: VNFS             = leap15.3
ptsl10.PTSL-XCBC: VALIDATE         = FALSE
ptsl10.PTSL-XCBC: FILES            = dynamic_hosts,group,munge.key,passwd,shadow,slurm.conf
ptsl10.PTSL-XCBC: PRESHELL         = FALSE
ptsl10.PTSL-XCBC: POSTSHELL        = FALSE
ptsl10.PTSL-XCBC: POSTNETDOWN      = FALSE
ptsl10.PTSL-XCBC: POSTREBOOT       = FALSE
ptsl10.PTSL-XCBC: CONSOLE          = ttyS1,115200
ptsl10.PTSL-XCBC: PXELOADER        = UNDEF
ptsl10.PTSL-XCBC: IPXEURL          = UNDEF
ptsl10.PTSL-XCBC: SELINUX          = DISABLED
ptsl10.PTSL-XCBC: KARGS            = "net.ifnames=0 biosdevname=0"
ptsl10.PTSL-XCBC: FS               = "select /dev/sda,mklabel gpt,mkpart ESP fat32 1MiB 513MiB,mkpart primary linux-swap 513MiB 128GiB,mkpart primary ext4 128GiB 100%,name 1 ESP,name 2 swap,name 3 root,set 1 boot on,mkfs 1 vfat -n ESP,mkfs 2 swap,mkfs 3 ext4 -L root,fstab 3 / ext4 defaults 0 0,fstab 1 /boot/efi vfat defaults 0 0,fstab 2 swap swap defaults 0 0"
ptsl10.PTSL-XCBC: BOOTLOADER       = sda
ptsl10.PTSL-XCBC: BOOTLOCAL        = FALSE

efibootmgr on ptsl10 (one of my compute nodes):

BootCurrent: 0007
BootOrder: 0006,0007,0000,0001,0002,0003,0004
Boot0000  Enter Setup
Boot0001  BootDevices
Boot0002  Boot Manager
Boot0003  Setup
Boot0004  Diagnostics
Boot0006* Hard Disk 0
Boot0007* PXE Network
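
The listing above contains only firmware entries plus "Hard Disk 0" and "PXE Network"; there is no entry that looks like an installed bootloader. A hedged sketch for testing this from a script follows. The helper name has_boot_entry is hypothetical, and the label "opensuse" in the usage comment is an assumption about what a successful grub2-install would register on this distribution.

```shell
# has_boot_entry LABEL
# Read efibootmgr output on stdin; succeed if any BootXXXX entry line
# contains LABEL (case-insensitive). BootCurrent/BootOrder lines are ignored
# because they do not have four hex digits directly after "Boot".
has_boot_entry() {
    grep -Eiq "^Boot[0-9A-Fa-f]{4}\*? +.*$1"
}

# Example use on a compute node (label is an assumption):
#   efibootmgr | has_boot_entry opensuse || echo "no bootloader entry found"
```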

devilkingsatan666 commented 2 years ago

One workaround is to run grub2-install /dev/sda on each compute node after provisioning to re-create the missing boot files. With pdsh, that is:

pdsh -w ${compute_regex} grub2-install /dev/sda
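
To avoid re-running the install on nodes that are already fine, the workaround could be guarded by a check that /boot/efi is actually empty. This is a sketch only: both function names are hypothetical, the defaults /boot/efi and /dev/sda are taken from the listings in this thread, and the script would have to be present on each compute node to be run there.

```shell
# dir_is_populated DIR
# Succeed if DIR exists and contains at least one entry.
dir_is_populated() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# reinstall_grub_if_empty [EFI_DIR] [DISK]
# Run grub2-install only when EFI_DIR is missing or empty.
# (Hypothetical wrapper around the workaround described above.)
reinstall_grub_if_empty() {
    efi_dir="${1:-/boot/efi}"
    disk="${2:-/dev/sda}"
    if dir_is_populated "$efi_dir"; then
        echo "$efi_dir already populated, nothing to do"
    else
        grub2-install "$disk"
    fi
}
```

The functions have no side effects until reinstall_grub_if_empty is called, so the file can be sourced safely before use.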

github-actions[bot] commented 1 month ago

A friendly reminder that this issue had no activity for 30 days.