plus3it / spel

STIG-Partitioned Enterprise Linux (spel)

Mar 2024 spel-minimal-centos-8stream-hvm /boot partition too small #682

Closed: krsheldon closed this issue 3 months ago

krsheldon commented 3 months ago

The /boot partition in the Mar 2024 spel-minimal-centos-8stream-hvm AMI is only 300M. As soon as a kernel update is attempted, the partition runs out of space and the instance fails on reboot. This is because the initramfs image files are 100M+ each. Since the /boot partition size cannot be changed, the AMI is unusable if an updated kernel is required.

Suggestions for fix

For CentOS, the recommended /boot partition size is 1GiB (see ref). Future builds should increase /boot to at least this size.

Relevant references

ferricoxide commented 3 months ago

Can you verify that 300MiB number? The default should have been 400MiB for both CentOS Stream and Red Hat 8.

At any rate, some geometry changes are already on tap for April's release. We're mostly deliberating over the implementation mechanism: are you just a consumer of the AMIs we publish, or are you using the automation to generate your own AMIs? (Your answer might help us decide which implementation mechanism to choose.)

krsheldon commented 3 months ago

Just a consumer. I did check, and the number is closer to 400MB, but the initramfs images still consume the entire space with the first kernel update. Here is the output from lsblk before rebooting with a new kernel:

[root@DevDesktop-sheldontest1 /]# lsblk
NAME                 MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1              259:0    0  160G  0 disk
├─nvme0n1p1          259:1    0    1M  0 part
├─nvme0n1p2          259:2    0   95M  0 part /boot/efi
├─nvme0n1p3          259:3    0  382M  0 part /boot
└─nvme0n1p4          259:4    0 19.5G  0 part
  ├─RootVG-rootVol   253:0    0    6G  0 lvm  /
  ├─RootVG-swapVol   253:1    0    2G  0 lvm  [SWAP]
  ├─RootVG-homeVol   253:2    0    1G  0 lvm  /home
  ├─RootVG-varVol    253:3    0    2G  0 lvm  /var
  ├─RootVG-varTmpVol 253:4    0    2G  0 lvm  /var/tmp
  ├─RootVG-logVol    253:5    0    2G  0 lvm  /var/log
  └─RootVG-auditVol  253:6    0  4.5G  0 lvm  /var/log/audit
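
A quick pre-flight check could catch this before a kernel update is attempted (for example, early in a userData script) rather than after a failed reboot. The sketch below is a hypothetical helper, not part of spel or watchmaker: the name `boot_has_room` and the 150 MiB threshold are assumptions, the threshold being a rounded-up sum of one kernel's files (initramfs, kdump initramfs, vmlinuz, System.map) from the du listing further down this thread.

```python
import shutil

# Hypothetical per-kernel footprint: initramfs ~102M + kdump initramfs ~32M
# + vmlinuz ~11M + System.map ~4.3M (from the du listing), rounded up.
PER_KERNEL_MIB = 150


def boot_has_room(path: str = "/boot", needed_mib: float = PER_KERNEL_MIB) -> bool:
    """Return True if the filesystem holding `path` has at least `needed_mib` MiB free."""
    free_mib = shutil.disk_usage(path).free / (1024 * 1024)
    return free_mib >= needed_mib
```

On an instance like the one above (a 382M /boot already holding the stock and rescue kernels), `boot_has_room()` would likely return False, signalling that the kernel update should wait for a larger /boot.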

Since we are running watchmaker as part of the userData script in CloudFormation, the instance fails before we can take any action. I logged into the instance from the AWS console while watchmaker was running to see what was happening. Based on your comments, I will stand by for the April update.

ferricoxide commented 3 months ago

Cool. Thank you for the additional information. In general, we tend to set most of the partitions smaller than recommended: most of them are LVMed and, therefore, expandable. Prior to Red Hat EFI-enabling their RHEL 8 AMIs (AMIs for 8.x releases prior to 8.9 only supported BIOS boot), we didn't even have /boot (or /boot/efi) as separate partitions. When we added /boot to the EL8 builds (to support EFI boot), we used previous ELx versions to guide the partition's size. The EL7 AMIs' /boot was sized to be large enough for three kernel RPMs' worth of space, and we'd prefer to maintain that ratio. So, I'm going to fire up a test instance to see what that value is and how closely it aligns with 1GiB before deciding how much space to allocate.

ferricoxide commented 3 months ago

Actually, looking at a CentOS Stream 8 AMI (our January one):

# du -sh /boot
266M    /boot

# du -sh /boot/* | sort -h
0       /boot/efi
0       /boot/symvers-4.18.0-536.el8.x86_64.gz
8.0K    /boot/loader
200K    /boot/config-4.18.0-536.el8.x86_64
4.3M    /boot/System.map-4.18.0-536.el8.x86_64
5.7M    /boot/grub2
11M     /boot/vmlinuz-0-rescue-ec233554c77b62b5de395e2b30bd2de4
11M     /boot/vmlinuz-4.18.0-536.el8.x86_64
32M     /boot/initramfs-4.18.0-536.el8.x86_64kdump.img
102M    /boot/initramfs-0-rescue-ec233554c77b62b5de395e2b30bd2de4.img
102M    /boot/initramfs-4.18.0-536.el8.x86_64.img

Yeah, it looks like we'd have needed 800MiB (ish), about three times the current 266M, to meet our goal. Add in some "slop" and, yeah, that gets us to 1GiB. I'll update our defaults to reflect that.
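
The "three kernels' worth" arithmetic can be reproduced from the du listing above. This is a minimal sketch: the sizes are transcribed by hand from that listing (du -h uses binary units, so M means MiB), and the 25% slop and round-up-to-256MiB step in the hypothetical `estimate_boot_mib` are illustrative assumptions, not spel's actual sizing logic.

```python
import math

# Sizes in MiB, transcribed from the `du -sh /boot/* | sort -h` output above.
# Zero-size entries are omitted; K values are converted to MiB.
BOOT_FILES_MIB = {
    "loader": 8 / 1024,
    "config": 200 / 1024,
    "System.map": 4.3,
    "grub2": 5.7,
    "vmlinuz (rescue)": 11,
    "vmlinuz": 11,
    "initramfs (kdump)": 32,
    "initramfs (rescue)": 102,
    "initramfs": 102,
}


def estimate_boot_mib(current_mib: float, copies: int = 3, slop: float = 0.25) -> int:
    """Scale current /boot usage by `copies`, pad by `slop`, round up to 256 MiB.

    The slop factor and 256 MiB rounding are hypothetical choices for illustration.
    """
    padded = current_mib * copies * (1 + slop)
    return math.ceil(padded / 256) * 256


total_mib = sum(BOOT_FILES_MIB.values())
print(f"current /boot usage: {total_mib:.1f} MiB")   # current /boot usage: 268.2 MiB
print(f"three kernels' worth: {total_mib * 3:.0f} MiB")  # three kernels' worth: 805 MiB
print(f"suggested size: {estimate_boot_mib(total_mib)} MiB")  # suggested size: 1024 MiB
```

Scaling all of /boot by three is conservative, since grub2 and the rescue kernel get counted three times even though only one copy of each is ever installed.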

ferricoxide commented 3 months ago

@krsheldon said:

…Based on your comments, will standby for the April update.

Since the April images won't be released until sometime next week, and I needed to test the fix code anyway, I generated 2024.03.2 images for RHEL, CentOS Stream, and Oracle Linux 8 in all the regions we build images to. I also tested each AMI's storage needs by installing two kernels in addition to the active one:

[maintuser@ip-172-31-20-213 ~]$ rpm -qa kernel
kernel-4.18.0-552.el8.x86_64
kernel-4.18.0-547.el8.x86_64
kernel-4.18.0-548.el8.x86_64

[maintuser@ip-172-31-20-213 ~]$ ls -lh /boot/initramfs-*
-rw-------. 1 root root 104M Apr 11 13:54 /boot/initramfs-0-rescue-ec2b65c17269987b0f49617410968192.img
-rw-------. 1 root root 103M Apr 11 14:32 /boot/initramfs-4.18.0-547.el8.x86_64.img
-rw-------. 1 root root 103M Apr 11 14:33 /boot/initramfs-4.18.0-548.el8.x86_64.img
-rw-------. 1 root root 103M Apr 11 13:59 /boot/initramfs-4.18.0-552.el8.x86_64.img
-rw-------. 1 root root  32M Apr 11 14:20 /boot/initramfs-4.18.0-552.el8.x86_64kdump.img

[maintuser@ip-172-31-20-213 ~]$ df -PH /boot /boot/efi
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p3  1.1G  570M  451M  56% /boot
/dev/nvme0n1p2  100M  7.6M   92M   8% /boot/efi

Looks like even setting the partition as small as 768MiB would have been sufficiently generous.
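
The 768MiB figure can be sanity-checked against the df output, keeping in mind that df -H reports decimal units (powers of 1000) while lsblk and du -h report binary ones (powers of 1024). A small sketch of the conversion; the helper names are illustrative, and df's rounding makes the results approximate.

```python
MIB = 1024 ** 2  # binary mebibyte: lsblk and du -h
MB = 1000 ** 2   # decimal megabyte: df -H


def mb_to_mib(mb: float) -> float:
    """Convert decimal megabytes (df -H "M") to binary mebibytes."""
    return mb * MB / MIB


def mib_to_gb(mib: float) -> float:
    """Convert binary mebibytes to decimal gigabytes (df -H "G")."""
    return mib * MIB / 1000 ** 3


# /boot "Used" column from `df -PH` with three kernels installed.
used_mib = mb_to_mib(570)
headroom_mib = 768 - used_mib

print(f"used: {used_mib:.1f} MiB")                   # used: 543.6 MiB
print(f"headroom in 768MiB: {headroom_mib:.1f} MiB")  # headroom in 768MiB: 224.4 MiB
print(f"1GiB per df -H: {mib_to_gb(1024):.1f}G")     # 1GiB per df -H: 1.1G
```

The 1.1G size in the df output is simply 1GiB expressed in decimal units, and the roughly 224MiB left over in a 768MiB partition would still cover one more kernel's roughly 120MiB of initramfs, vmlinuz and System.map.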

At any rate, if you'd like to not be "stuck" until next week, please try out the relevant new AMI(s) and let us know if you discover any further defects. I

krsheldon commented 2 months ago

Thanks for the quick fix on this. I won't be back to the office until Monday, but I will be sure to check it out.

krsheldon commented 2 months ago

I just tested the AMI, and the system built successfully. All partitions look good.