Resizing of volume groups causes mounting issues on Rhel-8

Chris-Long02 commented 1 month ago

Expected behavior

Resize volume groups using lvextend, create an AMI, launch and log into AMI with the resized volume groups.

Actual behavior

After the volume groups are resized and then an image is made, any of the instances launched from the new image fail to mount. Most commonly /var/log/audit.

Steps to expand volumes

DISK=$(lsblk -l | sort | awk '{ if ($6 == "disk") { print $1 }}' | tail -1)
DISKPART=$(lsblk -l | sort | awk '{ if ($6 == "part") { print $1 }}' | tail -1)
PARTNUM=$(echo "$DISKPART" | grep -oP "[0-9]+$")
sudo growpart /dev/$DISK $PARTNUM
sudo pvresize /dev/$DISKPART

VOL=$(lsblk -l | sort | awk '{ if ($7 == "/") { print $1 }}' | tail -1)
GROUP=$(echo $VOL | grep -oP "^[^-]*")

sudo lvextend -r -L 4G /dev/$GROUP/homeVol
sudo lvextend -r -L 8G /dev/$GROUP/varVol
sudo lvextend -r -L 6G /dev/$GROUP/logVol
sudo lvextend -r -L 10G /dev/$GROUP/auditVol
sudo lvextend -r -l +100%FREE /dev/$GROUP/rootVol

Context/Specifications

OS/VERSION: Rhel-8 AMI: spel-minimal-rhel-8-hvm-2024.04.1.x86_64-gp3

Any help would be greatly appreciated.

lorengordon commented 1 month ago

Hi @Chris-Long02, sorry for the delayed response, I was out on vacation for a couple weeks. Did you figure out the issue and get things working? If so, would you mind posting the solution, in case someone else runs into it later?

Chris-Long02 commented 1 month ago

I unfortunately never found a fix. I closed it as the latest release doesn't have the issue.

lorengordon commented 1 month ago

Hmm, it's been fairly quiet between the April and May releases, nothing comes to mind as far as any changes in this project or its dependencies. The May release should just have patch updates, compared to April, but should otherwise be the same...

Chris-Long02 commented 1 month ago

@lorengordon I take back what I said before. I've run into the issue again on the spel-minimal-rhel-8-hvm-2024.05.1.x86_64-gp3 release. With no configurations other than what was posted in the original message, there are still mounting issues.

ferricoxide commented 1 month ago

This morning, when I reformatted issue-opening information to make the code more-easily excerptible, I did launch an EC2 from the stated AMI using the provided code. I wasn't able to replicate the issue (applied the script's geometry changes and rebooted right back to a running state). That said, the code as provided didn't actually function without modification. Specifically, the growpart call fails because the disk-referents end up incorrect.

Note: I was using a Nitro-based instance-type (specifically, a t3.medium). I would need to ask:

What instance-type was used
What information was captured during the running of the script
What information was captured from the broken system's console

With respect to the third point, if you use a Nitro-based instance-type, the AMIs baked-in SSM packages/services make it so that you can access the rescue-mode prompt from the EC2 webUI if you have an appropriate instance-role attached to the EC2 and the UI-user has an appropriate access-policy attached to their user

Chris-Long02 commented 1 month ago

@ferricoxide I realize now that I left out somewhat important info, sorry for that. I am using a t3.large and launching with 50 GBs of storage. I tried running from a t3.medium and the script ran fine still.

Before the script is run, the output of lsblk is:

After the script is ran the output of lsblk is:

Output as the script runs is: CHANGED: partition=4 start=2199552 old: size=39741440 end=41940991 new: size=102658015 end=104857566 with similar outputs for each volume.

The issue is then normally tripped if I make an AMI of the instance that ran the script and launch from the new AMI, it tends to still be fine after a reboot. The really tricky part is that the issue doesn't occur every time. I have to launch a small batch of maybe 5 instances and maybe 1 or 2 will succeed with the rest failing.

ferricoxide commented 1 month ago

Ok. Without knowing your child-AMI process and output from that failing process. If you can get us some diagnostic data, we might be able to help out with your specific consistency-problem or, if there's something truly problematic with the AMI(s), fix the automation used to produce them.

Something to consider over using our AMIs as a starting point is using this automation to originate your own, more-suitable-to-you AMIs. The automation has enough configurational-flexibility in it to do so (I'm super lazy, so I parameterized a lot of the plumbing to allow for things like geometry-customization and installation of custom RPM manifexts).

Chris-Long02 commented 1 month ago

This is the log file from a failed instance: failed_instance.log

ferricoxide commented 1 month ago

Ok, it's not being super helpful with those logged-errors, eh?

         Mounting /var/log/audit...
[   27.582818] XFS (dm-6): Mounting V5 Filesystem
[   27.806297] ppdev: user-space parallel port driver
[   28.392404] XFS (dm-6): Ending clean mount
[  OK  ] Started Flush Journal to Persistent Storage.
[FAILED] Failed to mount /var/log/audit.
See 'systemctl status var-log-audit.mount' for details.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for Mark the need to relabel after reboot.

If we want to see more, probably going to need to login and see what it's annoyed about (using the prescribed diagnostic command). This may require setting a password on the root account (doable via user-data payload, especially using a cloud-config directive), as I haven't tried an init=bash type of boot since at least EL7.

Chris-Long02 commented 1 month ago

Interesting, it's saying the mount doesn't exist. Output:

Mounting /var/log/audit... var-log-audit.mount: Mount process finished, but there is no mount. var-log-audit.mount: Failed with result 'protocol'. Failed to mount /var/log/audit.

ferricoxide commented 1 month ago

Ok... but it successfully mounted /var/log prior to attempting to mount /var/log/audit (i.e., "there should have been a suitable mount-point available in that already-mounted /var/log filesystem)?

Chris-Long02 commented 1 month ago

Right, there should've been. I found another post about this, going to try implementing it.

ferricoxide commented 1 month ago

Good luck. Let us know if you're able to isolate anything and if there's anything you suspect could be added into the images to help with the issue.

That linked-post makes it sound like what you're seeing could be another of those "this is the downside of 'stable-release' distros like Red Hat Enterprise Linux"? I mean, at least with EL8, they have been doing more-frequent rebasing of tools rather than only patching packages that the X.0 release shipped with.

Chris-Long02 commented 1 month ago

Haven't had any success yet, but did find out more info.

/usr/bin/mount is exiting with status 32 for /boot. Investigating the cause of that.

mrabe142 commented 4 weeks ago

I maybe be having the same/similar issue. I am attempting to use spel-minimal-rhel-8-hvm-2024.05.1.x86_64-gp3 AMI, t2.2xlarge, 500 GB gp3 volume. I can launch the VM and SSH to it. Once I do a sudo reboot (without running any other commands after launching the VM), the VM boot fails with the failure to mount /var/log/audit and /var/tmp (same error messages shown above). I also tried a sudo dnf update to get the latest kernel/systemd/other stuff and same issue after rebooting.

One thing I noticed though is that if I keep rebooting, sometimes it boots correctly and I can get back in. If I reboot again, it goes back to failed boots.

I've been digging through logs, I see the mount fail with that code when a device is busy while the system is shutting down:

Jun 11 20:13:55 ip-10-215-4-121 systemd[1]: Unmounting /var/tmp...
Jun 11 20:13:55 ip-10-215-4-121 kernel: XFS (dm-6): Unmounting Filesystem
Jun 11 20:13:55 ip-10-215-4-121 umount[5092]: umount: /var/tmp: target is busy.
Jun 11 20:13:55 ip-10-215-4-121 systemd[1]: Unmounting /home...
Jun 11 20:13:55 ip-10-215-4-121 systemd[1]: Unmounting Temporary Directory (/tmp)...
Jun 11 20:13:55 ip-10-215-4-121 systemd[1]: Unmounting /boot/efi...
Jun 11 20:13:55 ip-10-215-4-121 systemd[1]: var-log-audit.mount: Succeeded.
Jun 11 20:13:55 ip-10-215-4-121 systemd[1]: Unmounted /var/log/audit.
Jun 11 20:13:55 ip-10-215-4-121 systemd[1]: var-tmp.mount: Mount process exited, code=exited status=32
Jun 11 20:13:55 ip-10-215-4-121 systemd[1]: Failed unmounting /var/tmp.

After comparing successful boots with unsuccessful ones, my guess is that a systemd reload is breaking the mounting process. Here are two examples of failed one where a systemd[1]: Reloading. happens after a mount process has started and all started mount processes fail:

Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: Mounted /var/log.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: Mounting /var/log/audit...
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: Starting Flush Journal to Persistent Storage...
Jun 11 20:14:39 ip-10-215-4-121 kernel: XFS (dm-6): Mounting V5 Filesystem
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: Reloading.
Jun 11 20:14:39 ip-10-215-4-121 kernel: XFS (dm-6): Ending clean mount
Jun 11 20:14:39 ip-10-215-4-121 kernel: XFS (dm-4): Ending clean mount
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: var-tmp.mount: Mount process finished, but there is no mount.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: var-tmp.mount: Failed with result 'protocol'.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: Failed to mount /var/tmp.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: Dependency failed for Basic System.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: Dependency failed for Multi-User System.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: Dependency failed for Graphical Interface.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: graphical.target: Job graphical.target/start failed with result 'dependency'.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: multi-user.target: Job multi-user.target/start failed with result 'dependency'.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: basic.target: Job basic.target/start failed with result 'dependency'.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: Dependency failed for Network Name Resolution.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: systemd-resolved.service: Job systemd-resolved.service/start failed with result 'dependency'.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: Dependency failed for Local File Systems.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: Dependency failed for Mark the need to relabel after reboot.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: selinux-autorelabel-mark.service: Job selinux-autorelabel-mark.service/start failed with result 'dependency'.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: var-log-audit.mount: Mount process finished, but there is no mount.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: var-log-audit.mount: Failed with result 'protocol'.
Jun 11 20:14:39 ip-10-215-4-121 systemd[1]: Failed to mount /var/log/audit.

Jun 11 20:25:06 ip-10-215-4-121 systemd[1]: Mounting /var/log/audit...
Jun 11 20:25:06 ip-10-215-4-121 kernel: XFS (dm-4): Ending clean mount
Jun 11 20:25:06 ip-10-215-4-121 kernel: XFS (dm-6): Mounting V5 Filesystem
Jun 11 20:25:06 ip-10-215-4-121 systemd[1]: Mounted /var/tmp.
Jun 11 20:25:06 ip-10-215-4-121 systemd[1]: Reloading.
Jun 11 20:25:06 ip-10-215-4-121 kernel: XFS (dm-6): Ending clean mount
Jun 11 20:25:07 ip-10-215-4-121 systemd[1]: var-log-audit.mount: Mount process finished, but there is no mount.
Jun 11 20:25:07 ip-10-215-4-121 systemd[1]: var-log-audit.mount: Failed with result 'protocol'.
Jun 11 20:25:07 ip-10-215-4-121 systemd[1]: Failed to mount /var/log/audit.

The boots without that reloading statement in the mounting process succeed.

Chris-Long02 commented 3 weeks ago

@mrabe142 Are your instances failing from spel-minimal-rhel-8-hvm-2024.05.1.x86_64-gp3 directly or from an ami you've made using it as a base?

mrabe142 commented 3 weeks ago

I use the AMI directly (no modifications/derivatives).

Chris-Long02 commented 3 weeks ago

Have you found a fix by any chance? I haven't been able to do anything successful.

mrabe142 commented 3 weeks ago

I did not. I know some systemd stuff can execute in parallel during boot so maybe there is just an unfortunate ordering to things in this setup where the reload is being triggered at a bad time by something else. I did not get a chance to play with adding other attributes to the fstab entries to see if they would make any difference either.

ferricoxide commented 3 weeks ago

What follows is mostly "grabbing at straws" in nature (since I haven't encountered the issue, myself, and it seems like the issues you've encountered haven't been 100% reapeatable)…

Unfortunately, with the interaction between systemd and /etc/fstab, this may be an issue with the version of systemd on EL8 not properly-asserting dependencies between *.mount unit-types. But that's definitely a "dunno": when I have time, I'll see if my Google Fu allows me to find some relevant Bug-IDs. Unfortunately, since Red Hat changed from Bugzilla to Jira – and the Jira-hosted content is generally no longer anonymously-available (like much of the Bugzilla-tracked content was), Googling for vendor-tracked bug-content has been made significantly more difficult.

At any rate…

Because a lot of people would die of hives if /etc/fstab were wholly done away with, most systemd-based distros (like RHEL) use utilities like systemd-fstab-generator to create the necessary *.mount unit-files. systemd then actually uses those to effect the mounts/umounts. Unfortunately, the resultant unit-files don't have the kind of hard ordering-assertion that can be had if one wholly did away with /etc/fstab in favor of self-managed mount-units.

That said, I know that at one point in the hardening-content cycles for RHEL (at this point, can't specifically remember if it was EL7, EL8 or EL9 but I think it was in the earliest iterations for EL7), the remediation-content tried to convert /var/tmp into a loopback-mount of /tmp. While this is/was serviceable when /tmp was located on a Real™ block-device – rather than the tmpfs pseudofilesystem implemented in the spel Amazon Machine Images and Azure VM templates – it could cause significant breakage if /tmp was a pseudofs.

I mention the hardening-content because most of those we're aware of that use the spel-generated images pair that usage with watchmaker (or equivalent tools ) for provisioning-time hardening of deployed EC2s/instances. So, I would ask, "are you witnessing this behavior in concert with the use of watchmaker (or other, equivalent hardening methods)"? If so, it's possible that the breakage is resulting from that

The other thing I would ask, particularly given @mrabe142's mention of:

…500 GB gp3 volume. …

"Are either/both of you expanding the root disk/VG"? Basically, wondering if doing so might be introducing disjoint volume-composition that might be slowing down the mounts/umounts that systemd is managing. Transient issues are often symptomatic or race-conditions. If there's volume-expansions that are introducing timing issues, that could be a contributor to the variability of the problem(s) witnessed. In general, we recommend that (as was the prior recommendations with physical systems and legacy/on-prem virtualization platforms) users keep OS and application data on separate block devices (and, if placing the application block devices into LVM2 volume-groups, ensure those application VGs are separate from the root VG). These recommendations were mostly related to backups and other data-portability concers. In other words, I don't know that such would prevent the problem but it might (see, "grabbing at staws").

As a final mention: the Red Hat AMIs (and Azure VM-templates) are tagged with Red Hat "pay as you go" entitlements (part of why the EC2s launched from RHEL AMIs have a higher hourly charges than those launched from the CentOS/CentOS Stream or Oracle Linux AMIs). The primary purpose of these entitlements is providing access to the official dnf repositories maintained by Red Hat. That said, those entitlements also entitle an AWS (or Azure) account-holder to limited OS support (via the CSP's case-management system). In the past, I've been able to open support tickets with AWS using those tagged entitlements. That may be a pursuit-avenue available to you to help you identify the underlying problem.

mrabe142 commented 3 weeks ago

To summarize my observations:

This issue happens with every VM I spin up with this AMI. Have tried about five times now.
The issue happens inconsistently during boot (much like a race condition) but happens during most boots except the first one
I plan to do hardening but this issue happens before any modifications are done to the system. I can launch the plain AMI then issue a reboot and see the issue
I plan to expand the volumes but this issue is happening before I do any of that
The only thing that has jumped out at me looking at successful and failed boots in /var/log/messages is the failed ones have the systemd[1]: Reloading. before a kernel: XFS (XXX): Ending clean mount which I am guessing causes those mounts to fail

Chris-Long02 commented 3 weeks ago

I've experienced this issue with many different configurations/modifications. With the most minimal being from expanding the root disk/VG and no hardening. I have yet to have the issue occur for me on a boot directly from the SPEL ami though.

I tried to use a mount wrapper to stop race conditions, but it didn't work. Although it is very possible that there is an issue with the mount wrapper.

ferricoxide commented 3 weeks ago

Just for clarification: you were successfully using the RHEL 8 AMIs prior to the spel-minimal-rhel-8-hvm-2024.04.1.x86_64-gp3 AMI-release? Trying to eliminate the possibility that the AWS-specific driver RPMs that were added with the April AMI publishing-event aren't responsible (or contributing).

Also: do you have userData payload (or other provisioning-automation) that can be borrowed to try to replicate the problem? Otherwise, I'm kind of blind.

Chris-Long02 commented 3 weeks ago

That's correct spel-minimal-rhel-8-hvm-2024.03.1.x86_64-gp3 was the last one I used without issue.

The Steps to expand volumes from the original post is the only user data I've been using and still having the issue occur. Using a C5 instance type might provide you more luck in recreating the problem, as it's been a lot more consistent in having the issue from my recent testing.

ferricoxide commented 3 weeks ago

That's correct spel-minimal-rhel-8-hvm-2024.03.1.x86_64-gp3 was the last one I used without issue.

Bugger… Ok, did the February one work for you (spel-minimal-rhel-8-hvm-2024.02.1.x86_64-gp3). Asking because trying to bracket the two biggest likelihoods: presence of AMZN device-driver RPMs and presence of UEFI support.

The AMZN device-driver RPMs were added to April (onwards) AMIs

UEFI support was added in February's AMI, but fell out of March's first AMI-release. Our AMI builds bootstrap from Red Hat's ndash; how we "borrow" access to the RHUI repositories. However, Red Hat has adopted some weird publishing habits for the last nine months, or so, that were only made visible when we added UEFI support to our EL8 AMIs. It's why there were actually two AMIs released in March:

-------------------------------------------------------------------------------------------------------------------------
|                                                    DescribeImages                                                     |
+----------------+---------------------------+------------------------+-------------------------------------------------+
|    BootMode    |       CreationDate        |        ImageId         |                      Name                       |
+----------------+---------------------------+------------------------+-------------------------------------------------+
|  None          |  2024-03-22T18:57:02.000Z |  ami-01de2f1ca2c0255a4 |  spel-minimal-rhel-8-hvm-2024.03.1.x86_64-gp3   |
|  uefi-preferred|  2024-04-11T13:11:13.000Z |  ami-0c661fa35a091d312 |  spel-minimal-rhel-8-hvm-2024.03.2.x86_64-gp3   |
+----------------+---------------------------+------------------------+-------------------------------------------------+

AMIs published from 2024.03.2 onwards have had UEFI support baked in.

Note that after the spel-minimal-rhel-8-hvm-2024.03.2.x86_64-gp3 AMI was published, the spel-minimal-rhel-8-hvm-2024.03.1.x86_64-gp3 AMI was marked as "deprecated" …which should have removed it from search-results (but still left it available if you already knew its ID or had the ID in your automation). Any chance you could try your process with either the February AMIs or the second March AMI?

ferricoxide commented 3 weeks ago

Guess I should have verified the AWS region: the above is us-east-1. As noted previously, the 3.1 releases were marked "deprecated", so, when I do a search from my normal (GC) dev account, I get:


-------------------------------------------------------------------------------------------------------------------------
|                                                    DescribeImages                                                     |
+----------------+---------------------------+------------------------+-------------------------------------------------+
|    BootMode    |       CreationDate        |        ImageId         |                      Name                       |
+----------------+---------------------------+------------------------+-------------------------------------------------+
|  None          |  2024-01-25T17:31:53.000Z |  ami-0363cf6882daf4895 |  spel-minimal-rhel-8-hvm-2024.01.1.x86_64-gp3   |
|  uefi-preferred|  2024-02-26T16:21:12.000Z |  ami-092037daccf8526f7 |  spel-minimal-rhel-8-hvm-2024.02.1.x86_64-gp3   |
|  uefi-preferred|  2024-04-11T13:57:15.000Z |  ami-0373eef9e2b3b4bcd |  spel-minimal-rhel-8-hvm-2024.03.2.x86_64-gp3   |
|  uefi-preferred|  2024-04-23T14:00:19.000Z |  ami-0e770985d7a4b2822 |  spel-minimal-rhel-8-hvm-2024.04.1.x86_64-gp3   |
|  uefi-preferred|  2024-05-21T14:32:15.000Z |  ami-0455bbb8b742553ba |  spel-minimal-rhel-8-hvm-2024.05.1.x86_64-gp3   |
+----------------+---------------------------+------------------------+-------------------------------------------------+

At any rate, if your issues are in GC, please check whether they still manifest with the 03.2 AMI.

Chris-Long02 commented 3 weeks ago

I remember having an issue with a previous AMI, but I think it was the deprecated one. I have made images based off the 03.2 release without issue, but I'll try on the February one.

Chris-Long02 commented 3 weeks ago

Haven't had any failed launches with spel-minimal-rhel-8-hvm-2024.02.1.x86_64-gp3.

ferricoxide commented 3 weeks ago

That's correct spel-minimal-rhel-8-hvm-2024.03.1.x86_64-gp3 was the last one I used without issue.

The Steps to expand volumes from the original post is the only user data I've been using and still having the issue occur. Using a C5 instance type might provide you more luck in recreating the problem, as it's been a lot more consistent in having the issue from my recent testing.

Ok, so looking back at your userData payload:

DISK=$(lsblk -l | sort | awk '{ if ($6 == "disk") { print $1 }}' | tail -1)
DISKPART=$(lsblk -l | sort | awk '{ if ($6 == "part") { print $1 }}' | tail -1)
PARTNUM=$(echo "$DISKPART" | grep -oP "[0-9]+$")
sudo growpart /dev/$DISK $PARTNUM
sudo pvresize /dev/$DISKPART

VOL=$(lsblk -l | sort | awk '{ if ($7 == "/") { print $1 }}' | tail -1)
GROUP=$(echo $VOL | grep -oP "^[^-]*")

sudo lvextend -r -L 4G /dev/$GROUP/homeVol
sudo lvextend -r -L 8G /dev/$GROUP/varVol
sudo lvextend -r -L 6G /dev/$GROUP/logVol
sudo lvextend -r -L 10G /dev/$GROUP/auditVol
sudo lvextend -r -l +100%FREE /dev/$GROUP/rootVol

My first recommendation would be to change from using a simple, !#/bin/bash-only payload to a multipart mixed-MIME payload. Using that method will pretty much eliminate the need to do the entirety of:

DISK=$(lsblk -l | sort | awk '{ if ($6 == "disk") { print $1 }}' | tail -1)
DISKPART=$(lsblk -l | sort | awk '{ if ($6 == "part") { print $1 }}' | tail -1)
PARTNUM=$(echo "$DISKPART" | grep -oP "[0-9]+$")
sudo growpart /dev/$DISK $PARTNUM

Using a multipart mixed-MIME payload, this stanza's logic could be replaced with a #cloud-config section like:

growpart:
  mode: auto
  devices: [
      '/dev/nvme0n1',
      '/dev/nvme0n1p3',
      '/dev/nvme0n1p4',
    ]

The snippet I post above is just an "I might be using any Nitro or non-Nitro instance type to host a spel-derived OS that may or may not have UEFI-boot support" safety-list.

At any rate, having moved the EBS geometry-change logic to a cloud-config stanza, your BASH could be reduced to:

DISKPART=$(
  lsblk -l | \
  sort | \
  awk '{ if ( $6 == "part" )  { print $1 }}' | \
  tail -1
)
growpart /dev/$DISK $PARTNUM
pvresize /dev/$DISKPART

VOLPATH=$( lsblk --noheadings -l | sed -n '/ \/$/p' | cut -d " " -f 1 )
VOLGROUP="${VOLPATH//-*/}"

lvextend -r -L 4G "/dev/$VOLGROUP/homeVol"
lvextend -r -L 8G "/dev/$VOLGROUP/varVol"
lvextend -r -L 6G "/dev/$VOLGROUP/logVol"
lvextend -r -L 10G "/dev/$VOLGROUP/auditVol"
lvextend -r -l +100%FREE "/dev/$VOLGROUP/rootVol"

Ultimately, I took what you originally posted and created a multipart mixed-MIME userData payload of:

Content-Type: multipart/mixed; boundary="===============BOUNDARY=="
MIME-Version: 1.0
Number-Attachments: 2

--===============BOUNDARY==
Content-Disposition: attachment; filename="cloud.cfg"
Content-Transfer-Encoding: 7bit
Content-Type: text/cloud-config
Mime-Version: 1.0

#cloud-config

growpart:
  mode: auto
  devices: [
    '/dev/nvme0n1',
    '/dev/nvme0n1p3',
    '/dev/nvme0n1p4',
  ]

--===============BOUNDARY==
Content-Disposition: attachment; filename="userData-script.sh"
Content-Transfer-Encoding: 7bit
Content-Type: text/x-shellscript
Mime-Version: 1.0

#!/bin/bash
set -euo pipefail
set -x

# Log everything below into syslog
exec 1> >( logger -s -t "$(  basename "${0}" )" ) 2>&1

# Get dev-path containing root EBS's root-volume
DISKPART=$(
  lsblk -l | \
  sort | \
  awk '{ if ( $6 == "part" )  { print $1 }}' | \
  tail -1
)

# Make LVM resize root EBS's PV
pvresize /dev/$DISKPART

# Compute LVM root-VG name
VOLPATH="$( lsblk --noheadings -l | sed -n '/ \/$/p' | cut -d " " -f 1 )"
VOLGROUP="${VOLPATH//-*/}"

# Resize the volumes
lvextend -r -L 4G "/dev/$VOLGROUP/homeVol"
lvextend -r -L 8G "/dev/$VOLGROUP/varVol"
lvextend -r -L 6G "/dev/$VOLGROUP/logVol"
lvextend -r -L 10G "/dev/$VOLGROUP/auditVol"
lvextend -r -l +100%FREE "/dev/$VOLGROUP/rootVol"

systemctl reboot

--===============BOUNDARY==

And then, as a quick test, I used a --count 10 in my launch-request to see if/how many failures I would get when using the spel-minimal-rhel-8-hvm-2024.05.1.x86_64-gp3 AMI. I was 10-for-10 on the success front. I'll do some more batches to see if I get failures, but, for right now, I can't reproduce.

ferricoxide commented 3 weeks ago

20-for-20: all good…

ferricoxide commented 3 weeks ago

Batch of 25: all good…

Note that, what I'm doing to create these batches is:

mapfile -t INSTANCES < <(
  aws ec2 run-instances \
    --image-id ami-0455bbb8b742553ba \
    --instance-type c5.large \
    --subnet-id <PRIVATE> \
    --security-group-id <PRIVATE> \
    --iam-instance-profile 'Name=<PRIVATE>' \
    --key-name <PRIVATE> \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={DeleteOnTermination=true,VolumeType=gp3,VolumeSize=50,Encrypted=false}'  \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=Testing: GitHub Issue \
    - spel #691}]' \
    --user-data file:///tmp/userData.spel_691 \
    --count 25 \
    --query 'Instances[].InstanceId' \
    --output text  | \
  tr '\t' '\n'
)

Note 1: I'm wrapping the launch in mapfile so that I can then iteratively evaluate the resulting EC2s for expected configuration-state by doing something like:

for INSTANCE in "${INSTANCES[@]}"
do
    ssh "maintuser@${INSTANCE}" "hostname -f && df -PHt xfs | sed 's/^/    /' && echo"
done

Note 2: The account I'm testing in is limited for address space within the AZ I'm using, so "25" is about the most I can launch in one go (I'd probably soon be running up on EC2 type-limits, any way, if IP space wasn't killing me)

ferricoxide commented 3 weeks ago

Ok, looks like I can launch at least 30 into an empty subnet…

At any rate, at this point, I've batch-launched over 100 (c5.large) EC2s – that have included the volume-resizing userData payloads – and haven't been able to trigger the errors.

Chris-Long02 commented 3 weeks ago

I used the payload you provided above and I'm still getting the same result. Failed to mount /var/log/audit. I tested on a total of 50 instances (25 were t3.micros and the others were c5.larges) and they all failed.

Launching from the original AMI has still not provided me any issues, but once I create the image of the modified spel-minimal-rhel-8-hvm-2024.05.1.x86_64-gp3 it stops working.

ferricoxide commented 3 weeks ago

Unfortunately, I can't really provide guidance on what deltas between our AMI and your derived-AMI might be responsible for the issue. If your AMIs are still RHUI-enabled/entitled, you should be able to open a support request with AWS (who can rope in Red Hat's OS-support group for you). If you're not sure your AMIs are RHUI-enabled/entitled, you can do:

aws ec2 describe-images \
  --image-id <YOUR_IMAGE_ID> \
  --query 'sort_by(Images, &CreationDate)[].[UsageOperation]'

Entitled AMIs will produce output like:


[
    [
        "RunInstances:0010"
    ]
]

AMIs that lack entitlement will produce output like:

[
    [
        "RunInstances"
    ]
]

Similarly, from within a running EC2, you can execute:

curl -s http://169.254.169.254/latest/dynamic/instance-identity/document/ | \
python3 -c 'import json,sys ; print( json.load(sys.stdin)["billingProducts"] )'

If your EC2 has entitlement, you'll get back a result like:

['bp-6fa54006']

Otherwise, that command will return:

None

Chris-Long02 commented 3 weeks ago

Okay, thank you for all the help! I really appreciate it.

mrabe142 commented 3 weeks ago

Just wanted to update on what I have observed so far after a bit of testing. To clarify, I have no issues launching the AMI. I am only running into issues after I have done the initial SSH to the instance and reboot.

I spun up seven different instances:

spel-minimal-rhel-8-hvm-2024.5.1 with t2.2xlarge, 500 GB gp3, legacy-bios
spel-minimal-rhel-8-hvm-2024.2.1 with t2.2xlarge, 500 GB gp3, legacy-bios
spel-minimal-rhel-8-hvm-2024.5.1 with t3.micro and default 20 GB gp3, uefi
spel-minimal-rhel-8-hvm-2024.5.1 with t2.2xlarge, 50 GB gp3, legacy-bios
spel-minimal-rhel-8-hvm-2024.5.1 with t3.xlarge, 50 GB gp3, uefi
spel-minimal-rhel-8-hvm-2024.5.1 with t3.2xlarge, 500 GB gp3, uefi
spel-minimal-rhel-8-hvm-2024.1.1 with t3.2xlarge, 100 GB gp3, legacy-bios

The first one started to have boot volume mounting issues after the second reboot, no modifications/updates The second one was rebooting fine, did it around six times. I did a 'dnf update' and rebooted, now it has a kernel panic that it cannot mount the root files system The third one is rebooting fine, have done multiple reboots, did 'dnf update', ran more reboots and seems stable so far. The fourth one had boot volume mounting issues after first reboot, no modifications/updates The fifth one is rebooting as normal multiple times, did 'dnf update', rebooted multiple times and still working The sixth one had boot volume mounting issues after rebooting, no modifications/updates The seventh one is is rebooting as normal multiple times, did 'dnf update', rebooted multiple times and still working

In summary, 3, 5, and 7 do not exhibit reboot mounting issues (as of yet). 2 and 7 used a different version of the AMI. I have not tried to resize any volumes yet.

On comparing, I do see some differences in the devices. 1 and 4 have /dev/xvda# devices and 3 has /dev/nvme0n1p# devices. The device names change as different instance types and disk sizes are chosen.

I will continue to test after doing some resizing and hardening to see if that changes anything.

ferricoxide commented 3 weeks ago

Bleah: not much in the way of consistency in what you're describing. It's going to make reproducing significantly more difficult to do. I can re-do my tests and run random reboots against them to see if I can provoke similar behaviors, but it's unlikely to be until next week that I can set aside any time to do so.

As to:

On comparing, I do see some differences in the devices. 1 and 4 have /dev/xvda# devices and 3 has /dev/nvme0n1p# devices. The device names change as different instance types and disk sizes are chosen.

Device-node references are a function of the instance-type you selected:

/dev/xvdX means you've selected a pre-Nitro instance-type (t2, m4, etc.)
/dev/nvmeXnY means you've selected a Nitro-based instance-type (t3, c5, r6i, m7i, etc.)

Probably neither here nor there, but it's generally recommended to not use pre-Nitro instance-types if you can avoid it:

While AWS is offering/planning to offer Xen-on-Nitro (on the back-end) to prolong their ability to offer pre-Nitro instance-types, it's worth noting that that's only for EC2 (pre-Nitro for RDS is going away): no indication, yet, on how long they'll continue bothering with Xen-on-Nitro for EC2s (presumably will depend)
AWS is generally not building out (significantly) more capacity of them (if you hit an "out of capacity" error in a given AZ, it may take longer for shortages to be resolved than if using pure-Nitro instance-types): remember that pre-Nitro are 2016-vintage(?) offerings
Their overall price/performance is generally lower than more-recent offerings
Their networking-speed is significantly lower than that of more-recent generations' within a given instance family (t, r, c, etc).
Since even EBS is network-based, your disk-I/O is also more constrained
Things like serial console access are not possible (e.g., in case you want or need to do recovery tasks)

In fairness, first and second bullet are heavily supposition-based.

ferricoxide commented 3 weeks ago

@mrabe142:

So, before I started working any other tasks/projects, today, I fired up 30 EC2s in the us-gov-west-1 region (in what is mapped in my account as the "a" availability-zone):

$ mapfile -t INSTANCES < <(
    aws ec2 run-instances \
      --image-id ami-0455bbb8b742553ba \
      --instance-type t3.medium \
      --subnet-id <PRIVATE_INFO> \
      --security-group-id <PRIVATE_INFO> \
      --iam-instance-profile 'Name=<PRIVATE_INFO>' \
      --key-name <PRIVATE_INFO> \
      --block-device-mappings 'DeviceName=/dev/sda1,Ebs={DeleteOnTermination=true,VolumeType=gp3,VolumeSize=50,Encrypted=false}'  \
      --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=Testing: GitHub Issue - spel #691}]' \
      --user-data file:///tmp/userData.spel_691 \
      --count 30 \
      --query 'Instances[].InstanceId' \
      --output text | \
    tr '\t' '\n'
  )

Note that the user-data file in the above is the one whose contents I previously shared to this thread.

I then set up a simple, "do forever" loop to iterate over those 30 instances, rebooting them every two or so minutes:

while true
do
   for INSTANCE in "${INSTANCES[@]}"
   do
      timeout 5 ssh "maintuser@${INSTANCE}" "sudo systemctl reboot"
      echo
   done
   sleep 120
done

And let that run for about an hour. Over the course of that hour-long iteration:

Each instance was rebooted 20+ times
Every instance returned from each of their respective reboots

In short, I can't reproduce the issue with the configuration information relayed to me to date. I either need more in the way of "how do I reproduce your prolem" input, or I'm at the end of what I can think to do as troubleshooting steps.

ferricoxide commented 3 weeks ago

Just a heads up: if I don't see any further comments from either @Chris-Long02 or @mrabe142 before Wednesday next (26 June 2025), I'm going to close this issue. If either of you is able to provide me more information on reproducing your respective problems, I can either continue trying to resolve this ticket or, if that info comes in after closing, I'll re-open.

mrabe142 commented 2 weeks ago

Just wanted to provide an update. I am not current blocked on this so there is no urgent need for me, you can close it unless you want to investigate further.

I did another round of testing. I tried 5 different VMs each with these configurations:

t3.2xlarge
100 GB gp3 EBS storage

I used the following 5 AMIs with the above configuration:

spel-minimal-rhel-8-hvm-2024.01.1.x86_64-gp3
spel-minimal-rhel-8-hvm-2024.02.1.x86_64-gp3
spel-minimal-rhel-8-hvm-2024.03.2.x86_64-gp3
spel-minimal-rhel-8-hvm-2024.05.1.x86_64-gp3
spel-minimal-rhel-8-hvm-2024.06.1.x86_64-gp3

This is what I found for each configuration:

I was able to use this AMI without any issues. Since it is from January, it uses Legacy BIOS. I was able to do a full 'dnf update' to bring all packages up to date. I was able to extend the LVM sizes and fully STIG the VM.
This AMI did not work, it has a kernel panic after 'dnf update' and rebooting.
I was able to use this AMI. It is in the UEFI configuration. I was able to do a full 'dnf update' to bring all packages up to date. I was able to extend the LVM sizes and fully STIG the VM.
This AMI has the booting issues, especially after a 'dnf update'. Sometimes does not manifest until after trying to extend an LVM mount.
This AMI has the booting issues

Since I am able to use configurations 1 and 3, I am able to continue with what I need to do. All VMs are using the same CPU/Mem/EBS configurations and all were updated to the latest packages so it is not clear to me what is causing the issues with the newer AMIs.

ferricoxide commented 2 weeks ago

@mrabe142 sed:

Since I am able to use configurations 1 and 3, I am able to continue with what I need to do. All VMs are using the same CPU/Mem/EBS configurations and all were updated to the latest packages so it is not clear to me what is causing the issues with the newer AMIs.

I can investigate further: are all of the above Nitro instance-types? Also, do you have specific, consistent automation that you're running that can be invoked to try to provoke the issues?

mrabe142 commented 2 weeks ago

I think the t3 instance types are nitro capable but I don't think I selected anything about nitro when I launched them, I just launch on-demand instances from the Launch Instances menu of EC2->Instances page of the AWS GovCloud Console. The only things I set is the name, AMI, instance type (t3.2xlarge), choose my public key, put into VPC and subnet that was provided for me, apply security group that was provided for me, set storage size to 100GB gp3 (non-encrypted). The rest of the settings are left as default.

I am not running any automation at this point to do the initial setup, when the instance comes up, I SSH to it. The first thing I try is a dnf update and reboot. I do reboot a couple times to see if the boot issue comes up. If it continues to come up after a couple reboots without errors, I try to extend the LVM mounts. Depending on the instance types, I do these:

For the spel-minimal-rhel-8-hvm-2024.01.1.x86_64-gp3 AMI (only has 2 partitions):

sudo growpart --free-percent=50 /dev/nvme0n1 2
sudo lvm lvresize --size=+4G /dev/mapper/RootVG-rootVol
sudo xfs_growfs /dev/mapper/RootVG-rootVol
sudo lvm lvresize --size=+8G /dev/mapper/RootVG-varVol
sudo xfs_growfs /dev/mapper/RootVG-varVol
sudo lvm lvresize --size=+6G /dev/mapper/RootVG-auditVol 
sudo xfs_growfs /dev/mapper/RootVG-auditVol

For the other AMIs that have 4 partitions:

sudo growpart --free-percent=50 /dev/nvme0n1 4
sudo lvm lvresize --size=+4G /dev/mapper/RootVG-rootVol
sudo xfs_growfs /dev/mapper/RootVG-rootVol
sudo lvm lvresize --size=+8G /dev/mapper/RootVG-varVol
sudo xfs_growfs /dev/mapper/RootVG-varVol
sudo lvm lvresize --size=+6G /dev/mapper/RootVG-auditVol 
sudo xfs_growfs /dev/mapper/RootVG-auditVol

I try rebooting again after applying those. If they reboot a couple times without boot errors, they are usually stable at that point for all the rest of the configuration I apply to them.

ferricoxide commented 2 weeks ago

Continuing conversation with @mrabe142 in (new) issue #695: May be a separate issue and probably don't need to spam @Chris-Long02 with stuff that may not be relevant. Will re-link if anything turns up in new ticket.

plus3it / spel