kairos-io / kairos

:penguin: The immutable Linux meta-distribution for edge Kubernetes.
https://kairos.io
Apache License 2.0

Custom partitioning with `no-format` doesn't work #2281

Closed sarg3nt closed 6 months ago

sarg3nt commented 8 months ago

Kairos version: /kairos/rockylinux:9-core-amd64-generic-v2.4.3

NAME="Rocky Linux"
VERSION="9.3 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.3"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.3 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.3"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.3"
KAIROS_NAME="kairos-core-rockylinux-9"
KAIROS_VERSION="v2.4.3"
KAIROS_ID="kairos"
KAIROS_ID_LIKE="kairos-core-rockylinux-9"
KAIROS_VERSION_ID="v2.4.3"
KAIROS_PRETTY_NAME="kairos-core-rockylinux-9 v2.4.3"
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_IMAGE_REPO="quay.io/kairos/rockylinux"
KAIROS_IMAGE_LABEL="9-core-amd64-generic-v2.4.3"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_VARIANT="core"
KAIROS_FLAVOR="rockylinux"
KAIROS_ARTIFACT="kairos-rockylinux-9-core-amd64-generic-v2.4.3"

CPU architecture, OS, and Version:

Linux lpul-vault-k8s-agent-2.vault.ad.selinc.com 5.14.0-362.8.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 8 17:36:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug

To Reproduce

Expected behavior: Volumes larger than the boot volume should be supported.
This may require fixing the device bug mentioned in https://github.com/kairos-io/kairos/issues/2243

Logs: I have not been able to obtain logs because the install fails.

Additional context

The cloud_init.yaml file for a custom-formatted disk, which results in a blank screen after install:

strict: true
debug: true
install:
  no-format: true
  auto: true
  poweroff: false
  reboot: true
users:
  # The kairos user is configured in the target nodes terraform
  - name: "kairos-auroraboot"
    passwd: "${password}"
stages:
  kairos-install.pre.before:
  - if:  '[ -e "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" ]'
    name: "Create partitions"
    commands:
      - |
        parted --script --machine -- "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" mklabel msdos
    layout:
      device:
        path: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
      expand_partition:
        size: 0 # All available space
      add_partitions:
        # all sizes below are in MB
        - fsLabel: COS_OEM
          size: 64
          pLabel: oem
        - fsLabel: COS_RECOVERY
          size: 8500
          pLabel: recovery
        - fsLabel: COS_STATE
          size: 18000
          pLabel: state
        - fsLabel: COS_PERSISTENT
          pLabel: persistent
          size: 25000
          filesystem: "ext4"
  boot:
    - systemd_firstboot:
      keymap: us
    - name: "Environment Variables"
      environment:
        HTTP_PROXY: "http://wall.ad.selinc.com:8080"
        HTTPS_PROXY: "http://wall.ad.selinc.com:8080"
        http_proxy: "http://wall.ad.selinc.com:8080"
        https_proxy: "http://wall.ad.selinc.com:8080"
        NO_PROXY:  "<redacted for brevity>"
        no_proxy: "<redacted for brevity>"
    - name: "Setup services"
      systemctl:
        disable:
          - dnf-makecache
    - name: "Setup NTP"
      systemctl:
        enable:
          - systemd-timesyncd
      timesyncd:
        NTP: "<redacted>"
        FallbackNTP: ""

An even more stripped-down YAML file, without a custom-formatted disk, which results in an install loop:

strict: true
debug: true
install:
  device: "auto"
  auto: true
  poweroff: false
  reboot: true
users:
  - name: "kairos-auroraboot"
    passwd: "${password}"

As stated above, if I set device: /dev/sda it works on some of the nodes and boot-locks on others, which is not acceptable.

jimmykarily commented 8 months ago

@sarg3nt thanks for all your debugging efforts.

I've been using a cloud-config similar to the stripped one you sent all the time with no problems. The only difference is that I almost never have a second disk attached. If that's the problem it should be easy to reproduce in qemu with 2 disks.

I'd suggest you take the minimum config that reproduces the problem (the last one you sent), remove the reboot: true so that you can grab the installation logs for inspection. We should be able to see if any errors were logged and which device the installation was performed onto. Running kairos-agent manual-install --device auto config.yaml would be even better in terms of log collection.
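One way to keep a copy of that output is sketched below, assuming a POSIX shell on the live system. The helper name `run_logged` and the log path are hypothetical; only the `kairos-agent manual-install --device auto config.yaml` invocation comes from the comment above.

```shell
# run_logged LOGFILE CMD [ARGS...]
# Runs CMD, shows its output on the terminal and keeps a copy in LOGFILE.
# Hypothetical helper, not part of kairos-agent.
run_logged() {
  log=$1
  shift
  # Merge stderr into stdout so error messages end up in the log as well.
  "$@" 2>&1 | tee "$log"
}

# Intended use on the live system (commented out so nothing gets installed here):
# run_logged /tmp/kairos-install.log kairos-agent manual-install --device auto config.yaml
```

Note that the exit status of the pipeline is that of `tee`, so the log itself has to be checked for errors.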

Also, the system boots from an ISO right? And there is a reboot: true there which will make the system reboot after installation. Do you have the boot order correctly set so that the system doesn't boot from the cdrom again?

sarg3nt commented 8 months ago

@jimmykarily I'm using AuroraBoot, so I'm not booting from a CD-ROM.

I think I've somewhat figured some of this out.

The root problem ties back to https://github.com/kairos-io/kairos/issues/2243

Even though I'm using no-format: true and creating the Kairos partitions manually, device: auto or device: /dev/sda takes priority and breaks it.

Example: This is with device: auto

lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0 528.5M  1 loop /run/rootfsbase
sda      8:0    0    80G  0 disk
sdb      8:16   0   120G  0 disk
├─sdb1   8:17   0     1M  0 part
├─sdb2   8:18   0    64M  0 part
├─sdb3   8:19   0   3.1G  0 part
├─sdb4   8:20   0   5.3G  0 part
└─sdb5   8:21   0 111.6G  0 part

Where sdb is my larger second disk.

Here's my yaml file where I manually format the partitions.

strict: true
# enable debug logging
debug: true
install:
  no-format: true
  auto: true
  poweroff: false
  reboot: false
  grub_options:
    extra_cmdline: "rd.immucore.debug"
users:
  - name: "kairos"
    passwd: "kairos"
stages:
  kairos-install.pre.before:
  - if:  '[ -e "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" ]'
    name: "Create partitions"
    commands:
      - |
        parted --script --machine -- "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" mklabel msdos
    layout:
      device:
        path: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
      expand_partition:
        size: 0 # All available space
      add_partitions:
        # all sizes below are in MB
        - fsLabel: COS_OEM
          size: 64
          pLabel: oem
        - fsLabel: COS_RECOVERY
          size: 8500
          pLabel: recovery
        - fsLabel: COS_STATE
          size: 18000
          pLabel: state
        - fsLabel: COS_PERSISTENT
          pLabel: persistent
          size: 25000
          filesystem: "ext4"
  boot:
    - systemd_firstboot:
      keymap: us

So even though I'm specifically setting path: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0", Kairos appears to be ignoring it and installing on whatever disk it wants.

Here's some troubleshooting output so you get a lay of the land.

[root@lpul-vault-k8s-server-0 kairos]# blkid
/dev/loop0: TYPE="squashfs"
/dev/sdb4: LABEL="COS_STATE" UUID="b78e5bd5-98a3-45aa-a18e-5ece1438520c" TYPE="ext4" PARTLABEL="state" PARTUUID="1b57e20b-aa44-4e92-99f1-72f8b2340b27"
/dev/sdb2: LABEL="COS_OEM" UUID="228275ab-46cf-4553-ab84-082530123da1" TYPE="ext4" PARTLABEL="oem" PARTUUID="fae50aaf-d3c1-4e3e-a015-2a2da703d5b8"
/dev/sdb5: LABEL="COS_PERSISTENT" UUID="8ada0826-86b6-44b4-a665-bb6fd22bcfd9" TYPE="ext4" PARTLABEL="persistent" PARTUUID="75c0a4cf-2cdc-4642-9507-a36a837044ed"
/dev/sdb3: LABEL="COS_RECOVERY" UUID="3f1d509d-ab4c-4f13-961e-33f19bdc5d46" TYPE="ext4" PARTLABEL="recovery" PARTUUID="2ff71b29-d20a-452f-995c-a89612f49592"
/dev/sdb1: PARTLABEL="bios" PARTUUID="ce2cf7b1-ecc1-45da-9aa1-c01203ee332d"
/dev/sda: PTUUID="739d1a81" PTTYPE="dos"
[root@lpul-vault-k8s-server-0 kairos]#
[root@lpul-vault-k8s-server-0 kairos]#
[root@lpul-vault-k8s-server-0 kairos]# blkid /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0: PTUUID="739d1a81" PTTYPE="dos"
[root@lpul-vault-k8s-server-0 kairos]# blkid /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0: PTUUID="77275a43-53c5-4fdf-a62d-47f90365a745" PTTYPE="gpt"
[root@lpul-vault-k8s-server-0 kairos]# ll
bash: ll: command not found
[root@lpul-vault-k8s-server-0 kairos]# ls -alh /dev/disk/by-path
total 0
drwxr-xr-x 2 root root 180 Feb 22 19:29 .
drwxr-xr-x 8 root root 160 Feb 22 19:29 ..
lrwxrwxrwx 1 root root   9 Feb 22 19:29 pci-0000:03:00.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx 1 root root   9 Feb 22 19:29 pci-0000:03:00.0-scsi-0:0:1:0 -> ../../sdb
lrwxrwxrwx 1 root root  10 Feb 22 19:29 pci-0000:03:00.0-scsi-0:0:1:0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Feb 22 19:29 pci-0000:03:00.0-scsi-0:0:1:0-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  10 Feb 22 19:29 pci-0000:03:00.0-scsi-0:0:1:0-part3 -> ../../sdb3
lrwxrwxrwx 1 root root  10 Feb 22 19:29 pci-0000:03:00.0-scsi-0:0:1:0-part4 -> ../../sdb4
lrwxrwxrwx 1 root root  10 Feb 22 19:29 pci-0000:03:00.0-scsi-0:0:1:0-part5 -> ../../sdb5
[root@lpul-vault-k8s-server-0 kairos]#

As you can see, pci-0000:03:00.0-scsi-0:0:0:0 is the first disk (sda), the disk I told Kairos to partition, and it is ignored even though I have no-format: true set. Kairos installed on pci-0000:03:00.0-scsi-0:0:1:0, which is disk 2 (sdb).
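The by-path listing above is what ties a stable path to a kernel name at a given boot. A minimal sketch of doing that lookup in a script, assuming `readlink -f` is available; the helper names `resolve_disk` and `same_disk` are hypothetical:

```shell
# resolve_disk LINK
# Prints the device node that a /dev/disk/by-path (or by-id) symlink
# currently resolves to. Hypothetical helper; relies only on `readlink -f`.
resolve_disk() {
  readlink -f "$1"
}

# same_disk A B
# Succeeds when both paths resolve to the same device node.
same_disk() {
  [ "$(resolve_disk "$1")" = "$(resolve_disk "$2")" ]
}

# Example for the node in this thread (commented out, needs the real devices):
# same_disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0 /dev/sda && echo "sda is the first disk on this boot"
```

Resolving the symlink at install time avoids hard-coding a kernel name such as /dev/sda, which may point at a different disk on the next boot.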

I'm not sure what logs you want me to give you.
Here are the ones I know about.

journalctl -u kairos

Feb 22 19:29:03 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: Starting kairos installer...

/run/immucore/immucore.log

2024-02-22T19:28:53Z INF Immucore commit=none compiled with=go1.20.2 version=v0.1.6
2024-02-22T19:28:53Z INF Stanza rd.cos.disable/rd.immucore.disable on the cmdline or booting from CDROM/Netboot/Squash recovery. Disabling immucore.
2024-02-22T19:28:53Z INF 1.
 <init> (background: false) (weak: false) (run: false)
2.
 <create-sentinel> (background: false) (weak: false) (run: false)
 <wait-for-sysroot> (background: false) (weak: false) (run: false)
3.
 <mount-oem> (background: false) (weak: false) (run: false)
4.
 <rootfs-hook> (background: false) (weak: false) (run: false)
5.
 <initramfs-hook> (background: false) (weak: false) (run: false)

2024-02-22T19:28:53Z INF Setting sentinel file to=live_mode
2024-02-22T19:28:59Z INF Running rootfs stage
2024-02-22T19:29:01Z INF Running initramfs stage
2024-02-22T19:29:02Z INF 1.
 <init> (background: false) (weak: false) (run: false)
2.
 <create-sentinel> (background: false) (weak: false) (run: true)
 <wait-for-sysroot> (background: false) (weak: false) (run: true)
3.
 <mount-oem> (background: false) (weak: false) (run: false)
4.
 <rootfs-hook> (background: false) (weak: false) (run: true)
5.
 <initramfs-hook> (background: false) (weak: false) (run: true)

/run/immucore/initramfs_stage.log

2024-02-22T19:29:01Z INF Running stage: initramfs.before

2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /oem/userdata ]: exit status 1)' stage name: Pull data from provider
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e /sbin/openrc ]: exit status 1)' stage name: Blacklist bpfilter on Alpine ( bug: https://github.com/kairos-io/kairos/issues/277 )
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run ! [[ -f /etc/hosts ]] || ! [[ $(grep '127.0.0.1' /etc/hosts) ]]
: exit status 1)' stage name: Make sure hosts file is present and includes a record for 127.0.0.1
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /oem/userdata ]: exit status 1)' stage name:
2024-02-22T19:29:01Z INF Done executing stage 'initramfs.before'

2024-02-22T19:29:01Z INF Running stage: initramfs

2024-02-22T19:29:01Z INF Processing stage step 'Enable systemd-network config files for DHCP'. ( commands: 1, files: 2, ... )
2024-02-22T19:29:01Z INF Processing stage step 'Create journalctl /var/log/journal dir'. ( commands: 0, files: 0, ... )
2024-02-22T19:29:01Z INF Processing stage step 'systemd-sysext initramfs settings'. ( commands: 0, files: 0, ... )
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.remote_recovery_mode" /proc/cmdline && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name: Starts kairos-recovery and generate a temporary pass
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sbin/openrc" ]
: exit status 1)' stage name: Create OpenRC services
2024-02-22T19:29:01Z INF Processing stage step ''. ( commands: 1, files: 0, ... )
2024-02-22T19:29:01Z ERR Failed to connect system bus: No such file or directory
: failed to run networkctl reload: exit status 1
2024-02-22T19:29:01Z ERR 1 error occurred:
        * failed to run networkctl reload: exit status 1

2024-02-22T19:29:01Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos-agent.service → /etc/systemd/system/kairos-agent.service.

2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.remote_recovery_mode" /proc/cmdline &&  [ -f "/sbin/openrc" ]: exit status 1)' stage name: Starts kairos-recovery for openRC based systems
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ ! -f "/run/cos/live_mode" ]: exit status 1)' stage name:
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ -s /usr/local/etc/machine-id ]: exit status 1)' stage name: Restore /etc/machine-id for systemd systems
2024-02-22T19:29:01Z INF Processing stage step 'Disable NetworkManager and wicked'. ( commands: 0, files: 0, ... )
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sbin/openrc" ]
: exit status 1)' stage name: Enable OpenRC services
2024-02-22T19:29:01Z INF Processing stage step ''. ( commands: 0, files: 2, ... )
2024-02-22T19:29:01Z ERR 2 errors occurred:
        * failed to run systemctl disable NetworkManager: exit status 1
        * failed to run systemctl disable wicked: exit status 1

2024-02-22T19:29:01Z INF Processing stage step 'Enable systemd-network and systemd-resolved'. ( commands: 0, files: 0, ... )
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Restore /etc/machine-id for openrc systems
2024-02-22T19:29:01Z INF Processing stage step 'Default systemd config'. ( commands: 1, files: 0, ... )
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.reset" /proc/cmdline && [ ! -f "/sbin/openrc" ]: exit status 1)' stage name: Starts kairos-reset for systemd based systems
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -qv "interactive-install" /proc/cmdline ] && \
[ -f /run/cos/live_mode ] && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name: Autologin on livecd for OpenRC
2024-02-22T19:29:01Z INF Command output: Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/multi-user.target.

2024-02-22T19:29:01Z ERR 6 errors occurred:
        * failed to run systemctl enable iscsid: exit status 1
        * failed to run systemctl enable systemd-timesyncd: exit status 1
        * failed to run systemctl enable nohang: exit status 1
        * failed to run systemctl enable nohang-desktop: exit status 1
        * failed to run systemctl enable fail2ban: exit status 1
        * failed to run systemctl enable logrotate.timer: exit status 1

2024-02-22T19:29:01Z INF Processing stage step 'Generate host keys'. ( commands: 1, files: 0, ... )
2024-02-22T19:29:01Z INF Processing stage step 'Link /etc/resolv.conf to systemd resolv.conf'. ( commands: 2, files: 0, ... )
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.reset" /proc/cmdline && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Starts kairos-reset for openRC-based systems
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run cat /proc/cmdline | grep "selinux=1"
: exit status 1)' stage name: Relabelling
2024-02-22T19:29:01Z INF Command output:
2024-02-22T19:29:01Z INF Command output:
2024-02-22T19:29:02Z INF Command output: ssh-keygen: generating new host keys: RSA DSA ECDSA ED25519

2024-02-22T19:29:02Z INF Processing stage step 'Create systemd services'. ( commands: 0, files: 5, ... )
2024-02-22T19:29:02Z INF Processing stage step ''. ( commands: 5, files: 0, ... )
2024-02-22T19:29:02Z INF Command output: Removed "/etc/systemd/system/getty.target.wants/getty@tty1.service".

2024-02-22T19:29:02Z INF Command output: Running in chroot, ignoring command 'stop'

2024-02-22T19:29:02Z INF Command output: Created symlink /etc/systemd/system/getty@tty1.service → /dev/null.

2024-02-22T19:29:02Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos.service → /etc/systemd/system/kairos.service.

2024-02-22T19:29:02Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos-webui.service → /etc/systemd/system/kairos-webui.service.

2024-02-22T19:29:02Z INF Processing stage step 'Enable systemd services'. ( commands: 4, files: 0, ... )
2024-02-22T19:29:02Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "nodepair.enable" /proc/cmdline && [ -f "/sbin/openrc" ]: exit status 1)' stage name:
2024-02-22T19:29:02Z INF Command output:
2024-02-22T19:29:02Z INF Command output:
2024-02-22T19:29:02Z INF Command output:
2024-02-22T19:29:02Z INF Command output:
2024-02-22T19:29:02Z INF Processing stage step 'Setup groups'. ( commands: 0, files: 0, ... )
2024-02-22T19:29:02Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "interactive-install" /proc/cmdline && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name:
2024-02-22T19:29:02Z INF Processing stage step 'Setup users'. ( commands: 0, files: 0, ... )
2024-02-22T19:29:02Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "interactive-install" /proc/cmdline && [ -f "/sbin/openrc" ]: exit status 1)' stage name:
2024-02-22T19:29:02Z INF Processing stage step 'Set user password if running in live or uki'. ( commands: 0, files: 0, ... )
2024-02-22T19:29:02Z INF Processing stage step 'Setup sudo'. ( commands: 1, files: 1, ... )
2024-02-22T19:29:02Z INF Command output: Locking password for user root.
passwd: Success

2024-02-22T19:29:02Z INF Processing stage step 'Ensure runtime permission'. ( commands: 2, files: 0, ... )
2024-02-22T19:29:02Z INF Command output:
2024-02-22T19:29:02Z INF Command output:
2024-02-22T19:29:02Z INF Processing stage step ''. ( commands: 0, files: 0, ... )
2024-02-22T19:29:02Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e "/usr/local/cloud-config" ]: exit status 1)' stage name: Ensure runtime permission
2024-02-22T19:29:02Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sys/firmware/devicetree/base/model" ] && grep -i jetson "/sys/firmware/devicetree/base/model"
: exit status 1)' stage name: Create files
2024-02-22T19:29:02Z INF Processing stage step ''. ( commands: 0, files: 0, ... )
2024-02-22T19:29:02Z INF Processing stage step 'Set hostname'. ( commands: 0, files: 0, ... )
2024-02-22T19:29:02Z INF Done executing stage 'initramfs'

2024-02-22T19:29:02Z INF Running stage: initramfs.after

2024-02-22T19:29:02Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e /sbin/openrc ]: exit status 1)' stage name: Enable serial login for alpine
2024-02-22T19:29:02Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [[ $(kairos-agent state get kairos.flavor) =~ ^ubuntu ]]: exit status 1)' stage name: setupcon initramfs.after ubuntu
2024-02-22T19:29:02Z INF Done executing stage 'initramfs.after'

2024-02-22T19:29:02Z INF Running stage: initramfs.before

2024-02-22T19:29:02Z INF Done executing stage 'initramfs.before'

2024-02-22T19:29:02Z INF Running stage: initramfs

2024-02-22T19:29:02Z INF Done executing stage 'initramfs'

2024-02-22T19:29:02Z INF Running stage: initramfs.after

2024-02-22T19:29:02Z INF Done executing stage 'initramfs.after'

/run/immucore/rootfs_stage.log

2024-02-22T19:28:59Z INF Running stage: rootfs.before

2024-02-22T19:28:59Z INF Processing stage step 'Enable systemd-network config files for DHCP'. ( commands: 1, files: 2, ... )
2024-02-22T19:28:59Z INF Processing stage step 'Pull data from provider'. ( commands: 0, files: 0, ... )
2024-02-22T19:28:59Z ERR mkdir /etc/systemd/network/: file exists
2024-02-22T19:28:59Z ERR 1 error occurred:
        * mkdir /etc/systemd/network/: file exists

2024-02-22T19:28:59Z ERR Failed to connect system bus: No such file or directory
: failed to run networkctl reload: exit status 1
2024-02-22T19:28:59Z ERR 1 error occurred:
        * failed to run networkctl reload: exit status 1

2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /oem/userdata ]: exit status 1)' stage name: Sentinel file for userdata
2024-02-22T19:29:01Z INF Done executing stage 'rootfs.before'

2024-02-22T19:29:01Z INF Running stage: rootfs

2024-02-22T19:29:01Z INF Processing stage step 'Layout configuration for active/passive mode'. ( commands: 0, files: 0, ... )
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/run/cos/recovery_mode" ]: exit status 1)' stage name: Layout configuration for recovery mode
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.boot_live_mode" /proc/cmdline: exit status 1)' stage name: Layout configuration for booting local node from livecd
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e "/run/cos/uki_boot_mode" ]: exit status 1)' stage name: Layout configuration for UKI boot
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e "/run/cos/uki_install_mode" ]: exit status 1)' stage name: Layout configuration for UKI installer
2024-02-22T19:29:01Z INF Done executing stage 'rootfs'

2024-02-22T19:29:01Z INF Running stage: rootfs.after

2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /run/cos/recovery_mode ] && [ ! -f /run/cos/live_mode ] && [ -f "/sys/firmware/devicetree/base/model" ] && grep -i "Raspberry Pi 4" "/sys/firmware/devicetree/base/model": exit status 1)' stage name: Grow persistent
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -r /run/cos/custom-layout.env ] && [ ! -f "/run/cos/recovery_mode" ] && [ ! -f /run/cos/live_mode ]: exit status 1)' stage name: add custom bind and ephemeral mounts to /run/cos/cos-layout.env
2024-02-22T19:29:01Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /run/cos/recovery_mode ] && [ ! -f /run/cos/live_mode ]: exit status 1)' stage name: Grow persistent
2024-02-22T19:29:01Z INF Done executing stage 'rootfs.after'

2024-02-22T19:29:01Z INF Running stage: rootfs.before

2024-02-22T19:29:01Z INF Done executing stage 'rootfs.before'

2024-02-22T19:29:01Z INF Running stage: rootfs

2024-02-22T19:29:01Z INF Done executing stage 'rootfs'

2024-02-22T19:29:01Z INF Running stage: rootfs.after

2024-02-22T19:29:01Z INF Done executing stage 'rootfs.after'

Please help. This isn't stable or usable with multiple disks, which is a requirement for us. Thank you.

jimmykarily commented 8 months ago

First of all, just to get it out of the way: in your cloud config above the #cloud-config header is missing, but I assume that's just a copy-paste mistake, otherwise you wouldn't even see the disk being partitioned.

That said, I can verify that no-format doesn't work as expected. I created a VM in qemu with 2 disks:

Kairos would automatically pick /dev/vda either because it's the "first" disk or because it's the bigger disk. In any case, my goal is to point the installation to a manually partitioned /dev/vdb.

I used this config:

#cloud-config

strict: true
debug: true

install:
  no-format: true
  auto: true
  poweroff: false
  reboot: false
  grub_options:
    extra_cmdline: "rd.immucore.debug"
users:
  - name: "kairos"
    passwd: "kairos"

stages:
  kairos-install.pre.before:
  - if:  '[ -e "/dev/vdb" ]'
    name: "Create partitions"
    commands:
      - |
        parted --script --machine -- "/dev/vdb" mklabel msdos
        sgdisk -g /dev/vdb
    layout:
      device:
        path: "/dev/vdb"
      expand_partition:
        size: 0 # All available space
      add_partitions:
        # all sizes below are in MB
        - pLabel: bios
          size: 1
          pType: gpt
        - fsLabel: COS_OEM
          size: 64
          pLabel: oem
        - fsLabel: COS_RECOVERY
          size: 8500
          pLabel: recovery
        - fsLabel: COS_STATE
          size: 18000
          pLabel: state
        - fsLabel: COS_PERSISTENT
          pLabel: persistent
          size: 0
          filesystem: "ext4"
  boot:
    - systemd_firstboot:
      keymap: us

which is almost identical to @sarg3nt's config, but points to /dev/vdb (plus an sgdisk -g /dev/vdb command to fix an error about the disk being MBR and not GPT).
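The blkid output earlier in the thread already shows the table type of each disk: PTTYPE="dos" (MBR) for the first disk and PTTYPE="gpt" for the second. A small sketch for extracting that field from `blkid` output, assuming only `sed`; the helper name `pttype_of` is hypothetical:

```shell
# pttype_of
# Reads `blkid` output lines on stdin and prints the PTTYPE value
# ("dos" for MBR, "gpt" for GPT) of every line that carries one.
# Hypothetical helper, not part of Kairos.
pttype_of() {
  sed -n 's/.*PTTYPE="\([^"]*\)".*/\1/p'
}

# Example on a live system (commented out, needs the real device):
# blkid /dev/vdb | pttype_of
```

Checking the value before the layout step shows whether the disk still carries an MBR label after `mklabel msdos`, which is the situation the `sgdisk -g` command above works around.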

I compiled a kairos-agent with additional output and it seems that this line overwrites my NoFormat option: https://github.com/kairos-io/kairos-agent/blob/2e9c85e63acf926ab9e0a00b3dabff4927c70c4b/internal/agent/install.go#L270 :

installSpec.NoFormat = true
c.Install.NoFormat = false

I tried commenting it out to see what happens, and indeed /dev/vda is not formatted or partitioned, but it's still selected as the target:

i.spec.Target = /dev/vda

(printed at this point: https://github.com/kairos-io/kairos-agent/blob/2e9c85e63acf926ab9e0a00b3dabff4927c70c4b/pkg/action/install.go#L164)

and it later fails with:

INFO[2024-02-23T08:30:34Z] Installing GRUB..
DEBU[2024-02-23T08:30:34Z] Running grub with the following args: [--root-directory=/run/cos/active --boot-directory=/run/cos/state --target=i386-pc /dev/vda]
DEBU[2024-02-23T08:30:34Z] Running cmd: '/usr/sbin/grub2-install --root-directory=/run/cos/active --boot-directory=/run/cos/state --target=i386-pc /dev/vda'
ERRO[2024-02-23T08:30:38Z] Installing for i386-pc platform.
/usr/sbin/grub2-install: error: unable to identify a filesystem in hostdisk//dev/vda; safety check can't be performed.

where it obviously tries to install grub on /dev/vda which is not even partitioned.

jimmykarily commented 8 months ago

The selection of the target device doesn't take "NoFormat" into account: https://github.com/kairos-io/kairos-agent/blob/2e9c85e63acf926ab9e0a00b3dabff4927c70c4b/internal/agent/install.go#L216-L218

I think when NoFormat is set to true, the target device should be discovered using labels (ideally with a sanity check that all needed partitions are there).
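The idea can be sketched outside the agent as a shell helper that reads `blkid` output and reports the disk carrying all four COS_* labels used in the configs above. This is an illustration only, not the kairos-agent implementation: the name `find_kairos_disk` is hypothetical, and the parent disk is derived with a simple name heuristic (on a live system, `lsblk -no PKNAME` would be more reliable).

```shell
# find_kairos_disk
# Reads `blkid` output on stdin. Prints the disk that holds all of
# COS_OEM, COS_RECOVERY, COS_STATE and COS_PERSISTENT, or fails if a
# label is missing or the labels are spread over several disks.
# Hypothetical helper; the partition-suffix stripping is a heuristic.
find_kairos_disk() {
  awk '
    BEGIN { n = split("COS_OEM COS_RECOVERY COS_STATE COS_PERSISTENT", want, " ") }
    {
      dev = $1
      sub(/:$/, "", dev)                        # "/dev/sdb4:" -> "/dev/sdb4"
      for (i = 1; i <= n; i++) {
        # Leading space keeps PARTLABEL="..." from matching as LABEL="...".
        if (index($0, " LABEL=\"" want[i] "\"")) {
          disk = dev
          if (disk ~ /[0-9]p[0-9]+$/) sub(/p[0-9]+$/, "", disk)   # nvme0n1p4 -> nvme0n1
          else sub(/[0-9]+$/, "", disk)                           # sdb4 -> sdb
          found[want[i]] = disk
        }
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        if (!(want[i] in found)) { print "missing " want[i] > "/dev/stderr"; exit 1 }
        if (found[want[i]] != found[want[1]]) { print "labels span several disks" > "/dev/stderr"; exit 1 }
      }
      print found[want[1]]
    }'
}

# Example on a live system (commented out, needs root and real devices):
# blkid | find_kairos_disk
```

Fed the blkid output posted earlier in this thread, the helper reports /dev/sdb, the disk the partitions actually ended up on.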

jimmykarily commented 8 months ago

I found the offending parts of the code here: https://github.com/kairos-io/kairos-agent/pull/235

Needs a proper fix.

sarg3nt commented 8 months ago

@jimmykarily Thank you for looking into this and finding the problem. Any idea when this will get on the docket for a proper fix?

jimmykarily commented 8 months ago

> @jimmykarily Thank you for looking into this and finding the problem. Any idea when this will get on the docket for a proper fix?

I can't make predictions, sorry. With the focus being on v3.0.0 and the UKI work, this only made it below the waterline this sprint. If things go well, we may be able to start on it :shrug:

jimmykarily commented 8 months ago

Peg PR to allow creating more than one disk on a test VM: https://github.com/spectrocloud/peg/pull/23 (will be used to implement a test for this ticket)

sarg3nt commented 7 months ago

@jimmykarily Now that Kairos v3 is out and looks fairly stabilized, do you have an estimate as to when this is going to be fixed? Thanks!

jimmykarily commented 6 months ago

Waiting for this to be merged: https://github.com/kairos-io/kairos/pull/2291 . We also need to make sure this is properly documented.

sarg3nt commented 6 months ago

@jimmykarily I've been trying to get this to work with the master build kairos/rockylinux:9-core-amd64-generic-master most of the day and am not having any luck.
Here are all the different things I tried and how they failed. Starting with the cloud_init.yaml I initially posted above, I get this: [image]

As per one of your posts I then tried adding the sgdisk command.

    commands:
      - |
        parted --script --machine -- "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" mklabel msdos
        sgdisk -g "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"

When I do that I get the following: [image] But then it fails with: [image]

I saw somewhere else that someone had added:

      add_partitions:
        # all sizes below are in MB
        - fsLabel: COS_GRUB
          size: 64
          pLabel: bios # or efi, tried both
          filesystem: "fat"
        - fsLabel: COS_OEM
          size: 64
<snip>

But that didn't help or change anything.

What am I missing? I'm not a grub master so kind of clutching at straws here.

jimmykarily commented 6 months ago

I was struggling to find the right combination too. I ended up doing this: https://github.com/kairos-io/kairos/pull/2291/files#diff-1ff1699e612ac7f8c508e5f9f6a784b37441b01b8cfdebd8da3b068280385247R115 for legacy bios mode (see how the COS_GRUB partition is commented out some lines below).

For EFI, what worked for me was to comment out the sgdisk command and uncomment the COS_GRUB part.

To avoid trying things blindly, I let kairos-agent install automatically on the default disk, saved the resulting partition scheme, and then tried to replicate it manually on the other disk. This way you'll know what partitions kairos-agent expects.
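A rough sketch of the "save the partition scheme" step, assuming the default `lsblk` column layout shown earlier in this thread (NAME, MAJ:MIN, RM, SIZE, RO, TYPE, MOUNTPOINTS). The helper name `partition_sizes` is hypothetical:

```shell
# partition_sizes DISK
# Reads default `lsblk` output on stdin and prints "NAME SIZE" for each
# partition of DISK (for example sdb), in the order lsblk lists them.
# Hypothetical helper; assumes TYPE is the sixth column.
partition_sizes() {
  awk -v disk="$1" '
    $6 == "part" {
      name = $1
      sub(/^[^a-z]+/, "", name)        # drop the tree-drawing prefix
      if (index(name, disk) == 1) print name, $4
    }'
}

# Example after an automatic install (commented out, needs the real disk):
# lsblk | partition_sizes vda > default-layout.txt
```

The saved list can then be compared against the sizes in add_partitions. Keep in mind that lsblk reports rounded sizes, so they will not match the MB values in the config exactly.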

sarg3nt commented 6 months ago

@jimmykarily I got this working, but with an unexpected requirement that worries me: I have to specify device: /dev/sda or it install-loops. That doesn't fill me with confidence, since /dev/sda can flip around from boot to boot. I didn't try device: auto; I had just left it out as you showed in your example, assuming that meant auto. I got a cluster built and all the nodes worked with the install disk being the smallest, so that is progress, but I'm still worried about the device statement.

We want to go to production with the first cluster soon, and I need to assure my team that this is going to be stable. What do you think?

Here's my config.

strict: true
debug: true
install:
  no-format: true
  device: /dev/sda
  auto: true
  poweroff: false
  reboot: true
  grub_options:
    extra_cmdline: "rd.immucore.debug"
  bind_mounts:
  - /run/k3s
stages:
  kairos-install.pre.before:
  - if:  '[ -e "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" ]'
    name: "Create partitions"
    commands:
      - |
        parted --script --machine -- "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" mklabel gpt
        # Legacy bios
        sgdisk --new=1:2048:+1M --change-name=1:'bios' --typecode=1:EF02 "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
    layout:
      device:
        path: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
      expand_partition:
        size: 0 # All available space
      add_partitions:
        # all sizes below are in MB
        - fsLabel: COS_OEM
          size: 64
          pLabel: oem
        - fsLabel: COS_RECOVERY
          size: 8500
          pLabel: recovery
        - fsLabel: COS_STATE
          size: 18000
          pLabel: state
        - fsLabel: COS_PERSISTENT
          pLabel: persistent
          size: 0
          filesystem: "ext4"

jimmykarily commented 6 months ago

@sarg3nt do you have the kairos-agent installation logs (with debug enabled) from the case when device is not set? Looking at the code, this should let kairos-agent auto detect the target device and it should print this text: https://github.com/kairos-io/kairos-agent/blob/979c4ad32b7a9eceadde33f728bd1f7c427daae0/pkg/action/install.go#L162

sarg3nt commented 6 months ago

@jimmykarily I don't know if I'm doing this right but I'm giving it my best shot. Here's what I've tried and "figured out" so far. When I have device: /dev/sda set, it works. Even when the node reboots and sda is pointing at the wrong disk it still works. See the output of lsblk below for an example.

NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0    1G  1 loop /
sda      8:0    0  150G  0 disk /run/k3s
sdb      8:16   0   80G  0 disk
├─sdb1   8:17   0    1M  0 part
├─sdb2   8:18   0   64M  0 part /oem
├─sdb3   8:19   0  2.2G  0 part
├─sdb4   8:20   0    4G  0 part /run/initramfs/cos-state
└─sdb5   8:21   0 73.6G  0 part /etc/pki/tls/certs
                                /var/lib/wicked
                                /var/lib/snapd
                                /var/lib/rancher
                                /var/lib/longhorn
                                /var/lib/kubelet
                                /var/lib/extensions
                                /var/lib/dbus
                                /var/lib/containerd
                                /var/lib/cni
                                /var/lib/ca-certificates
                                /etc/zfs
                                /etc/systemd
                                /etc/sysconfig
                                /etc/ssh
                                /var/snap
                                /etc/runlevels
                                /etc/rancher
                                /etc/modprobe.d
                                /var/log
                                /usr/libexec
                                /etc/kubernetes
                                /run/k3s
                                /etc/iscsi
                                /etc/init.d
                                /etc/cni
                                /root
                                /opt
                                /home
                                /usr/local
sdc      8:32   0  100G  0 disk /usr/local/.state/var-lib-rancher.bind/rke2
                                /var/lib/rancher/rke2

To be clear, in the above sdb is disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0, i.e. disk 0, sdc is disk 1 and sda is disk 2 so everything installed correctly but the sdX devices are just wrong as per the original issue statement. Regardless, this works. It boots and stuff is on the correct disk.
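
A quick way to confirm this kind of mapping on a node is to resolve the stable by-path link to whatever sdX name the kernel assigned on the current boot. The sketch below assumes a readlink that supports -f (GNU coreutils does); resolve_disk is a hypothetical helper name, and the demonstration uses a throwaway symlink rather than a real disk.

```shell
# resolve_disk PATH: print the device a stable symlink currently points to.
# (Hypothetical helper, not part of Kairos.)
resolve_disk() {
    readlink -f "$1"
}

# On the node, something like:
#   resolve_disk "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"

# Self-contained demonstration with a temporary symlink standing in for the disk.
demo_dir=$(mktemp -d)
touch "$demo_dir/sdb"
ln -s "$demo_dir/sdb" "$demo_dir/pci-0000:03:00.0-scsi-0:0:0:0"
resolved=$(resolve_disk "$demo_dir/pci-0000:03:00.0-scsi-0:0:0:0")
echo "${resolved##*/}"
rm -rf "$demo_dir"
```

Running the same check before and after a reboot shows whether the sdX name moved while the by-path link stayed put.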

When I leave out device or do device: auto it "install loops". If I turn off auto shutdown and ssh into the node while the installer is still up and run these troubleshooting commands:

echo "Disk 0, should be Kairos stuff"
lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
blkid "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
echo ""

echo "Disk 1, should be /var/lib/rancher/rke2"
lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0"
blkid "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0"
echo ""

echo "Disk 2, should be  /run/k3s"
lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0"
blkid "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0"

I get this output:

Disk 0, should be Kairos stuff
NAME   FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sda
└─sda1
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0: PTUUID="196e314b-07a0-45ee-b82b-419582391e6e" PTTYPE="gpt"

Disk 1, should be /var/lib/rancher/rke2
NAME FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdb  ext4   1.0   RKE2  2ee2eadc-7cb7-4231-a2c8-e79ca4ab61a7
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0: LABEL="RKE2" UUID="2ee2eadc-7cb7-4231-a2c8-e79ca4ab61a7" TYPE="ext4"

Disk 2, should be  /run/k3s
NAME   FSTYPE FSVER LABEL          UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdc
├─sdc1
├─sdc2 ext4   1.0   COS_OEM        43ca157c-cecb-4c6c-9340-ab2d1d61b765
├─sdc3 ext4   1.0   COS_RECOVERY   6f983e16-8658-4a44-a754-1cfa7883b3f0
├─sdc4 ext4   1.0   COS_STATE      78ce8c4d-92d5-4210-be27-13e20b3ec07f
└─sdc5 ext4   1.0   COS_PERSISTENT d32ad1ff-9738-4b34-ac3d-6f62110e6800
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0: PTUUID="a95f1345-adcc-4656-ae7d-17bfb3e08f5b" PTTYPE="gpt"

It seems to have installed to Disk 2? Or am I reading / interpreting this wrong? Again, when using device: /dev/sda it works and the output of the above commands is correct. So it makes sense that it's install looping: it is installing to the wrong disk, ignoring the commands in the cloud_init.yaml?

I'm confused. Why would setting device: /dev/sda do that?

I tried getting the logs you asked for. I can get something when I don't have it shut down and I ssh in to the node:

[root@lpul-vault-k8s-server-0 kairos]# journalctl -u kairos-agent
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: Started kairos agent.
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: warning: skipping /oem/userdata (extension).
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Kairos Agent version=v2.8.11
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Kairos System version=v3.0.6
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF creating a runtime
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF detecting boot state
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Boot Mode boot_mode=livecd_boot
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Boot in uki mode result=false
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: warning: skipping /oem/userdata (extension).
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Kairos Agent version=v2.8.11
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Kairos System version=v3.0.6
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF creating a runtime
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF detecting boot state
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Boot Mode boot_mode=livecd_boot
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Boot in uki mode result=false
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Deactivated successfully.
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: Started kairos agent.
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: warning: skipping /oem/userdata (extension).
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: 2024-04-23T18:34:31Z INF Kairos Agent version=v2.8.11
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: 2024-04-23T18:34:31Z INF Kairos System version=v3.0.6
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: 2024-04-23T18:34:31Z INF creating a runtime
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: 2024-04-23T18:34:31Z INF detecting boot state
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: 2024-04-23T18:34:31Z INF Boot Mode boot_mode=livecd_boot
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: 2024-04-23T18:34:31Z INF Boot in uki mode result=false
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Deactivated successfully.

But that doesn't look that useful to me. I had tried shutting down the node and adding the disk to another running VM then mounting it but disk 0 wouldn't mount. Once I realized it wasn't installing to disk 0 I then did the same but with disk 2 and that worked. I mounted /dev/sdb5 which had the /var/log directory and looked at .state/var-log.bind/journal using journalctl -D journal and found there were no kairos-agent logs there . . . ? Lots of other logs. I'll post those logs in the next comment to keep this one more readable.

A few notes to make sure you are aware of the whole setup:

  1. We are installing from AuroraBoot.
  2. Installing to a VSphere VM
  3. Automation via Terraform (Well, OpenTofu now but same diff)
  4. For this testing I've deployed the AuroraBoot node and first cluster node with TF then tweaked those by hand to run the different tests. I.e. SSH into AuroraBoot node, stop the container, change the cloud_init.yaml and restart the container, then stop the cluster node, replace the disks if needed and restart the node. This is just to speed up testing and I get the same results if automation is used.

Another piece of info that may or may not be useful. This is the AuroraBoot run statement. It is part of a shell file that is run as a service on the node and I haven't touched it in a while.

  docker run --rm --net host \
    -v "/usr/local/auroraboot-build:/tmp/auroraboot" \
    -v "/etc/systemd/system/cloud_init.yaml:/cloud_init.yaml" \
    -v /var/run/docker.sock:/var/run/docker.sock \
    "quay.artifactory.metro.ad.selinc.com/kairos/auroraboot:${AURORABOOT_VERSION}" \
    --set "container_image=$container_image" \
    --cloud-config /cloud_init.yaml \
    --set "state_dir=/tmp/auroraboot" \
    --set netboot.cmdline="rd.neednet=1 ip=dhcp rd.cos.disable netboot nodepair.enable console=tty0 selinux=0" \
    --debug \

I'm curious about the --set netboot.cmdline arg. Is that still necessary? I'm not seeing it recommended in the docs now and I'm not sure if it's still needed or even what it does. I tried removing it but things didn't seem to change.

Hope this helps.

sarg3nt commented 6 months ago

See attached log file. kairos_logs.txt

jimmykarily commented 6 months ago

When you set device: /dev/sda you are essentially skipping the target detection. Given that /dev/sdX names can change, I suspect the only reason it works for you is that /dev/sda happens to be the right disk at installation time (and obviously changes after reboot). In the config you shared above, with device: /dev/sda set and manual partitioning happening in kairos-install.pre.before, either the disk is partitioned twice or one of the two partitioning steps is skipped. I'm not sure; the installation logs would help here.

It's possible that when you don't set the device explicitly, for some reason the detection doesn't work and the target is left empty. But that would need installSpec.NoFormat to not be set correctly too, otherwise nothing would set the Target and the installation would fail.

@sarg3nt the logs you shared are not the installation logs. I'm not sure if those are available after rebooting to the system. You can get the installation logs by:

Maybe there are other ways to get the installation logs in the auroraboot flow but I can't think of one right now. Maybe if you set reboot: false in the config you get an opportunity to ssh to the box while still in livecd mode. The installation logs should still be around in that case.

One of the 2 options should allow you to collect installation logs and that will reveal more on what actually happens.

Thanks for your patience in fixing this Dave, let's hope we get it sorted out soon!

jimmykarily commented 6 months ago

The logs you attached show immucore v0.1.6:

Apr 23 17:32:49 localhost immucore[589]: 2024-04-23T17:32:49Z INF Immucore commit=none compiled with=go1.20.2 version=v0.1.6

the image you use should be v0.1.25:

$ docker run quay.io/kairos/rockylinux:9-core-amd64-generic-master immucore version   
2024-04-24T07:53:48Z INF Immucore commit=none compiled with=go1.21.7 version=v0.1.25

Something is off...

thanks @Itxaka for spotting this
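
For comparing the two lines quoted above without reading them by eye, the version= field can be pulled out of each. This is only a sketch: log_version is a hypothetical helper name, and it assumes each line carries a single version= field, as in the excerpts here.

```shell
# log_version: read log lines on stdin, print the value of the first "version=" field.
# (Hypothetical helper; assumes one version= field per line.)
log_version() {
    sed -n 's/.*version=\([^ ]*\).*/\1/p' | head -n 1
}

# The two lines quoted in this thread.
booted_line='Apr 23 17:32:49 localhost immucore[589]: 2024-04-23T17:32:49Z INF Immucore commit=none compiled with=go1.20.2 version=v0.1.6'
image_line='2024-04-24T07:53:48Z INF Immucore commit=none compiled with=go1.21.7 version=v0.1.25'

booted=$(printf '%s\n' "$booted_line" | log_version)
image=$(printf '%s\n' "$image_line" | log_version)

if [ "$booted" != "$image" ]; then
    echo "mismatch: booted $booted, image $image"
fi
```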

sarg3nt commented 6 months ago

@jimmykarily I don't know where it got version=v0.1.6 from. I got those logs in a very roundabout way and was a little sus when they were localhost. In any case, my built image was reporting version=v0.1.25.

I rebuilt the client OS image from the latest master and our AuroraBoot image even though we were already running AuroraBoot v0.2.7 and I can now leave the device: line out and everything works. Not sure what changed though.

Your info about booting into live-cd mode and running kairos-agent manual-install was a great tip.
I tried it with the node net booting from the AuroraBoot image but with the cloud-config the AuroraBoot node was serving set to auto: false, reboot: false and poweroff: false and tried to use that as the starting point. I know that file is being served to the downstream node but don't know where it ends up, so I followed your instructions, saved it to /tmp/config.yaml and ran the manual-install as you showed. It looked like some things were running twice, so I think it's using both the file I gave it and the one it was served from AuroraBoot, maybe?
Q: Is there a way to run kairos-agent install and have it use the one it was served from net-boot?

The bigger surprise is that the cloud_init.yaml served from the target node's VSphere guestinfo.userdata is being run even when auto: false is set. I assumed it would not be, but it is.

To explain in more detail: the target node gets our custom config from two sources. The first is the basic config that is the same for all cluster nodes and is served from AuroraBoot. This config has the install: section in it. The second is the config that is different for each node, which is injected from Terraform into VSphere via guestinfo.userdata. The VSphere one is run but the first is not.
Q: Is this expected?

I'll include redacted copies of each below, to help clarify what is happening.

From AuroraBoot:

#cloud-config

strict: true
debug: true
install:
  no-format: true
  auto: false
  poweroff: false
  reboot: false
  grub_options:
    extra_cmdline: "rd.immucore.debug"
  bind_mounts:
  - /run/k3s
users:
  - name: "kairos-auroraboot"
    passwd: "<redacted>"
    ssh_authorized_keys:
      - <redacted>
write_files:
  - encoding: b64
    content: <redacted>
    path: <redacted>
    permissions: "0444"
runcmd:
  - Some run commands here.
stages:
  kairos-install.pre.before:
  - if:  '[ -e "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" ]'
    name: "Create partitions"
    commands:
      - |
        parted --script --machine -- "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" mklabel gpt
        # Legacy bios
        sgdisk --new=1:2048:+1M --change-name=1:'bios' --typecode=1:EF02 "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
    layout:
      device:
        path: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
      expand_partition:
        size: 0 # All available space
      add_partitions:
        - fsLabel: COS_OEM
          size: 64
          pLabel: oem
        - fsLabel: COS_RECOVERY
          size: 8500
          pLabel: recovery
        - fsLabel: COS_STATE
          size: 18000
          pLabel: state
        - fsLabel: COS_PERSISTENT
          pLabel: persistent
          size: 0
          filesystem: "ext4"
  boot:
    - systemd_firstboot:
      keymap: us
    - name: "Environment Variables"
      environment:
        HTTP_PROXY: "<redacted>"
        <snip>
    - name: "Setup services"
      systemctl:
        disable:
          - dnf-makecache
    - name: "Setup NTP"
      systemctl:
        enable:
          - systemd-timesyncd
      timesyncd:
        NTP: "<redacted>"
        FallbackNTP: ""
  after-install-chroot:
    - name: "Create data directories"
      commands:
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0" "/run/k3s" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
    - name: "Format disks"
      commands:
        - make_disk.sh "format_disk" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0" "/var/lib/rancher/rke2" "RKE2" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "format_disk" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0" "/run/k3s" "K3S" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "format_disk" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0" "/var/lib/rancher/longhorn" "LONGHORN" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
  after-reset-chroot:
    - name: "Create data directories"
      commands:
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0" "/run/k3s" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
  after-upgrade-chroot:
    - name: "Create data directories"
      commands:
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0" "/run/k3s" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
  initramfs:
    - name: "Mount disks"
      commands:
        # Making the /run/k3s directory here as well as it fixes the directory going missing bug 
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0" "/run/k3s" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "mount_disk" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "mount_disk" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0" "/run/k3s" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "mount_disk" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
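
One detail in the commands above may be worth a second look, separate from the disk targeting question: in bash, $PIPESTATUS[0] without braces expands to element 0 followed by a literal [0], so the -ne 0 check may not behave as intended. A minimal sketch of the difference, assuming bash is installed (the snippets are passed to bash explicitly because PIPESTATUS is a bash array):

```shell
# Unbraced: "$PIPESTATUS[0]" expands element 0, then appends the literal text "[0]".
unbraced=$(bash -c 'false | cat; echo "$PIPESTATUS[0]"')

# Braced: "${PIPESTATUS[0]}" is the exit status of the first command in the pipeline.
braced=$(bash -c 'false | cat; echo "${PIPESTATUS[0]}"')

echo "unbraced: $unbraced"
echo "braced:   $braced"
```

If the intent is to fail the stage when make_disk.sh fails, the braced form ${PIPESTATUS[0]} is the one that yields a plain number for the comparison.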

The file that is injected via VSphere and runs on startup (it ends up in /oem/userdata.yaml and /oem/userdata):

#cloud-config

users:
  - name: "kairos"
    passwd: "<redacted>"
    ssh_authorized_keys:
      - ssh-rsa <redacted>
write_files:
# These files exist after startup.
  - encoding: b64
    content: '<redacted>'
    path: /etc/rancher/rke2/config.yaml
    permissions: "0644"
    owner: "root"
  - encoding: b64
    content: '<redacted>'
    path: /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
    permissions: "0644"
    owner: "root"
stages:
  initramfs:
    - name: "Set hostname"
      hostname: "lpul-vault-k8s-server-0.vault.ad.selinc.com"
    - name: "Run commands"
      commands:
        - bash /usr/bin/initramfs_scripts.sh 2>&1 | tee -a /var/log/sel/initramfs_scripts.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
  boot:
    - name: "Setup services"
      systemctl:
        enable:
          - rke2-server.timer
          - vmtoolsd.timer
          - qualys-cloud-agent.timer
          - falcon-sensor.timer
        start:
          - rke2-server.timer
          - vmtoolsd.timer
          - qualys-cloud-agent.timer
          - falcon-sensor.timer

And here's the log after running kairos-agent manual-install /tmp/config.yaml 2>&1 | tee out.log with the first file above.

[root@lpul-vault-k8s-server-0 tmp]# kairos-agent manual-install /tmp/config.yaml 2>&1 | tee out.log
2024-04-24T20:21:09Z INF Kairos Agent version=v2.9.1
2024-04-24T20:21:09Z DBG Kairos Agent version={"git_commit":"none","go_version":"go1.21.7","version":"v2.9.1"}
2024-04-24T20:21:09Z INF Kairos System version=v3.0.4-43-g595a9d5
2024-04-24T20:21:09Z INF creating a runtime
2024-04-24T20:21:09Z INF detecting boot state
2024-04-24T20:21:09Z INF Boot Mode boot_mode=livecd_boot
2024-04-24T20:21:09Z INF Boot in uki mode result=false
2024-04-24T20:21:09Z DBG Loaded config: &config.Config{
  Install: &config.Install{
    Auto: false,
    Reboot: false,
    NoFormat: true,
    Device: "",
    Poweroff: false,
    GrubOptions: map[string]string{
      "extra_cmdline": "rd.immucore.debug",
    },
    Bundles: nil,
    Encrypt: nil,
    SkipEncryptCopyPlugins: false,
    Env: nil,
    Source: "",
    EphemeralMounts: nil,
    BindMounts: []string{
      "/run/k3s",
    },
  },
  Config: collector.Config{
    "config_url": "http://10.105.148.76:8090/_/file?name=other-1",
    "debug": true,
    "install": collector.Config{
      "auto": false,
      "bind_mounts": []interface {}{
        "/run/k3s",
      },
      "grub_options": collector.Config{
        "extra_cmdline": "rd.immucore.debug",
      },
      "no-format": true,
      "poweroff": false,
      "reboot": false,
    },
    "runcmd": []interface {}{
      "ln -s /opt/qualys/ /usr/local/qualys",
      "/opt/qualys/cloud-agent/bin/qualys-cloud-agent.sh ActivationId=1751b9b6-ccde-462d-aafa-cfd03d71acd3 CustomerId=ef6a4b08-1375-70aa-81bb-7bfa031eec64",
      "/opt/CrowdStrike/falconctl -s -f --aph=wall.ad.selinc.com --app=8080 --apd=false --cid=FC92F4C7EADF4A30B3AE88AD6FD371B7-74",
    },
    "stages": collector.Config{
      "after-install-chroot": []interface {}{
        collector.Config{
          "commands": []interface {}{
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0\" \"/var/lib/rancher/rke2\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0\" \"/var/lib/rancher/longhorn\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
          },
          "name": "Create data directories",
        },
        collector.Config{
          "commands": []interface {}{
            "make_disk.sh \"format_disk\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0\" \"/var/lib/rancher/rke2\" \"RKE2\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"format_disk\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" \"K3S\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"format_disk\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0\" \"/var/lib/rancher/longhorn\" \"LONGHORN\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
          },
          "name": "Format disks",
        },
        collector.Config{
          "commands": []interface {}{
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0\" \"/var/lib/rancher/rke2\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0\" \"/var/lib/rancher/longhorn\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
          },
          "name": "Create data directories",
        },
        collector.Config{
          "commands": []interface {}{
            "make_disk.sh \"format_disk\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0\" \"/var/lib/rancher/rke2\" \"RKE2\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"format_disk\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" \"K3S\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"format_disk\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0\" \"/var/lib/rancher/longhorn\" \"LONGHORN\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
          },
          "name": "Format disks",
        },
      },
      "after-reset-chroot": []interface {}{
        collector.Config{
          "commands": []interface {}{
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0\" \"/var/lib/rancher/rke2\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0\" \"/var/lib/rancher/longhorn\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
          },
          "name": "Create data directories",
        },
        collector.Config{
          "commands": []interface {}{
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0\" \"/var/lib/rancher/rke2\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0\" \"/var/lib/rancher/longhorn\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
          },
          "name": "Create data directories",
        },
      },
      "after-upgrade-chroot": []interface {}{
        collector.Config{
          "commands": []interface {}{
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0\" \"/var/lib/rancher/rke2\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0\" \"/var/lib/rancher/longhorn\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
          },
          "name": "Create data directories",
        },
        collector.Config{
          "commands": []interface {}{
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0\" \"/var/lib/rancher/rke2\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0\" \"/var/lib/rancher/longhorn\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
          },
          "name": "Create data directories",
        },
      },
      "boot": []interface {}{
        collector.Config{
          "keymap": "us",
          "systemd_firstboot": nil,
        },
        collector.Config{
          "environment": collector.Config{
            "HTTPS_PROXY": "http://wall.ad.selinc.com:8080",
            "HTTP_PROXY": "http://wall.ad.selinc.com:8080",
            "NO_PROXY": "localhost,localaddress,svc.cluster.local,host.docker.internal,kubernetes.docker.internal,.svc.cluster.local,cluster.local,.cluster.local,default.svc,docker.sel.inc,sel.inc,.sel.inc,ad.selinc.com,.ad.selinc.com,metro.ad.selinc.com,.metro.ad.selinc.com,bitbucket.metro.ad.selinc.com,artifactory.metro.ad.selinc.com,*.ad.selinc.com,10.43.0.1,127.0.0.1,127.0.0.0,0.0.0.0,127.0.0.0/8,10.0.0.0/8,10.*.*.*,10.*,172.16.0.0/12,192.168.0.0/16,169.254.169.254",
            "http_proxy": "http://wall.ad.selinc.com:8080",
            "https_proxy": "http://wall.ad.selinc.com:8080",
            "no_proxy": "localhost,localaddress,svc.cluster.local,host.docker.internal,kubernetes.docker.internal,.svc.cluster.local,cluster.local,.cluster.local,default.svc,docker.sel.inc,sel.inc,.sel.inc,ad.selinc.com,.ad.selinc.com,metro.ad.selinc.com,.metro.ad.selinc.com,bitbucket.metro.ad.selinc.com,artifactory.metro.ad.selinc.com,*.ad.selinc.com,10.43.0.1,127.0.0.1,127.0.0.0,0.0.0.0,127.0.0.0/8,10.0.0.0/8,10.*.*.*,10.*,172.16.0.0/12,192.168.0.0/16,169.254.169.254",
          },
          "name": "Environment Variables",
        },
        collector.Config{
          "name": "Setup services",
          "systemctl": collector.Config{
            "disable": []interface {}{
              "dnf-makecache",
            },
          },
        },
        collector.Config{
          "name": "Setup NTP",
          "systemctl": collector.Config{
            "enable": []interface {}{
              "systemd-timesyncd",
            },
          },
          "timesyncd": collector.Config{
            "FallbackNTP": "",
            "NTP": "ntp.ad.selinc.com ntp2.ad.selinc.com ntp3.ad.selinc.com",
          },
        },
        collector.Config{
          "keymap": "us",
          "systemd_firstboot": nil,
        },
        collector.Config{
          "environment": collector.Config{
            "HTTPS_PROXY": "http://wall.ad.selinc.com:8080",
            "HTTP_PROXY": "http://wall.ad.selinc.com:8080",
            "NO_PROXY": "localhost,localaddress,svc.cluster.local,host.docker.internal,kubernetes.docker.internal,.svc.cluster.local,cluster.local,.cluster.local,default.svc,docker.sel.inc,sel.inc,.sel.inc,ad.selinc.com,.ad.selinc.com,metro.ad.selinc.com,.metro.ad.selinc.com,bitbucket.metro.ad.selinc.com,artifactory.metro.ad.selinc.com,*.ad.selinc.com,10.43.0.1,127.0.0.1,127.0.0.0,0.0.0.0,127.0.0.0/8,10.0.0.0/8,10.*.*.*,10.*,172.16.0.0/12,192.168.0.0/16,169.254.169.254",
            "http_proxy": "http://wall.ad.selinc.com:8080",
            "https_proxy": "http://wall.ad.selinc.com:8080",
            "no_proxy": "localhost,localaddress,svc.cluster.local,host.docker.internal,kubernetes.docker.internal,.svc.cluster.local,cluster.local,.cluster.local,default.svc,docker.sel.inc,sel.inc,.sel.inc,ad.selinc.com,.ad.selinc.com,metro.ad.selinc.com,.metro.ad.selinc.com,bitbucket.metro.ad.selinc.com,artifactory.metro.ad.selinc.com,*.ad.selinc.com,10.43.0.1,127.0.0.1,127.0.0.0,0.0.0.0,127.0.0.0/8,10.0.0.0/8,10.*.*.*,10.*,172.16.0.0/12,192.168.0.0/16,169.254.169.254",
          },
          "name": "Environment Variables",
        },
        collector.Config{
          "name": "Setup services",
          "systemctl": collector.Config{
            "disable": []interface {}{
              "dnf-makecache",
            },
          },
        },
        collector.Config{
          "name": "Setup NTP",
          "systemctl": collector.Config{
            "enable": []interface {}{
              "systemd-timesyncd",
            },
          },
          "timesyncd": collector.Config{
            "FallbackNTP": "",
            "NTP": "ntp.ad.selinc.com ntp2.ad.selinc.com ntp3.ad.selinc.com",
          },
        },
      },
      "initramfs": []interface {}{
        collector.Config{
          "commands": []interface {}{
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"mount_disk\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0\" \"/var/lib/rancher/rke2\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"mount_disk\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"mount_disk\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0\" \"/var/lib/rancher/longhorn\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
          },
          "name": "Mount disks",
        },
        collector.Config{
          "commands": []interface {}{
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"mount_disk\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0\" \"/var/lib/rancher/rke2\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"mount_disk\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"mount_disk\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0\" \"/var/lib/rancher/longhorn\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
          },
          "name": "Mount disks",
        },
      },
      "kairos-install.pre.before": []interface {}{
        collector.Config{
          "commands": []interface {}{
            "parted --script --machine -- \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0\" mklabel gpt\n# Legacy bios\nsgdisk --new=1:2048:+1M --change-name=1:'bios' --typecode=1:EF02 \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0\"\n",
          },
          "if": "[ -e \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0\" ]",
          "layout": collector.Config{
            "add_partitions": []interface {}{
              collector.Config{
                "fsLabel": "COS_OEM",
                "pLabel": "oem",
                "size": 64,
              },
              collector.Config{
                "fsLabel": "COS_RECOVERY",
                "pLabel": "recovery",
                "size": 8500,
              },
              collector.Config{
                "fsLabel": "COS_STATE",
                "pLabel": "state",
                "size": 18000,
              },
              collector.Config{
                "filesystem": "ext4",
                "fsLabel": "COS_PERSISTENT",
                "pLabel": "persistent",
                "size": 0,
              },
            },
            "device": collector.Config{
              "path": "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0",
            },
            "expand_partition": collector.Config{
              "size": 0,
            },
          },
          "name": "Create partitions",
        },
        collector.Config{
          "commands": []interface {}{
            "parted --script --machine -- \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0\" mklabel gpt\n# Legacy bios\nsgdisk --new=1:2048:+1M --change-name=1:'bios' --typecode=1:EF02 \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0\"\n",
          },
          "if": "[ -e \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0\" ]",
          "layout": collector.Config{
            "add_partitions": []interface {}{
              collector.Config{
                "fsLabel": "COS_OEM",
                "pLabel": "oem",
                "size": 64,
              },
              collector.Config{
                "fsLabel": "COS_RECOVERY",
                "pLabel": "recovery",
                "size": 8500,
              },
              collector.Config{
                "fsLabel": "COS_STATE",
                "pLabel": "state",
                "size": 18000,
              },
              collector.Config{
                "filesystem": "ext4",
                "fsLabel": "COS_PERSISTENT",
                "pLabel": "persistent",
                "size": 0,
              },
            },
            "device": collector.Config{
              "path": "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0",
            },
            "expand_partition": collector.Config{
              "size": 0,
            },
          },
          "name": "Create partitions",
        },
      },
    },
    "strict": true,
    "users": []interface {}{
      collector.Config{
        "name": "kairos-auroraboot",
        "passwd": "kairos",
        "ssh_authorized_keys": []interface {}{
          "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCw/XgWQOq5Nx46cl2phALYdoJRoINuqD+cT9arVc6XMx4gl0KO7c98Po/Y/rPcTtnrqxaSOCaOSVB2slnEovKAEnXwchH1Ndub937MtSxDyhc5eiwoEj2nYgJ0QrTfQdFBim0ysvWxpJpLGYyR32idhI67vtcq3LDjqW1lFoIcx/X1/L7qn5/b81N+tg6vwE2Li0+fxFlMbTxuFwSdBLzGI51wqDCnWBb6N2IXfHzSv8o4l52fZ0UtwC0TT1ACmh7T+bP/cZ/Dxno4iOdLX9WbqEZC3lKeXqvjzKDyrAwu2/m7e5Lhd+OHUgIjw2rLypHErSFADazcycxM0FvORVtprcaTvgBpK9bZqn8a40JrHYb9Z/0swn1HC0KhtYSBpl4/nRZkvb9iAFCA0QYdmVwRrQ8sb8TTQHYmGf+svdfvyCs+GHWG3h0blFMH66AucLMnUR5hulNGkd+6Y2dNsH9OpQspNYfH/9mV3PJFSICxPKFybC9vwV3MuKSRMdQ77dc= davesarg-sa@sargesavm",
        },
      },
      collector.Config{
        "name": "kairos-auroraboot",
        "passwd": "IXOxMlpGW4qmXaJSd0e4zs1xdqU91wPRVdFtmvVM0eNszYliaGr5hXlpCtis7oFf",
        "ssh_authorized_keys": []interface {}{
          "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCw/XgWQOq5Nx46cl2phALYdoJRoINuqD+cT9arVc6XMx4gl0KO7c98Po/Y/rPcTtnrqxaSOCaOSVB2slnEovKAEnXwchH1Ndub937MtSxDyhc5eiwoEj2nYgJ0QrTfQdFBim0ysvWxpJpLGYyR32idhI67vtcq3LDjqW1lFoIcx/X1/L7qn5/b81N+tg6vwE2Li0+fxFlMbTxuFwSdBLzGI51wqDCnWBb6N2IXfHzSv8o4l52fZ0UtwC0TT1ACmh7T+bP/cZ/Dxno4iOdLX9WbqEZC3lKeXqvjzKDyrAwu2/m7e5Lhd+OHUgIjw2rLypHErSFADazcycxM0FvORVtprcaTvgBpK9bZqn8a40JrHYb9Z/0swn1HC0KhtYSBpl4/nRZkvb9iAFCA0QYdmVwRrQ8sb8TTQHYmGf+svdfvyCs+GHWG3h0blFMH66AucLMnUR5hulNGkd+6Y2dNsH9OpQspNYfH/9mV3PJFSICxPKFybC9vwV3MuKSRMdQ77dc= davesarg-sa@sargesavm}",
        },
      },
    },
    "write_files": []interface {}{
      collector.Config{
        "content": "cXVhbHlzX2h0dHBzX3Byb3h5PWh0dHA6Ly93YWxsLmFkLnNlbGluYy5jb206ODA4MAo=",
        "encoding": "b64",
        "path": "/etc/sysconfig/qualys-cloud-agent",
        "permissions": "0444",
      },
      collector.Config{
        "content": "cXVhbHlzX2h0dHBzX3Byb3h5PWh0dHA6Ly93YWxsLmFkLnNlbGluYy5jb206ODA4MAo=",
        "encoding": "b64",
        "path": "/etc/sysconfig/qualys-cloud-agent",
        "permissions": "0444",
      },
    },
  },
  ConfigURL: "http://10.105.148.76:8090/_/file?name=other-1",
  Options: map[string]string(nil), // p0
  FailOnBundleErrors: false,
  Bundles: nil,
  GrubOptions: p0,
  Env: nil,
  Debug: true,
  Strict: true,
  CloudInitPaths: nil,
  EjectCD: false,
  Logger: types.KairosLogger{
    Logger: zerolog.Logger{},
  },
  Fs: &vfs.osfs{}, // p1
  Mounter: &mount.Mounter{},
  Runner: &v1.RealRunner{ // p2
    Logger: &types.KairosLogger{
      Logger: zerolog.Logger{},
    },
  },
  Syscall: &v1.RealSyscall{},
  CloudInitRunner: &cloudinit.YipCloudInitRunner{},
  ImageExtractor: v1.OCIImageExtractor{},
  Client: &http.Client{},
  Platform: &v1.Platform{
    OS: "linux",
    Arch: "x86_64",
    GolangArch: "amd64",
  },
  Cosign: false,
  Verify: false,
  CosignPubKey: "",
  Arch: "x86_64",
  SquashFsCompressionConfig: []string{},
  SquashFsNoCompression: true,
  UkiMaxEntries: 3,
}
2024-04-24T20:21:10Z INF Setting image size to 1063Mb
2024-04-24T20:21:10Z INF Setting OEM partition size to 64Mb
2024-04-24T20:21:10Z INF Setting recovery partition size to 2326Mb
2024-04-24T20:21:10Z INF Setting state partition size to 4189Mb
2024-04-24T20:21:10Z INF Setting persistent partition size to 0Mb
2024-04-24T20:21:10Z DBG Loaded install spec: &v1.InstallSpec{
  Target: "",
  Firmware: "bios",
  PartTable: "gpt",
  Partitions: v1.ElementalPartitions{
    BIOS: &v1.Partition{
      Name: "bios",
      FilesystemLabel: "",
      Size: 1,
      FS: "",
      Flags: []string{
        "bios_grub",
      },
      MountPoint: "",
      Path: "",
      Disk: "",
    },
    EFI: nil,
    OEM: &v1.Partition{
      Name: "oem",
      FilesystemLabel: "COS_OEM",
      Size: 64,
      FS: "ext4",
      Flags: []string{}, // p0
      MountPoint: "/run/cos/oem",
      Path: "",
      Disk: "",
    },
    Recovery: &v1.Partition{
      Name: "recovery",
      FilesystemLabel: "COS_RECOVERY",
      Size: 2326,
      FS: "ext4",
      Flags: p0,
      MountPoint: "/run/cos/recovery",
      Path: "",
      Disk: "",
    },
    State: &v1.Partition{
      Name: "state",
      FilesystemLabel: "COS_STATE",
      Size: 4189,
      FS: "ext4",
      Flags: p0,
      MountPoint: "/run/cos/state",
      Path: "",
      Disk: "",
    },
    Persistent: &v1.Partition{
      Name: "persistent",
      FilesystemLabel: "COS_PERSISTENT",
      Size: 0,
      FS: "ext4",
      Flags: p0,
      MountPoint: "/run/cos/persistent",
      Path: "",
      Disk: "",
    },
  },
  ExtraPartitions: nil,
  NoFormat: true,
  Force: false,
  CloudInit: nil,
  Iso: "",
  GrubDefEntry: "",
  Tty: "tty1",
  Reboot: false,
  PowerOff: false,
  ExtraDirsRootfs: nil,
  Active: v1.Image{
    File: "/run/cos/state/cOS/active.img",
    Label: "COS_ACTIVE",
    Size: 1063,
    FS: "ext2",
    Source: &v1.ImageSource{},
    MountPoint: "/run/cos/active",
    LoopDevice: "",
  },
  Recovery: v1.Image{
    File: "/run/cos/recovery/cOS/recovery.img",
    Label: "COS_SYSTEM",
    Size: 1063,
    FS: "ext2",
    Source: &v1.ImageSource{},
    MountPoint: "",
    LoopDevice: "",
  },
  Passive: v1.Image{
    File: "/run/cos/state/cOS/passive.img",
    Label: "COS_PASSIVE",
    Size: 1063,
    FS: "ext2",
    Source: &v1.ImageSource{},
    MountPoint: "",
    LoopDevice: "",
  },
  GrubConf: "/etc/cos/grub.cfg",
}
2024-04-24T20:21:10Z DBG Cloud-init paths set to [/system/oem /oem/ /usr/local/cloud-config/ /tmp/kairos-install-config-xxx.yaml223218110]
2024-04-24T20:21:10Z DBG Failed creating cloud-init config path: /tmp/kairos-install-config-xxx.yaml223218110 mkdir /tmp/kairos-install-config-xxx.yaml223218110: not a directory
2024-04-24T20:21:10Z INF Running stage: kairos-install.pre.before

2024-04-24T20:21:10Z INF Processing stage step 'Create partitions'. ( commands: 1, files: 0, ... )
2024-04-24T20:21:11Z INF Command output: Setting name!
partNum is 0
The operation has completed successfully.

2024-04-24T20:21:11Z INF Creating COS_OEM partition
2024-04-24T20:21:12Z INF Creating COS_RECOVERY partition
2024-04-24T20:21:14Z INF Creating COS_STATE partition
2024-04-24T20:21:15Z INF Creating COS_PERSISTENT partition
2024-04-24T20:21:16Z INF Extending last partition to max space
2024-04-24T20:21:17Z ERR Failed growing partition: NOCHANGE: partition 5 is size 113364959. it cannot be grown

failed to run growpart /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0 5: exit status 1
2024-04-24T20:21:17Z ERR NOCHANGE: partition 5 is size 113364959. it cannot be grown

2024-04-24T20:21:17Z ERR failed to run growpart /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0 5: exit status 1
2024-04-24T20:21:17Z INF Processing stage step 'Create partitions'. ( commands: 1, files: 0, ... )
2024-04-24T20:21:18Z INF Command output: Setting name!
partNum is 0
The operation has completed successfully.

2024-04-24T20:21:18Z INF Creating COS_OEM partition
2024-04-24T20:21:19Z INF Creating COS_RECOVERY partition
2024-04-24T20:21:21Z INF Creating COS_STATE partition
2024-04-24T20:21:22Z INF Creating COS_PERSISTENT partition
2024-04-24T20:21:24Z INF Extending last partition to max space
2024-04-24T20:21:24Z ERR Failed growing partition: NOCHANGE: partition 5 is size 113364959. it cannot be grown

failed to run growpart /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0 5: exit status 1
2024-04-24T20:21:24Z ERR NOCHANGE: partition 5 is size 113364959. it cannot be grown

2024-04-24T20:21:24Z ERR failed to run growpart /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0 5: exit status 1
2024-04-24T20:21:24Z INF Done executing stage 'kairos-install.pre.before'

2024-04-24T20:21:24Z INF Running stage: kairos-install.pre

2024-04-24T20:21:24Z INF Done executing stage 'kairos-install.pre'

2024-04-24T20:21:24Z INF Running stage: kairos-install.pre.after

2024-04-24T20:21:24Z INF Done executing stage 'kairos-install.pre.after'

2024-04-24T20:21:24Z INF Running stage: kairos-install.pre.before

2024-04-24T20:21:24Z INF Done executing stage 'kairos-install.pre.before'

2024-04-24T20:21:24Z INF Running stage: kairos-install.pre

2024-04-24T20:21:24Z INF Done executing stage 'kairos-install.pre'

2024-04-24T20:21:24Z INF Running stage: kairos-install.pre.after

2024-04-24T20:21:24Z INF Done executing stage 'kairos-install.pre.after'

2024-04-24T20:21:24Z INF NoFormat is true, skipping format and partitioning
2024-04-24T20:21:24Z INF Checking for active deployment
2024-04-24T20:21:24Z DBG Running cmd: 'udevadm settle'
2024-04-24T20:21:25Z DBG Running cmd: 'udevadm settle'
2024-04-24T20:21:26Z INF No target device specified, using pre-configured device: /dev/sda
2024-04-24T20:21:26Z INF Mounting disk partitions
2024-04-24T20:21:26Z DBG Mounting partition COS_OEM
2024-04-24T20:21:26Z DBG Running cmd: 'udevadm settle'
2024-04-24T20:21:26Z DBG Mounting partition COS_PERSISTENT
2024-04-24T20:21:26Z DBG Running cmd: 'udevadm settle'
2024-04-24T20:21:26Z DBG Mounting partition COS_RECOVERY
2024-04-24T20:21:26Z DBG Running cmd: 'udevadm settle'
2024-04-24T20:21:26Z DBG Mounting partition COS_STATE
2024-04-24T20:21:26Z DBG Running cmd: 'udevadm settle'
2024-04-24T20:21:26Z INF Running before-install hook
2024-04-24T20:21:26Z DBG Cloud-init paths set to [/system/oem /oem/ /usr/local/cloud-config/ /tmp/kairos-install-config-xxx.yaml223218110]
2024-04-24T20:21:26Z DBG Failed creating cloud-init config path: /tmp/kairos-install-config-xxx.yaml223218110 mkdir /tmp/kairos-install-config-xxx.yaml223218110: not a directory
2024-04-24T20:21:26Z INF Running stage: before-install.before

2024-04-24T20:21:26Z INF Done executing stage 'before-install.before'

2024-04-24T20:21:26Z INF Running stage: before-install

2024-04-24T20:21:26Z INF Done executing stage 'before-install'

2024-04-24T20:21:26Z INF Running stage: before-install.after

2024-04-24T20:21:26Z INF Done executing stage 'before-install.after'

2024-04-24T20:21:26Z INF Running stage: before-install.before

2024-04-24T20:21:26Z INF Done executing stage 'before-install.before'

2024-04-24T20:21:26Z INF Running stage: before-install

2024-04-24T20:21:26Z INF Done executing stage 'before-install'

2024-04-24T20:21:26Z INF Running stage: before-install.after

2024-04-24T20:21:26Z INF Done executing stage 'before-install.after'

2024-04-24T20:21:26Z INF Creating file system image /run/cos/state/cOS/active.img with size 1063Mb
2024-04-24T20:21:26Z DBG Running cmd: 'mkfs.ext2 -L COS_ACTIVE /run/cos/state/cOS/active.img'
2024-04-24T20:21:26Z DBG Mounting image COS_ACTIVE
2024-04-24T20:21:26Z DBG Running cmd: 'losetup --show -f /run/cos/state/cOS/active.img'
2024-04-24T20:21:26Z INF Copying /run/rootfsbase source to /run/cos/active
2024-04-24T20:21:26Z INF Starting rsync...
2024-04-24T20:21:26Z DBG Running cmd: 'rsync --progress --partial --human-readable --archive --xattrs --acls --exclude=/mnt --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/tmp --exclude=/host --exclude=/run /run/rootfsbase/ /run/cos/active/'
2024-04-24T20:21:31Z DBG Syncing data...
2024-04-24T20:21:31Z INF Finished syncing
2024-04-24T20:21:31Z INF Finished copying /run/rootfsbase into /run/cos/active
2024-04-24T20:21:31Z INF List of cloud inits to copy: [/tmp/kairos-install-config-xxx.yaml223218110]

2024-04-24T20:21:31Z INF Starting copying cloud config file /tmp/kairos-install-config-xxx.yaml223218110 to /run/cos/oem/90_custom.yaml
2024-04-24T20:21:31Z INF Finished copying cloud config file /tmp/kairos-install-config-xxx.yaml223218110 to /run/cos/oem/90_custom.yaml
2024-04-24T20:21:31Z INF Installing GRUB..
2024-04-24T20:21:31Z DBG Running grub with the following args: [--root-directory=/run/cos/active --boot-directory=/run/cos/state --target=i386-pc /dev/sda]
2024-04-24T20:21:31Z DBG Running cmd: '/usr/sbin/grub2-install --root-directory=/run/cos/active --boot-directory=/run/cos/state --target=i386-pc /dev/sda'
2024-04-24T20:21:32Z INF Grub install to device /dev/sda complete
2024-04-24T20:21:32Z INF Using grub config dir /run/cos/active/etc/cos/grub.cfg
2024-04-24T20:21:32Z INF Copying grub contents from /run/cos/active/etc/cos/grub.cfg to /run/cos/state/grub2/grub.cfg
2024-04-24T20:21:32Z DBG Extra mounts: map[/run/cos/oem:/oem /run/cos/persistent:/usr/local]
2024-04-24T20:21:32Z DBG Mounting /dev to chroot
2024-04-24T20:21:32Z DBG Mounted /dev to /run/cos/active/dev
2024-04-24T20:21:32Z DBG Mounting /dev/pts to chroot
2024-04-24T20:21:32Z DBG Mounted /dev/pts to /run/cos/active/dev/pts
2024-04-24T20:21:32Z DBG Mounting /proc to chroot
2024-04-24T20:21:32Z DBG Mounted /proc to /run/cos/active/proc
2024-04-24T20:21:32Z DBG Mounting /sys to chroot
2024-04-24T20:21:32Z DBG Mounted /sys to /run/cos/active/sys
2024-04-24T20:21:32Z DBG Mounting /run/cos/oem to chroot
2024-04-24T20:21:32Z DBG Mounted /run/cos/oem to /run/cos/active/oem
2024-04-24T20:21:32Z DBG Mounting /run/cos/persistent to chroot
2024-04-24T20:21:32Z DBG Mounted /run/cos/persistent to /run/cos/active/usr/local
2024-04-24T20:21:32Z DBG Running cmd: 'setfiles -c /etc/selinux/targeted/policy/policy.33 -e /dev -e /proc -e /sys -F /etc/selinux/targeted/contexts/files/file_contexts /'
2024-04-24T20:21:36Z DBG SELinux setfiles output:
2024-04-24T20:21:36Z DBG Unmounting /run/cos/active/usr/local from chroot
2024-04-24T20:21:36Z DBG Unmounting /run/cos/active/oem from chroot
2024-04-24T20:21:36Z DBG Unmounting /run/cos/active/sys from chroot
2024-04-24T20:21:36Z DBG Unmounting /run/cos/active/proc from chroot
2024-04-24T20:21:36Z DBG Unmounting /run/cos/active/dev/pts from chroot
2024-04-24T20:21:36Z DBG Unmounting /run/cos/active/dev from chroot
2024-04-24T20:21:36Z DBG Extra mounts: map[/run/cos/oem:/oem /run/cos/persistent:/usr/local]
2024-04-24T20:21:36Z DBG Mounting /dev to chroot
2024-04-24T20:21:36Z DBG Mounted /dev to /run/cos/active/dev
2024-04-24T20:21:36Z DBG Mounting /dev/pts to chroot
2024-04-24T20:21:36Z DBG Mounted /dev/pts to /run/cos/active/dev/pts
2024-04-24T20:21:36Z DBG Mounting /proc to chroot
2024-04-24T20:21:36Z DBG Mounted /proc to /run/cos/active/proc
2024-04-24T20:21:36Z DBG Mounting /sys to chroot
2024-04-24T20:21:36Z DBG Mounted /sys to /run/cos/active/sys
2024-04-24T20:21:36Z DBG Mounting /run/cos/oem to chroot
2024-04-24T20:21:36Z DBG Mounted /run/cos/oem to /run/cos/active/oem
2024-04-24T20:21:36Z DBG Mounting /run/cos/persistent to chroot
2024-04-24T20:21:36Z DBG Mounted /run/cos/persistent to /run/cos/active/usr/local
2024-04-24T20:21:36Z INF Running after-install-chroot hook
2024-04-24T20:21:36Z DBG Cloud-init paths set to [/system/oem /oem/ /usr/local/cloud-config/ /tmp/kairos-install-config-xxx.yaml223218110]
2024-04-24T20:21:36Z INF Running stage: after-install-chroot.before

2024-04-24T20:21:36Z INF Done executing stage 'after-install-chroot.before'

2024-04-24T20:21:36Z INF Running stage: after-install-chroot

2024-04-24T20:21:36Z INF Processing stage step 'Create data directories'. ( commands: 3, files: 0, ... )
2024-04-24T20:21:36Z INF Command output: 2024-04-24 20:21:36 ****** Option: make_directory, Disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0, Directory /var/lib/rancher/rke2 ******
2024-04-24 20:21:36   The /var/lib/rancher/rke2 directory already exists, skipping creation

sh: line 1: [[: 0[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-04-24T20:21:36Z INF Command output: 2024-04-24 20:21:36 ****** Option: make_directory, Disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0, Directory /run/k3s ******
2024-04-24 20:21:36   Creating the /run/k3s directory

sh: line 1: [[: 0[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-04-24T20:21:36Z INF Command output: 2024-04-24 20:21:36 ****** Option: make_directory, Disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0, Directory /var/lib/rancher/longhorn ******
2024-04-24 20:21:36   Creating the /var/lib/rancher/longhorn directory

sh: line 1: [[: 0[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-04-24T20:21:36Z INF Processing stage step 'Format disks'. ( commands: 3, files: 0, ... )
2024-04-24T20:21:36Z INF Processing stage step 'Format disks'. ( commands: 3, files: 0, ... )
2024-04-24T20:21:36Z INF Command output: 2024-04-24 20:21:36 ****** Option: format_disk, Disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0, Directory /var/lib/rancher/rke2 ******
2024-04-24 20:21:36   Status before format.
2024-04-24 20:21:36   Formatting disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0 with label RKE2
mke2fs 1.46.5 (30-Dec-2021)
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0 is apparently in use by the system; will not make a filesystem here!
sh: line 1: [[: 1[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-04-24T20:21:36Z INF Command output: 2024-04-24 20:21:36 ****** Option: format_disk, Disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0, Directory /var/lib/rancher/rke2 ******
2024-04-24 20:21:36   Status before format.
2024-04-24 20:21:36   Formatting disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0 with label RKE2
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 26214400 4k blocks and 6553600 inodes
Filesystem UUID: 5bdfccd6-0f43-4bc4-b7eb-1e933beca3e8
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done

2024-04-24 20:21:36   Status after format
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0: LABEL="RKE2" UUID="5bdfccd6-0f43-4bc4-b7eb-1e933beca3e8" TYPE="ext4"

sh: line 1: [[: 0[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-04-24T20:21:37Z INF Command output: 2024-04-24 20:21:36 ****** Option: format_disk, Disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0, Directory /run/k3s ******
2024-04-24 20:21:36   Status before format.
2024-04-24 20:21:37   Formatting disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0 with label K3S
mke2fs 1.46.5 (30-Dec-2021)
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0 is apparently in use by the system; will not make a filesystem here!
sh: line 1: [[: 1[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-04-24T20:21:37Z INF Command output: 2024-04-24 20:21:36 ****** Option: format_disk, Disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0, Directory /run/k3s ******
2024-04-24 20:21:36   Status before format.
2024-04-24 20:21:36   Formatting disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0 with label K3S
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 39321600 4k blocks and 9830400 inodes
Filesystem UUID: 8c4e74ae-cf72-4ef5-bcac-8f24a4b9a026
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done

2024-04-24 20:21:37   Status after format
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0: LABEL="K3S" UUID="8c4e74ae-cf72-4ef5-bcac-8f24a4b9a026" TYPE="ext4"

sh: line 1: [[: 0[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-04-24T20:21:37Z INF Command output: 2024-04-24 20:21:37 ****** Option: format_disk, Disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0, Directory /var/lib/rancher/longhorn ******
2024-04-24 20:21:37   Status before format.
2024-04-24 20:21:37   Formatting disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0 with label LONGHORN
mke2fs 1.46.5 (30-Dec-2021)
The file /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0 does not exist and no size was specified.
sh: line 1: [[: 1[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-04-24T20:21:37Z INF Command output: 2024-04-24 20:21:37 ****** Option: format_disk, Disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0, Directory /var/lib/rancher/longhorn ******
2024-04-24 20:21:37   Status before format.
2024-04-24 20:21:37   Formatting disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0 with label LONGHORN
mke2fs 1.46.5 (30-Dec-2021)
The file /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0 does not exist and no size was specified.
sh: line 1: [[: 1[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-04-24T20:21:37Z INF Processing stage step 'Create data directories'. ( commands: 3, files: 0, ... )
2024-04-24T20:21:37Z INF Command output: 2024-04-24 20:21:37 ****** Option: make_directory, Disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0, Directory /var/lib/rancher/rke2 ******
2024-04-24 20:21:37   The /var/lib/rancher/rke2 directory already exists, skipping creation

sh: line 1: [[: 0[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-04-24T20:21:37Z INF Command output: 2024-04-24 20:21:37 ****** Option: make_directory, Disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0, Directory /run/k3s ******
2024-04-24 20:21:37   The /run/k3s directory already exists, skipping creation

sh: line 1: [[: 0[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-04-24T20:21:37Z INF Command output: 2024-04-24 20:21:37 ****** Option: make_directory, Disk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0, Directory /var/lib/rancher/longhorn ******
2024-04-24 20:21:37   The /var/lib/rancher/longhorn directory already exists, skipping creation

sh: line 1: [[: 0[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-04-24T20:21:37Z INF Done executing stage 'after-install-chroot'

2024-04-24T20:21:37Z INF Running stage: after-install-chroot.after

2024-04-24T20:21:37Z INF Done executing stage 'after-install-chroot.after'

2024-04-24T20:21:37Z INF Running stage: after-install-chroot.before

2024-04-24T20:21:37Z INF Done executing stage 'after-install-chroot.before'

2024-04-24T20:21:37Z INF Running stage: after-install-chroot

2024-04-24T20:21:37Z INF Done executing stage 'after-install-chroot'

2024-04-24T20:21:37Z INF Running stage: after-install-chroot.after

2024-04-24T20:21:37Z INF Done executing stage 'after-install-chroot.after'

2024-04-24T20:21:37Z DBG Unmounting /run/cos/active/usr/local from chroot
2024-04-24T20:21:37Z DBG Unmounting /run/cos/active/oem from chroot
2024-04-24T20:21:37Z DBG Unmounting /run/cos/active/sys from chroot
2024-04-24T20:21:37Z DBG Unmounting /run/cos/active/proc from chroot
2024-04-24T20:21:37Z DBG Unmounting /run/cos/active/dev/pts from chroot
2024-04-24T20:21:37Z DBG Unmounting /run/cos/active/dev from chroot
2024-04-24T20:21:37Z DBG Unmounting image COS_ACTIVE
2024-04-24T20:21:38Z DBG Running cmd: 'losetup -d /dev/loop1'
2024-04-24T20:21:38Z INF Copying /run/cos/state/cOS/active.img source to /run/cos/recovery/cOS/recovery.img
2024-04-24T20:21:39Z INF Finished copying /run/cos/state/cOS/active.img into /run/cos/recovery/cOS/recovery.img
2024-04-24T20:21:39Z DBG Running cmd: 'tune2fs -L COS_SYSTEM /run/cos/recovery/cOS/recovery.img'
2024-04-24T20:21:40Z DBG Not unmounting image,  doesn't look like mountpoint
2024-04-24T20:21:40Z INF Copying /run/cos/state/cOS/active.img source to /run/cos/state/cOS/passive.img
2024-04-24T20:21:41Z INF Finished copying /run/cos/state/cOS/active.img into /run/cos/state/cOS/passive.img
2024-04-24T20:21:41Z DBG Running cmd: 'tune2fs -L COS_PASSIVE /run/cos/state/cOS/passive.img'
2024-04-24T20:21:41Z DBG Not unmounting image,  doesn't look like mountpoint
2024-04-24T20:21:41Z INF Running after-install hook
2024-04-24T20:21:41Z DBG Cloud-init paths set to [/system/oem /oem/ /usr/local/cloud-config/ /tmp/kairos-install-config-xxx.yaml223218110]
2024-04-24T20:21:41Z DBG Failed creating cloud-init config path: /tmp/kairos-install-config-xxx.yaml223218110 mkdir /tmp/kairos-install-config-xxx.yaml223218110: not a directory
2024-04-24T20:21:41Z INF Running stage: after-install.before

2024-04-24T20:21:41Z INF Done executing stage 'after-install.before'

2024-04-24T20:21:41Z INF Running stage: after-install

2024-04-24T20:21:41Z INF Processing stage step 'Mount state'. ( commands: 1, files: 0, ... )
2024-04-24T20:21:41Z INF Command output:
2024-04-24T20:21:41Z INF Processing stage step 'Hook boot assessment grub configuration'. ( commands: 1, files: 0, ... )
2024-04-24T20:21:41Z INF Command output:
2024-04-24T20:21:41Z INF Processing stage step 'Add boot assessment grub configuration'. ( commands: 0, files: 1, ... )
2024-04-24T20:21:41Z INF Processing stage step 'Grub branding'. ( commands: 1, files: 0, ... )
2024-04-24T20:21:41Z INF Command output: '/etc/kairos/branding/grubmenu.cfg' -> '/tmp/mnt/STATE/grubmenu'

2024-04-24T20:21:41Z INF Processing stage step 'umount state'. ( commands: 1, files: 0, ... )
2024-04-24T20:21:41Z INF Command output:
2024-04-24T20:21:41Z INF Done executing stage 'after-install'

2024-04-24T20:21:41Z INF Running stage: after-install.after

2024-04-24T20:21:41Z INF Done executing stage 'after-install.after'

2024-04-24T20:21:41Z INF Running stage: after-install.before

2024-04-24T20:21:41Z INF Done executing stage 'after-install.before'

2024-04-24T20:21:41Z INF Running stage: after-install

2024-04-24T20:21:41Z INF Done executing stage 'after-install'

2024-04-24T20:21:41Z INF Running stage: after-install.after

2024-04-24T20:21:41Z INF Done executing stage 'after-install.after'

2024-04-24T20:21:41Z DBG Not unmounting image, /run/cos/active doesn't look like mountpoint
2024-04-24T20:21:41Z INF Unmounting disk partitions
2024-04-24T20:21:41Z DBG Unmounting partition COS_STATE
2024-04-24T20:21:42Z DBG Unmounting partition COS_RECOVERY
2024-04-24T20:21:42Z DBG Unmounting partition COS_PERSISTENT
2024-04-24T20:21:42Z DBG Unmounting partition COS_OEM
2024-04-24T20:21:42Z DBG Running cmd: 'cat /proc/cmdline'
2024-04-24T20:21:42Z DBG Cloud-init paths set to [/system/oem /oem/ /usr/local/cloud-config/ /tmp/kairos-install-config-xxx.yaml223218110]
2024-04-24T20:21:42Z DBG Failed creating cloud-init config path: /tmp/kairos-install-config-xxx.yaml223218110 mkdir /tmp/kairos-install-config-xxx.yaml223218110: not a directory
2024-04-24T20:21:42Z INF Running stage: kairos-install.after.before

2024-04-24T20:21:42Z INF Done executing stage 'kairos-install.after.before'

2024-04-24T20:21:42Z INF Running stage: kairos-install.after

2024-04-24T20:21:42Z INF Done executing stage 'kairos-install.after'

2024-04-24T20:21:42Z INF Running stage: kairos-install.after.after

2024-04-24T20:21:42Z INF Done executing stage 'kairos-install.after.after'

2024-04-24T20:21:42Z INF Running stage: kairos-install.after.before

2024-04-24T20:21:42Z INF Done executing stage 'kairos-install.after.before'

2024-04-24T20:21:42Z INF Running stage: kairos-install.after

2024-04-24T20:21:42Z INF Done executing stage 'kairos-install.after'

2024-04-24T20:21:42Z INF Running stage: kairos-install.after.after

2024-04-24T20:21:42Z INF Done executing stage 'kairos-install.after.after'

2024-04-24T20:21:42Z DBG Running GrubOptions hook
2024-04-24T20:21:42Z DBG Setting grub options: map[extra_cmdline:rd.immucore.debug]
2024-04-24T20:21:42Z DBG Finish GrubOptions hook
2024-04-24T20:21:42Z DBG Running BundlePostInstall hook
2024-04-24T20:21:42Z DBG Finish BundlePostInstall hook
2024-04-24T20:21:42Z DBG Running CustomMounts hook
2024-04-24T20:21:42Z DBG Finish CustomMounts hook
2024-04-24T20:21:42Z DBG Running CopyLogs hook
2024-04-24T20:21:42Z DBG Copying logs to persistent partition
2024-04-24T20:21:42Z INF Starting rsync...
2024-04-24T20:21:42Z DBG Running cmd: 'rsync --progress --partial --human-readable --archive --xattrs --acls /var/log/ /run/cos/persistent/.state/var-log.bind/'
2024-04-24T20:21:42Z INF Finished syncing
2024-04-24T20:21:42Z DBG Logs copied to persistent partition
2024-04-24T20:21:42Z DBG Finish CopyLogs hook
2024-04-24T20:21:42Z DBG Running Lifecycle hook
2024-04-24T20:21:42Z DBG Finish Lifecycle hook

jimmykarily commented 6 months ago

Multiple directories get scanned when the kairos-agent runs: https://github.com/kairos-io/kairos-agent/blob/2b99bf045becc9c389602a6fcace0284afc4b8ce/pkg/constants/constants.go#L168

The files in those directories are filtered by yaml extension and valid header and they are merged into one config. You can see the result of that merge at the beginning of the installation logs.
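As a rough illustration of the filtering step only (not the merge, and not the actual kairos-agent code), the sketch below lists files that end in `.yaml` and start with a header line. The function names are hypothetical, and treating `#cloud-config` as the accepted header is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "yaml extension and valid header" filter.
# Assumption: "#cloud-config" on the first line counts as a valid header.

is_candidate() {
  local file=$1 first_line
  # Keep only files with a .yaml extension.
  case "$file" in
    *.yaml) ;;
    *) return 1 ;;
  esac
  # Read the first line; an empty file is rejected.
  IFS= read -r first_line < "$file" || [[ -n $first_line ]] || return 1
  [[ $first_line == "#cloud-config" ]]
}

# Print every candidate file found directly inside the given directories.
list_candidates() {
  local dir file
  for dir in "$@"; do
    for file in "$dir"/*; do
      [[ -f $file ]] && is_candidate "$file" && printf '%s\n' "$file"
    done
  done
  return 0
}

# Small demo in a throwaway directory.
demo_dir=$(mktemp -d)
printf '#cloud-config\ninstall:\n  auto: false\n' > "$demo_dir/a.yaml"
printf 'plain: yaml\n' > "$demo_dir/b.yaml"
printf '#cloud-config\n' > "$demo_dir/notes.txt"
list_candidates "$demo_dir"   # prints only the path to a.yaml
rm -rf "$demo_dir"
```

Files that pass a filter like this would then be merged into the single config shown at the start of the installation logs.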

In that merged struct, I see this:

      "after-install-chroot": []interface {}{
        collector.Config{
          "commands": []interface {}{
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0\" \"/var/lib/rancher/rke2\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0\" \"/run/k3s\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
            "make_disk.sh \"make_directory\" \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0\" \"/var/lib/rancher/longhorn\" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi",
          },
          "name": "Create data directories",
        },

which seems to originate from the AuroraBoot config you attached. This means it's being read.

This block exists too:

      "kairos-install.pre.before": []interface {}{
        collector.Config{
          "commands": []interface {}{
            "parted --script --machine -- \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0\" mklabel gpt\n# Legacy bios\nsgdisk --new=1:2048:+1M --change-name=1:'bios' --typecode=1:EF02 \"/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0\"\n",
          },

So, although there is a lot happening here and I could easily be missing something important, it seems that both configs are merged into the final one.

The installation output even shows partitions being created:

2024-04-24T20:21:10Z INF Processing stage step 'Create partitions'. ( commands: 1, files: 0, ... )
2024-04-24T20:21:11Z INF Command output: Setting name!
partNum is 0
The operation has completed successfully.

(I'm not sure where the "Setting name!" text is coming from)

In the installation logs above there are some errors (not necessarily explaining the original issue). E.g.:

invalid arithmetic operator (error token is "[0]")

Maybe you can tell where these are coming from?
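One possible source, offered as a guess rather than a confirmed diagnosis, is the `if [[ $PIPESTATUS[0] -ne 0 ]]` check in the merged commands above. Without braces, bash expands `$PIPESTATUS` to element 0 and appends the literal text `[0]`, so the `-ne` comparison is asked to evaluate something like `3[0]`. The sketch below uses a hypothetical `failing_step` function in place of `make_disk.sh` and compares the unbraced and braced forms.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for make_disk.sh: always fails with status 3.
failing_step() { return 3; }

# Unbraced form, as written in the merged config.
# "$PIPESTATUS[0]" becomes "3[0]", which is not valid arithmetic.
broken_check() {
  failing_step | tee /dev/null && if [[ $PIPESTATUS[0] -ne 0 ]]; then return 1; fi
}

# Braced form: ${PIPESTATUS[0]} indexes the array, giving "3".
fixed_check() {
  failing_step | tee /dev/null && if [[ ${PIPESTATUS[0]} -ne 0 ]]; then return 1; fi
}

broken_check
echo "unbraced form returned: $?"

fixed_check
echo "braced form returned: $?"
```

With the unbraced form the test itself errors out and counts as false, so the failure branch never runs even though the first command in the pipeline failed. The braced form reports the failure as intended.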

jimmykarily commented 6 months ago

> (I'm not sure where the "Setting name!" text is coming from)

It's from the sgdisk command:

        sgdisk --new=1:2048:+1M --change-name=1:'bios' --typecode=1:EF02 "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"

sarg3nt commented 6 months ago

Ohh, I see. The reason I couldn't find the cloud_init.yaml from AuroraBoot is that it isn't pulled and stored; it's pulled at runtime. From the logs: "config_url": "http://10.105.148.91:8090/_/file?name=other-1". I found a super easy way of running the manual install without copying the file over.
Since it's already being pulled during any install, just do this:

echo "#cloud-config" > /tmp/config.yaml
kairos-agent manual-install /tmp/config.yaml 2>&1 | tee /tmp/out.log

The temp file is required or it won't run, but it doesn't really need anything in it.

Re: my question above.

The bigger surprise is that the cloud_init.yaml being served from the target node's vSphere guestinfo.userdata is being run even when auto: false is set. I assumed it would not be, but it is.

Is that expected or a bug?

jimmykarily commented 6 months ago

The auto: false setting only controls whether the installation of Kairos starts automatically. It doesn't prevent stages from being run or configs from being parsed. That said, the installation in your case did start, so it looks like a bug to me. Unless I don't understand the auto setting either :D. @kairos-io/maintainers do you see any reason why the installation would start when auto is set to false? Maybe AuroraBoot somehow forces it? Through cmdline maybe? I'm just throwing ideas here.
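For context, a minimal sketch of a config carrying the setting in question, written from the shell. The `install.auto` key path follows how the setting is named in the reply below; the temporary file location is only for illustration.

```shell
#!/usr/bin/env bash
# Write a minimal config with automatic installation disabled.
# The file path is a throwaway created by mktemp, purely for illustration.
config_file=$(mktemp)
cat > "$config_file" <<'EOF'
#cloud-config
install:
  auto: false
EOF

# Show what was written.
cat "$config_file"
```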

Itxaka commented 6 months ago

Umm, no, I can't understand why the install would auto-start if install.auto is set to false...

The cmdline in AuroraBoot is not supposed to start the install either if auto is set to false.

Maybe we got a bug around that?

jimmykarily commented 6 months ago

Could be. I'll open another ticket for this since this one was about custom partitioning. Here: https://github.com/kairos-io/kairos/issues/2516

@sarg3nt I'm closing this. Let's move the auto: false conversation to the new ticket.