kairos-io / kairos

The immutable Linux meta-distribution for edge Kubernetes.
https://kairos.io
Apache License 2.0

feat(AuroraBoot): support netbooting with UEFI #2529

Open sarg3nt opened 6 months ago

sarg3nt commented 6 months ago

Kairos version:

A build from master (kairos/rockylinux:9-core-amd64-generic-master) on April 29th, 2024

CPU architecture, OS, and Version:

Intel, vSphere VM, Rocky Linux 9 core with add-ons.

Describe the bug

Netboot install works when the VM is in BIOS mode, but when the VM is switched to EFI (vSphere's term for the firmware) it crashes on install.
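Since the failure only shows up under EFI firmware, it can be worth confirming from the booted node which firmware mode the kernel actually saw. A minimal sketch, assuming the usual Linux behaviour that `/sys/firmware/efi` exists only on UEFI boots; the function name and the `sysfs_root` parameter are illustrative and not part of Kairos:

```python
from pathlib import Path


def firmware_mode(sysfs_root: str = "/sys") -> str:
    """Return "UEFI" if the kernel exposes EFI runtime data, else "BIOS".

    sysfs_root is a hypothetical parameter so the check can be pointed
    at a test directory instead of the real /sys.
    """
    efi_dir = Path(sysfs_root) / "firmware" / "efi"
    return "UEFI" if efi_dir.is_dir() else "BIOS"


if __name__ == "__main__":
    print(firmware_mode())
```

Running it on the same VM in both firmware settings should give a quick sanity check that the two boots really differ in the way the report assumes.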

To Reproduce

You don't have access to vSphere so . . . . let me be your hands. :)

Expected behavior

It works?

Additional context

Logs

Debug mode is enabled with

  grub_options:
    extra_cmdline: "rd.immucore.debug"

AuroraBoot

AuroraBoot is serving all the files properly and it looks like the downstream VM is getting them.

2024/05/02 18:03:59 DHCP: Offering to boot 00:50:56:b6:35:a0
2024/05/02 18:04:03 TFTP: Send of "00:50:56:b6:35:a0/2" to 10.105.148.75:1850 failed: "10.105.148.75:1850": sending OACK: client aborted transfer: User aborted the transfer
2024/05/02 18:04:03 TFTP: Sent "00:50:56:b6:35:a0/2" to 10.105.148.75:1851
2024/05/02 18:04:07 DHCP: Offering to boot 00:50:56:b6:35:a0
2024/05/02 18:04:07 HTTP: Sending ipxe boot script to 10.105.148.75:34702
2024/05/02 18:04:07 HTTP: Sent file "kernel" to 10.105.148.75:34702
2024/05/02 18:04:08 HTTP: Sent file "initrd-0" to 10.105.148.75:34702
2024/05/02 18:04:14 HTTP: Sent file "other-0" to 10.105.148.68:53094
2024/05/02 18:04:22 HTTP: Sent file "other-1" to 10.105.148.75:39452

Node is stopping on startup:

image

Setting auto: false and reboot: false does not change anything, but I'm able to SSH to the partially booted node. Some customizations are being run, e.g. the machine name is being set by the cloud_init.yaml file that is served to the node via vSphere customization. From first inspection it looks like initramfs is being run, but the node password is not being set, nor are the boot or write_files steps.

See https://github.com/kairos-io/kairos/issues/2281#issuecomment-2078014965 (the files in that post are the relevant bits) and https://github.com/kairos-io/kairos/issues/2516 for context.

journalctl -u kairos-agent

[root@lpul-vault-k8s-server-0 immucore]# journalctl -u kairos-agent
May 02 19:23:08 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: Started kairos agent.
May 02 19:23:08 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1354]: warning: skipping /oem/userdata (extension).
May 02 19:23:08 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1354]: 2024-05-02T19:23:08Z INF Kairos Agent version=v2.9.1
May 02 19:23:08 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1354]: 2024-05-02T19:23:08Z DBG Kairos Agent version={"git_commit":"none","go_version":"go1.21.7","versi>
May 02 19:23:08 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1354]: 2024-05-02T19:23:08Z INF Kairos System version=v3.0.4-48-g2efd774
May 02 19:23:08 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1354]: 2024-05-02T19:23:08Z INF creating a runtime
May 02 19:23:08 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1354]: 2024-05-02T19:23:08Z INF detecting boot state
May 02 19:23:08 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1354]: 2024/05/02 19:23:08 Not implemented signature list certificate:
May 02 19:23:08 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Main process exited, code=exited, status=1/FAILURE
May 02 19:23:08 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Failed with result 'exit-code'.
May 02 19:23:14 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Scheduled restart job, restart counter is at 1.
May 02 19:23:14 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: Stopped kairos agent.
May 02 19:23:14 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: Started kairos agent.
May 02 19:23:14 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1369]: warning: skipping /oem/userdata (extension).
May 02 19:23:14 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1369]: 2024-05-02T19:23:14Z INF Kairos Agent version=v2.9.1
May 02 19:23:14 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1369]: 2024-05-02T19:23:14Z DBG Kairos Agent version={"git_commit":"none","go_version":"go1.21.7","versi>
May 02 19:23:14 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1369]: 2024-05-02T19:23:14Z INF Kairos System version=v3.0.4-48-g2efd774
May 02 19:23:14 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1369]: 2024-05-02T19:23:14Z INF creating a runtime
May 02 19:23:14 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1369]: 2024-05-02T19:23:14Z INF detecting boot state
May 02 19:23:14 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1369]: 2024/05/02 19:23:14 Not implemented signature list certificate:
May 02 19:23:14 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Main process exited, code=exited, status=1/FAILURE
May 02 19:23:14 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Failed with result 'exit-code'.
May 02 19:23:19 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Scheduled restart job, restart counter is at 2.
May 02 19:23:19 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: Stopped kairos agent.
May 02 19:23:19 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: Started kairos agent.
May 02 19:23:19 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1379]: warning: skipping /oem/userdata (extension).
May 02 19:23:19 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1379]: 2024-05-02T19:23:19Z INF Kairos Agent version=v2.9.1
May 02 19:23:19 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1379]: 2024-05-02T19:23:19Z DBG Kairos Agent version={"git_commit":"none","go_version":"go1.21.7","versi>
May 02 19:23:19 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1379]: 2024-05-02T19:23:19Z INF Kairos System version=v3.0.4-48-g2efd774
May 02 19:23:19 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1379]: 2024-05-02T19:23:19Z INF creating a runtime
May 02 19:23:19 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1379]: 2024-05-02T19:23:19Z INF detecting boot state
May 02 19:23:19 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1379]: 2024/05/02 19:23:19 Not implemented signature list certificate:
May 02 19:23:19 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Main process exited, code=exited, status=1/FAILURE
May 02 19:23:19 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Failed with result 'exit-code'.
May 02 19:23:24 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Scheduled restart job, restart counter is at 3.
May 02 19:23:24 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: Stopped kairos agent.
<repeats>

Logs in /run/kairos

There are a bunch of log files in /run/kairos but as far as I can tell they all say the same thing.

2024-05-02T19:29:10Z INF Kairos Agent version=v2.9.1
2024-05-02T19:29:10Z DBG Kairos Agent version={"git_commit":"none","go_version":"go1.21.7","version":"v2.9.1"}
2024-05-02T19:29:10Z INF Kairos System version=v3.0.4-48-g2efd774
2024-05-02T19:29:10Z INF creating a runtime
2024-05-02T19:29:10Z INF detecting boot state
[root@lpul-vault-k8s-server-0 kairos]# cat agent-20240502192720.7386.log
2024-05-02T19:27:20Z INF Kairos Agent version=v2.9.1
2024-05-02T19:27:20Z DBG Kairos Agent version={"git_commit":"none","go_version":"go1.21.7","version":"v2.9.1"}
2024-05-02T19:27:20Z INF Kairos System version=v3.0.4-48-g2efd774
2024-05-02T19:27:20Z INF creating a runtime
2024-05-02T19:27:20Z INF detecting boot state
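
The journal and the /run/kairos files agree that the agent stops right after `detecting boot state`; the journal additionally shows `Not implemented signature list certificate:` as the last agent line before each exit. A small sketch for pulling the last line each agent process logged out of saved `journalctl -u kairos-agent` output, so a long restart loop can be reduced to one line per PID. The function name is illustrative, and the regular expression only assumes the `kairos-agent[PID]: message` shape visible in the excerpt above:

```python
import re

# Matches lines written by the agent itself, e.g. "kairos-agent[1354]: ...".
# systemd's own lines ("kairos-agent.service: ...") do not match.
AGENT_LINE = re.compile(r"kairos-agent\[(\d+)\]: (.*)$")


def last_agent_messages(journal_text: str) -> dict[str, str]:
    """Map each kairos-agent PID to the last message it logged."""
    last: dict[str, str] = {}
    for line in journal_text.splitlines():
        match = AGENT_LINE.search(line)
        if match:
            pid, message = match.groups()
            last[pid] = message.strip()
    return last


if __name__ == "__main__":
    import sys

    for pid, message in last_agent_messages(sys.stdin.read()).items():
        print(pid, message)
```

If every PID ends on the same message, as it does in the excerpt, that message is the natural place to start looking.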

/run/immucore/immucore.log

[root@lpul-vault-k8s-server-0 immucore]#  cat immucore.log
2024-05-02T19:23:00Z INF Immucore commit=none compiled with=go1.21.7 version=v0.1.25
2024-05-02T19:23:00Z INF creating a runtime
2024-05-02T19:23:00Z INF detecting boot state
2024-05-02T19:23:00Z INF Stanza rd.cos.disable/rd.immucore.disable on the cmdline or booting from CDROM/Netboot/Squash recovery. Disabling immucore.
2024-05-02T19:23:00Z INF 1.
 <init> (background: false) (weak: false) (run: false)
2.
 <create-sentinel> (background: false) (weak: false) (run: false)
 <wait-for-sysroot> (background: false) (weak: false) (run: false)
3.
 <mount-oem> (background: false) (weak: false) (run: false)
4.
 <rootfs-hook> (background: false) (weak: false) (run: false)
5.
 <initramfs-hook> (background: false) (weak: false) (run: false)

2024-05-02T19:23:00Z INF creating a runtime
2024-05-02T19:23:00Z INF detecting boot state
2024-05-02T19:23:00Z INF Setting sentinel file to=live_mode
2024-05-02T19:23:04Z INF creating a runtime
2024-05-02T19:23:04Z INF detecting boot state
2024-05-02T19:23:04Z INF Running rootfs stage
2024-05-02T19:23:06Z INF Running initramfs stage
2024-05-02T19:23:07Z INF 1.
 <init> (background: false) (weak: false) (run: false)
2.
 <create-sentinel> (background: false) (weak: false) (run: true)
 <wait-for-sysroot> (background: false) (weak: false) (run: true)
3.
 <mount-oem> (background: false) (weak: false) (run: true)
4.
 <rootfs-hook> (background: false) (weak: false) (run: true)
5.
 <initramfs-hook> (background: false) (weak: false) (run: true)

/run/immucore/initramfs_stage.log

[root@lpul-vault-k8s-server-0 immucore]# cat initramfs_stage.log
2024-05-02T19:23:06Z INF Running stage: initramfs.before

2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /oem/userdata ]: exit status 1)' stage name: Pull data from provider
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e /sbin/openrc ]: exit status 1)' stage name: Blacklist bpfilter on Alpine ( bug: https://github.com/kairos-io/kairos/issues/277 )
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run ! [[ -f /etc/hosts ]] || ! [[ $(grep '127.0.0.1' /etc/hosts) ]]
: exit status 1)' stage name: Make sure hosts file is present and includes a record for 127.0.0.1
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /oem/userdata ]: exit status 1)' stage name:
2024-05-02T19:23:06Z INF Done executing stage 'initramfs.before'

2024-05-02T19:23:06Z INF Running stage: initramfs

2024-05-02T19:23:06Z INF Processing stage step ''. ( commands: 1, files: 0, ... )
2024-05-02T19:23:06Z INF Processing stage step 'Create journalctl /var/log/journal dir'. ( commands: 0, files: 0, ... )
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sbin/openrc" ]
: exit status 1)' stage name: Create OpenRC services
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.remote_recovery_mode" /proc/cmdline && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name: Starts kairos-recovery and generate a temporary pass
2024-05-02T19:23:06Z INF Processing stage step 'Enable systemd-network config files for DHCP'. ( commands: 1, files: 2, ... )
2024-05-02T19:23:06Z INF Processing stage step 'systemd-sysext initramfs settings'. ( commands: 0, files: 0, ... )
2024-05-02T19:23:06Z ERR Failed to connect system bus: No such file or directory
: failed to run networkctl reload: exit status 1
2024-05-02T19:23:06Z ERR 1 error occurred:
        * failed to run networkctl reload: exit status 1

2024-05-02T19:23:06Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos-agent.service → /etc/systemd/system/kairos-agent.service.

2024-05-02T19:23:06Z INF Processing stage step 'Disable NetworkManager and wicked'. ( commands: 0, files: 0, ... )
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ ! -f "/run/cos/live_mode" ]: exit status 1)' stage name:
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.remote_recovery_mode" /proc/cmdline && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Starts kairos-recovery for openRC based systems
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sbin/openrc" ]
: exit status 1)' stage name: Enable OpenRC services
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ -s /usr/local/etc/machine-id ]: exit status 1)' stage name: Restore /etc/machine-id for systemd systems
2024-05-02T19:23:06Z INF Processing stage step ''. ( commands: 0, files: 2, ... )
2024-05-02T19:23:06Z ERR 2 errors occurred:
        * failed to run systemctl disable NetworkManager: exit status 1
        * failed to run systemctl disable wicked: exit status 1

2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Restore /etc/machine-id for openrc systems
2024-05-02T19:23:06Z INF Processing stage step 'Default systemd config'. ( commands: 1, files: 0, ... )
2024-05-02T19:23:06Z INF Processing stage step 'Enable systemd-network and systemd-resolved'. ( commands: 0, files: 0, ... )
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -qv "interactive-install" /proc/cmdline || grep -qv "install-mode-interactive" /proc/cmdline) && \
[ -f /run/cos/live_mode ] && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name: Autologin on livecd for OpenRC
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "kairos.reset" /proc/cmdline || [ -f /run/cos/autoreset_mode ]) && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name: Starts kairos-reset for systemd based systems
2024-05-02T19:23:06Z INF Command output: Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/multi-user.target.

2024-05-02T19:23:06Z ERR 5 errors occurred:
        * failed to run systemctl enable systemd-timesyncd: exit status 1
        * failed to run systemctl enable nohang: exit status 1
        * failed to run systemctl enable nohang-desktop: exit status 1
        * failed to run systemctl enable fail2ban: exit status 1
        * failed to run systemctl enable logrotate.timer: exit status 1

2024-05-02T19:23:06Z INF Processing stage step 'Generate host keys'. ( commands: 1, files: 0, ... )
2024-05-02T19:23:06Z INF Processing stage step 'Link /etc/resolv.conf to systemd resolv.conf'. ( commands: 2, files: 0, ... )
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.reset" /proc/cmdline && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Starts kairos-reset for openRC-based systems
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run cat /proc/cmdline | grep "selinux=1"
: exit status 1)' stage name: Relabelling
2024-05-02T19:23:06Z INF Command output:
2024-05-02T19:23:06Z INF Command output:
2024-05-02T19:23:07Z INF Command output: ssh-keygen: generating new host keys: RSA DSA ECDSA ED25519

2024-05-02T19:23:07Z INF Processing stage step 'Create systemd services'. ( commands: 0, files: 5, ... )
2024-05-02T19:23:07Z INF Processing stage step ''. ( commands: 5, files: 0, ... )
2024-05-02T19:23:07Z INF Command output: Removed "/etc/systemd/system/getty.target.wants/getty@tty1.service".

2024-05-02T19:23:07Z INF Command output: Running in chroot, ignoring command 'stop'

2024-05-02T19:23:07Z INF Command output: Created symlink /etc/systemd/system/getty@tty1.service → /dev/null.

2024-05-02T19:23:07Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos.service → /etc/systemd/system/kairos.service.

2024-05-02T19:23:07Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos-webui.service → /etc/systemd/system/kairos-webui.service.

2024-05-02T19:23:07Z INF Processing stage step 'Enable systemd services'. ( commands: 4, files: 0, ... )
2024-05-02T19:23:07Z INF Command output:
2024-05-02T19:23:07Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "install-mode" /proc/cmdline || grep -q "nodepair.enable" /proc/cmdline ) && \
([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name:
2024-05-02T19:23:07Z INF Command output:
2024-05-02T19:23:07Z INF Command output:
2024-05-02T19:23:07Z INF Command output:
2024-05-02T19:23:07Z INF Processing stage step 'Setup groups'. ( commands: 0, files: 0, ... )
2024-05-02T19:23:07Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "interactive-install" /proc/cmdline || grep -q "install-mode-interactive" /proc/cmdline) && \
([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name:
2024-05-02T19:23:07Z INF Processing stage step 'Setup users'. ( commands: 0, files: 0, ... )
2024-05-02T19:23:07Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "interactive-install" /proc/cmdline || grep -q "install-mode-interactive" /proc/cmdline) && \
([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name:
2024-05-02T19:23:07Z INF Processing stage step 'Set user password if running in live or uki'. ( commands: 0, files: 0, ... )
2024-05-02T19:23:07Z INF Processing stage step 'Setup sudo'. ( commands: 1, files: 1, ... )
2024-05-02T19:23:07Z INF Command output: Locking password for user root.
passwd: Success

2024-05-02T19:23:07Z INF Processing stage step 'Ensure runtime permission'. ( commands: 2, files: 0, ... )
2024-05-02T19:23:07Z INF Command output:
2024-05-02T19:23:07Z INF Command output:
2024-05-02T19:23:07Z INF Processing stage step ''. ( commands: 0, files: 0, ... )
2024-05-02T19:23:07Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e "/usr/local/cloud-config" ]: exit status 1)' stage name: Ensure runtime permission
2024-05-02T19:23:07Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sys/firmware/devicetree/base/model" ] && grep -i jetson "/sys/firmware/devicetree/base/model"
: exit status 1)' stage name: Create files
2024-05-02T19:23:07Z INF Processing stage step ''. ( commands: 0, files: 0, ... )
2024-05-02T19:23:07Z INF Processing stage step 'Set hostname'. ( commands: 0, files: 0, ... )
2024-05-02T19:23:07Z INF Processing stage step 'Run commands'. ( commands: 1, files: 0, ... )
2024-05-02T19:23:07Z INF Command output: 2024-05-02 19:23:07 Add DHCP ClientIdentifier=mac to network config if not already present.
2024-05-02 19:23:07   Adding line [DHCP] to file /etc/systemd/network/20-dhcp.network
2024-05-02 19:23:07   Adding line ClientIdentifier=mac to file /etc/systemd/network/20-dhcp.network
2024-05-02 19:23:07   Adding line [DHCP] to file /etc/systemd/network/20-dhcp-legacy.network
2024-05-02 19:23:07   Adding line ClientIdentifier=mac to file /etc/systemd/network/20-dhcp-legacy.network
2024-05-02 19:23:07 Add ll to the root and Kairos .bashrc if not already present.
2024-05-02 19:23:07   Adding line alias ll="ls -alh" to file /root/.bashrc
2024-05-02 19:23:07   Creating new file /home/kairos/.bashrc with line alias ll="ls -alh"
2024-05-02 19:23:07   Creating new file /home/kairos/.profile with line alias ll="ls -alh"
2024-05-02 19:23:07 Add rke2 bin to the path.
2024-05-02 19:23:07   Adding line export PATH="${PATH}:/var/lib/rancher/rke2/bin/" to file /root/.bashrc
2024-05-02 19:23:07   Adding line export PATH="${PATH}:/var/lib/rancher/rke2/bin/" to file /home/kairos/.bashrc
2024-05-02 19:23:07   Adding line export PATH="${PATH}:/var/lib/rancher/rke2/bin/" to file /home/kairos/.profile
/bin/sh: line 1: [[: 0[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-05-02T19:23:07Z INF Done executing stage 'initramfs'

2024-05-02T19:23:07Z INF Running stage: initramfs.after

2024-05-02T19:23:07Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e /sbin/openrc ]: exit status 1)' stage name: Enable serial login for alpine
2024-05-02T19:23:07Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [[ $(kairos-agent state get kairos.flavor) =~ ^ubuntu ]]: exit status 1)' stage name: setupcon initramfs.after ubuntu
2024-05-02T19:23:07Z INF Done executing stage 'initramfs.after'

2024-05-02T19:23:07Z INF Running stage: initramfs.before

2024-05-02T19:23:07Z INF Done executing stage 'initramfs.before'

2024-05-02T19:23:07Z INF Running stage: initramfs

2024-05-02T19:23:07Z INF Done executing stage 'initramfs'

2024-05-02T19:23:07Z INF Running stage: initramfs.after

2024-05-02T19:23:07Z INF Done executing stage 'initramfs.after'
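
The stage log mixes two kinds of "failed to run" text: WRN lines for conditional skips, where a step's `if` test simply exited non-zero, and ERR summaries listing commands that actually failed. A rough sketch for extracting only the latter, assuming the bulleted `* failed to run <command>: exit status <n>` layout shown above; the function name is illustrative:

```python
import re

# Only the bulleted lines under an "ERR N errors occurred:" summary are matched.
# Conditional-skip WRN lines also contain "failed to run" but are not bulleted.
FAILED_BULLET = re.compile(
    r"^\s*\* failed to run (.+): exit status (\d+)\s*$", re.MULTILINE
)


def failed_commands(log_text: str) -> list[tuple[str, int]]:
    """Return (command, exit_status) pairs in the order they appear in the log."""
    return [(cmd, int(status)) for cmd, status in FAILED_BULLET.findall(log_text)]


if __name__ == "__main__":
    import sys

    for command, status in failed_commands(sys.stdin.read()):
        print(f"{status}\t{command}")
```

Comparing this list between a BIOS boot and an EFI boot would show whether any of these command failures are specific to EFI or appear in both.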

/run/immucore/rootfs_stage.log

[root@lpul-vault-k8s-server-0 immucore]# cat rootfs_stage.log
2024-05-02T19:23:04Z INF Running stage: rootfs.before

2024-05-02T19:23:04Z INF Processing stage step 'Pull data from provider'. ( commands: 0, files: 0, ... )
2024-05-02T19:23:04Z INF Processing stage step 'Enable systemd-network config files for DHCP'. ( commands: 1, files: 2, ... )
2024-05-02T19:23:04Z ERR mkdir /etc/systemd/network/: file exists
2024-05-02T19:23:04Z ERR 1 error occurred:
        * mkdir /etc/systemd/network/: file exists

2024-05-02T19:23:04Z ERR Failed to connect system bus: No such file or directory
: failed to run networkctl reload: exit status 1
2024-05-02T19:23:04Z ERR 1 error occurred:
        * failed to run networkctl reload: exit status 1

2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /oem/userdata ]: exit status 1)' stage name: Sentinel file for userdata
2024-05-02T19:23:06Z INF Done executing stage 'rootfs.before'

2024-05-02T19:23:06Z INF Running stage: rootfs

2024-05-02T19:23:06Z INF Processing stage step 'Layout configuration for active/passive mode'. ( commands: 0, files: 0, ... )
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/run/cos/recovery_mode" ] && grep -vq "rd.immucore.uki" /proc/cmdline: exit status 1)' stage name: Layout configuration for recovery mode
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.boot_live_mode" /proc/cmdline: exit status 1)' stage name: Layout configuration for booting local node from livecd
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e "/run/cos/uki_boot_mode" ] && [ ! -e "/run/cos/recovery_mode" ] && [ ! -e "/run/cos/autoreset_mode" ]: exit status 1)' stage name: Layout configuration for UKI boot
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e "/run/cos/uki_boot_mode" ] && ([ -e "/run/cos/recovery_mode" ] || [ -e "/run/cos/autoreset_mode" ]): exit status 1)' stage name: Layout configuration for recovery/autoreset mode on UKI
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e "/run/cos/uki_install_mode" ]: exit status 1)' stage name: Layout configuration for UKI installer
2024-05-02T19:23:06Z INF Done executing stage 'rootfs'

2024-05-02T19:23:06Z INF Running stage: rootfs.after

2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -r /run/cos/custom-layout.env ] && [ ! -f "/run/cos/recovery_mode" ] && [ ! -f /run/cos/live_mode ]: exit status 1)' stage name: add custom bind and ephemeral mounts to /run/cos/cos-layout.env
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /run/cos/recovery_mode ] && [ ! -f /run/cos/live_mode ] && [ -f "/sys/firmware/devicetree/base/model" ] && grep -i "Raspberry Pi 4" "/sys/firmware/devicetree/base/model": exit status 1)' stage name: Grow persistent
2024-05-02T19:23:06Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /run/cos/recovery_mode ] && [ ! -f /run/cos/live_mode ]: exit status 1)' stage name: Grow persistent
2024-05-02T19:23:06Z INF Done executing stage 'rootfs.after'

2024-05-02T19:23:06Z INF Running stage: rootfs.before

2024-05-02T19:23:06Z INF Done executing stage 'rootfs.before'

2024-05-02T19:23:06Z INF Running stage: rootfs

2024-05-02T19:23:06Z INF Done executing stage 'rootfs'

2024-05-02T19:23:06Z INF Running stage: rootfs.after

2024-05-02T19:23:06Z INF Done executing stage 'rootfs.after'

ci-robbot commented 6 months ago

Based on the logs provided, it seems there is an issue with adding a line to the bashrc files on the system. Specifically, the line:

export PATH="${PATH}:/var/lib/rancher/rke2/bin/"

is failing to be added. This can be seen in the '/home/kairos/.bashrc' and '/home/kairos/.profile' files. The error message:

/bin/sh: line 1: [[: 0[0]: syntax error: invalid arithmetic operator (error token is "[0]")

indicates a syntax error in the bash script.

It is recommended to manually check the files '/home/kairos/.bashrc' and '/home/kairos/.profile' to confirm whether the line in question is present and if there are any syntax errors.

Additionally, ensure that the '/var/lib/rancher/rke2/bin/' directory exists and has executable files.

Once you have confirmed these details, you may remove the offending line from the bashrc files as it does not appear to be a necessary addition.

If the issue persists after these steps, further investigation into the code and environment may be needed.

sarg3nt commented 6 months ago

Regarding what @ci-robbot noticed: the lines that add the PATH statement are fine, and both /home/kairos/.bashrc and /home/kairos/.profile are fine. The problem was some leftover logging code that was not working. I removed it and reran the test; it did not help, which I expected, since this is the same code that worked fine in BIOS mode.
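
For anyone puzzled by the `[[: 0[0]: syntax error` message itself: its format indicates that the literal string `0[0]` was handed to an arithmetic comparison inside `[[ ]]`. The thread does not show the removed logging code, so the cause is unknown; one hypothetical way to produce such a string is an unbraced array expansion like `$PIPESTATUS[0]`, which expands to `0` followed by a literal `[0]`. A minimal sketch that reproduces the message class, assuming `bash` is on the PATH (the helper name is illustrative):

```python
import os
import subprocess


def bash_arithmetic_error(operand: str) -> str:
    """Evaluate `[[ operand -eq 0 ]]` in bash and return bash's stderr text."""
    result = subprocess.run(
        # The operand is passed as "$1" so the shell never re-parses it as code.
        ["bash", "-c", '[[ "$1" -eq 0 ]]', "bash", operand],
        capture_output=True,
        text=True,
        env={**os.environ, "LC_ALL": "C"},  # keep the message untranslated
    )
    return result.stderr.strip()


if __name__ == "__main__":
    print(bash_arithmetic_error("0[0]"))
```

A plain `0` produces no error output, which is consistent with the PATH lines themselves not being the problem.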

ci-robbot commented 6 months ago

Could you provide the updated logs after removing the logging code and rerunning the test?

sarg3nt commented 6 months ago

@ci-robbot Okey dokey Mr bot sir.

[root@lpul-vault-k8s-server-0 immucore]# cat initramfs_stage.log
2024-05-03T00:35:19Z INF Running stage: initramfs.before

2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /oem/userdata ]: exit status 1)' stage name: Pull data from provider
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e /sbin/openrc ]: exit status 1)' stage name: Blacklist bpfilter on Alpine ( bug: https://github.com/kairos-io/kairos/issues/277 )
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run ! [[ -f /etc/hosts ]] || ! [[ $(grep '127.0.0.1' /etc/hosts) ]]
: exit status 1)' stage name: Make sure hosts file is present and includes a record for 127.0.0.1
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /oem/userdata ]: exit status 1)' stage name:
2024-05-03T00:35:19Z INF Done executing stage 'initramfs.before'

2024-05-03T00:35:19Z INF Running stage: initramfs

2024-05-03T00:35:19Z INF Processing stage step ''. ( commands: 1, files: 0, ... )
2024-05-03T00:35:19Z INF Processing stage step 'Enable systemd-network config files for DHCP'. ( commands: 1, files: 2, ... )
2024-05-03T00:35:19Z INF Processing stage step 'systemd-sysext initramfs settings'. ( commands: 0, files: 0, ... )
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.remote_recovery_mode" /proc/cmdline && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name: Starts kairos-recovery and generate a temporary pass
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sbin/openrc" ]
: exit status 1)' stage name: Create OpenRC services
2024-05-03T00:35:19Z INF Processing stage step 'Create journalctl /var/log/journal dir'. ( commands: 0, files: 0, ... )
2024-05-03T00:35:19Z ERR Failed to connect system bus: No such file or directory
: failed to run networkctl reload: exit status 1
2024-05-03T00:35:19Z ERR 1 error occurred:
        * failed to run networkctl reload: exit status 1

2024-05-03T00:35:19Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos-agent.service → /etc/systemd/system/kairos-agent.service.

2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ -s /usr/local/etc/machine-id ]: exit status 1)' stage name: Restore /etc/machine-id for systemd systems
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ ! -f "/run/cos/live_mode" ]: exit status 1)' stage name:
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sbin/openrc" ]
: exit status 1)' stage name: Enable OpenRC services
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.remote_recovery_mode" /proc/cmdline && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Starts kairos-recovery for openRC based systems
2024-05-03T00:35:19Z INF Processing stage step 'Disable NetworkManager and wicked'. ( commands: 0, files: 0, ... )
2024-05-03T00:35:19Z INF Processing stage step ''. ( commands: 0, files: 2, ... )
2024-05-03T00:35:19Z ERR 2 errors occurred:
        * failed to run systemctl disable NetworkManager: exit status 1
        * failed to run systemctl disable wicked: exit status 1

2024-05-03T00:35:19Z INF Processing stage step 'Enable systemd-network and systemd-resolved'. ( commands: 0, files: 0, ... )
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Restore /etc/machine-id for openrc systems
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -qv "interactive-install" /proc/cmdline || grep -qv "install-mode-interactive" /proc/cmdline) && \
[ -f /run/cos/live_mode ] && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name: Autologin on livecd for OpenRC
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "kairos.reset" /proc/cmdline || [ -f /run/cos/autoreset_mode ]) && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name: Starts kairos-reset for systemd based systems
2024-05-03T00:35:19Z INF Processing stage step 'Default systemd config'. ( commands: 1, files: 0, ... )
2024-05-03T00:35:19Z INF Command output: Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/multi-user.target.

2024-05-03T00:35:19Z ERR 5 errors occurred:
        * failed to run systemctl enable systemd-timesyncd: exit status 1
        * failed to run systemctl enable nohang: exit status 1
        * failed to run systemctl enable nohang-desktop: exit status 1
        * failed to run systemctl enable fail2ban: exit status 1
        * failed to run systemctl enable logrotate.timer: exit status 1

2024-05-03T00:35:19Z INF Processing stage step 'Generate host keys'. ( commands: 1, files: 0, ... )
2024-05-03T00:35:19Z INF Processing stage step 'Link /etc/resolv.conf to systemd resolv.conf'. ( commands: 2, files: 0, ... )
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.reset" /proc/cmdline && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Starts kairos-reset for openRC-based systems
2024-05-03T00:35:19Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run cat /proc/cmdline | grep "selinux=1"
: exit status 1)' stage name: Relabelling
2024-05-03T00:35:19Z INF Command output:
2024-05-03T00:35:19Z INF Command output:
2024-05-03T00:35:20Z INF Command output: ssh-keygen: generating new host keys: RSA DSA ECDSA ED25519

2024-05-03T00:35:20Z INF Processing stage step 'Create systemd services'. ( commands: 0, files: 5, ... )
2024-05-03T00:35:20Z INF Processing stage step ''. ( commands: 5, files: 0, ... )
2024-05-03T00:35:20Z INF Command output: Removed "/etc/systemd/system/getty.target.wants/getty@tty1.service".

2024-05-03T00:35:20Z INF Command output: Running in chroot, ignoring command 'stop'

2024-05-03T00:35:20Z INF Command output: Created symlink /etc/systemd/system/getty@tty1.service → /dev/null.

2024-05-03T00:35:20Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos.service → /etc/systemd/system/kairos.service.

2024-05-03T00:35:20Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos-webui.service → /etc/systemd/system/kairos-webui.service.

2024-05-03T00:35:20Z INF Processing stage step 'Enable systemd services'. ( commands: 4, files: 0, ... )
2024-05-03T00:35:20Z INF Command output:
2024-05-03T00:35:20Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "install-mode" /proc/cmdline || grep -q "nodepair.enable" /proc/cmdline ) && \
([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name:
2024-05-03T00:35:20Z INF Command output:
2024-05-03T00:35:20Z INF Command output:
2024-05-03T00:35:20Z INF Command output:
2024-05-03T00:35:20Z INF Processing stage step 'Setup groups'. ( commands: 0, files: 0, ... )
2024-05-03T00:35:20Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "interactive-install" /proc/cmdline || grep -q "install-mode-interactive" /proc/cmdline) && \
([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name:
2024-05-03T00:35:20Z INF Processing stage step 'Setup users'. ( commands: 0, files: 0, ... )
2024-05-03T00:35:20Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "interactive-install" /proc/cmdline || grep -q "install-mode-interactive" /proc/cmdline) && \
([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name:
2024-05-03T00:35:20Z INF Processing stage step 'Set user password if running in live or uki'. ( commands: 0, files: 0, ... )
2024-05-03T00:35:20Z INF Processing stage step 'Setup sudo'. ( commands: 1, files: 1, ... )
2024-05-03T00:35:20Z INF Command output: Locking password for user root.
passwd: Success

2024-05-03T00:35:20Z INF Processing stage step 'Ensure runtime permission'. ( commands: 2, files: 0, ... )
2024-05-03T00:35:20Z INF Command output:
2024-05-03T00:35:20Z INF Command output:
2024-05-03T00:35:20Z INF Processing stage step ''. ( commands: 0, files: 0, ... )
2024-05-03T00:35:20Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e "/usr/local/cloud-config" ]: exit status 1)' stage name: Ensure runtime permission
2024-05-03T00:35:20Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sys/firmware/devicetree/base/model" ] && grep -i jetson "/sys/firmware/devicetree/base/model"
: exit status 1)' stage name: Create files
2024-05-03T00:35:20Z INF Processing stage step ''. ( commands: 0, files: 0, ... )
2024-05-03T00:35:20Z INF Processing stage step 'Set hostname'. ( commands: 0, files: 0, ... )
2024-05-03T00:35:20Z INF Processing stage step 'Run commands'. ( commands: 1, files: 0, ... )
2024-05-03T00:35:20Z INF Command output: 2024-05-03 00:35:20 Add DHCP ClientIdentifier=mac to network config if not already present.
2024-05-03 00:35:20   Adding line [DHCP] to file /etc/systemd/network/20-dhcp.network
2024-05-03 00:35:20   Adding line ClientIdentifier=mac to file /etc/systemd/network/20-dhcp.network
2024-05-03 00:35:20   Adding line [DHCP] to file /etc/systemd/network/20-dhcp-legacy.network
2024-05-03 00:35:20   Adding line ClientIdentifier=mac to file /etc/systemd/network/20-dhcp-legacy.network
2024-05-03 00:35:20 Add ll to the root and Kairos .bashrc if not already present.
2024-05-03 00:35:20   Adding line alias ll="ls -alh" to file /root/.bashrc
2024-05-03 00:35:20   Creating new file /home/kairos/.bashrc with line alias ll="ls -alh"
2024-05-03 00:35:20   Creating new file /home/kairos/.profile with line alias ll="ls -alh"
2024-05-03 00:35:20 Add rke2 bin to the path.
2024-05-03 00:35:20   Adding line export PATH="${PATH}:/var/lib/rancher/rke2/bin/" to file /root/.bashrc
2024-05-03 00:35:20   Adding line export PATH="${PATH}:/var/lib/rancher/rke2/bin/" to file /home/kairos/.bashrc
2024-05-03 00:35:20   Adding line export PATH="${PATH}:/var/lib/rancher/rke2/bin/" to file /home/kairos/.profile

2024-05-03T00:35:20Z INF Done executing stage 'initramfs'

2024-05-03T00:35:20Z INF Running stage: initramfs.after

2024-05-03T00:35:20Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e /sbin/openrc ]: exit status 1)' stage name: Enable serial login for alpine
2024-05-03T00:35:20Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [[ $(kairos-agent state get kairos.flavor) =~ ^ubuntu ]]: exit status 1)' stage name: setupcon initramfs.after ubuntu
2024-05-03T00:35:20Z INF Done executing stage 'initramfs.after'

2024-05-03T00:35:20Z INF Running stage: initramfs.before

2024-05-03T00:35:20Z INF Done executing stage 'initramfs.before'

2024-05-03T00:35:20Z INF Running stage: initramfs

2024-05-03T00:35:20Z INF Done executing stage 'initramfs'

2024-05-03T00:35:20Z INF Running stage: initramfs.after

2024-05-03T00:35:20Z INF Done executing stage 'initramfs.after'
ci-robbot commented 6 months ago

Error in script: /init: 22: /init: /sbin/touch: not found

sarg3nt commented 6 months ago

I tried building an ISO of our current customized OS and installing it manually on a vSphere node with the firmware set to EFI, and I get the following error: [image]

I tried BIOS and it worked fine. Am I missing something?

Our custom container image starts from /kairos/rockylinux:9-core-amd64-generic-master and installs a few packages, RKE2, and some security software. I can't think of anything in it that would dork over the kernel, especially not in a way that would break it for EFI but not BIOS.

Here's the build command. It's right out of the docs.

#!/bin/bash
docker run -v "$PWD"/build:/tmp/auroraboot \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --rm -ti quay.artifactory.metro.ad.selinc.com/kairos/auroraboot:v0.2.7 \
  --set "container_image=docker://<snip>" \
  --set "disable_http_server=true" \
  --set "disable_netboot=true" \
  --set "state_dir=/tmp/auroraboot"
ci-robbot commented 5 months ago

It seems there was an issue with the installation process. The /var/log/installer/syslog and /run/immucore/rootfs_stage.log logs may provide information about the errors encountered during the installation.

In the /var/log/installer/syslog log, you can see several errors related to connecting to the system bus and reloading the network configuration. It's possible that the installation environment is not fully set up, or there is a problem with the networking configuration.

In the /run/immucore/rootfs_stage.log, you can see errors related to creating directories and connecting to the system bus. It's also mentioned that the installation process is skipping certain stages due to various if statements not matching the current conditions.
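Since most of the log is skipped-stage warnings, it can help to pull out only the real failures. The following is a minimal sketch, assuming the log keeps the format shown earlier in this thread (a timestamped `ERR` line followed by indented `* failed to run ...` detail lines). The function names `count_error_entries` and `list_failed_commands` are hypothetical helpers, not part of Kairos or immucore.

```shell
#!/bin/bash
# Hypothetical helpers for summarizing an immucore-style stage log.
# Assumed format: "<timestamp> ERR <message>" entries, each followed by
# indented "* failed to run <command>: exit status N" detail lines.

# Count the ERR-level entries in the given log file.
count_error_entries() {
  local log_file="$1"
  grep -c ' ERR ' "$log_file"
}

# Print each failed command from the bulleted detail lines.
# The skipped-stage WRN lines also contain "failed to run", but they do not
# start with a "*" bullet, so they are not matched here.
list_failed_commands() {
  local log_file="$1"
  sed -n 's/^[[:space:]]*\* failed to run \(.*\)$/\1/p' "$log_file"
}
```

On the affected node this could be run as `list_failed_commands /run/immucore/rootfs_stage.log`, which would list entries such as the `systemctl enable` and `systemctl disable` failures without the surrounding conditional-skip noise.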

To troubleshoot the issue, you can try the following steps:

  1. Ensure that the necessary services and dependencies are properly installed and configured.
  2. Verify the networking configuration and make sure it's set up correctly.
  3. Check the hardware compatibility of the system with the software being installed.
  4. Consider re-running the installation process, possibly using a different installation method or media.
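Because the failure in this issue only shows up with EFI firmware, one concrete check to go with the steps above is confirming which firmware mode the node actually booted in. On Linux, the directory `/sys/firmware/efi` exists only when the kernel was started through UEFI. The sketch below wraps that check in a hypothetical `boot_mode` function; the optional argument that overrides the sysfs root is an assumption added purely so the function can be exercised outside a real node.

```shell
#!/bin/bash
# Hypothetical helper: report the firmware mode of the running system.
# /sys/firmware/efi is present only when Linux was booted via UEFI.
# The optional first argument overrides the sysfs root (for testing only).
boot_mode() {
  local sysfs_root="${1:-/sys}"
  if [ -d "$sysfs_root/firmware/efi" ]; then
    echo "UEFI"
  else
    echo "BIOS"
  fi
}
```

Running `boot_mode` over SSH on the partially booted node would show whether the VM really came up in the mode selected in vSphere.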

If the issue persists, it may be helpful to seek further assistance from the software's official support channels or community forums, providing the relevant logs and details about your system configuration.