Kairos not installing on correct device

sarg3nt commented 7 months ago

Kairos version:

/kairos/rockylinux:9-core-amd64-generic-v2.4.3

NAME="Rocky Linux"
VERSION="9.3 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.3"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.3 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.3"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.3"
KAIROS_NAME="kairos-core-rockylinux-9"
KAIROS_VERSION="v2.4.3"
KAIROS_ID="kairos"
KAIROS_ID_LIKE="kairos-core-rockylinux-9"
KAIROS_VERSION_ID="v2.4.3"
KAIROS_PRETTY_NAME="kairos-core-rockylinux-9 v2.4.3"
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_IMAGE_REPO="quay.io/kairos/rockylinux"
KAIROS_IMAGE_LABEL="9-core-amd64-generic-v2.4.3"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_VARIANT="core"
KAIROS_FLAVOR="rockylinux"
KAIROS_ARTIFACT="kairos-rockylinux-9-core-amd64-generic-v2.4.3"

CPU architecture, OS, and Version:

Linux lpul-vault-k8s-agent-2.vault.ad.selinc.com 5.14.0-362.8.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 8 17:36:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug

Hello Kairos team. I'm running into an old issue again. I thought we had solved this by adding volume labels to my other disks, but apparently not.

I have three disks in my VM: sda, sdb, and sdc.

The cloud_init.yaml is

NOTE: this is a Terraform template, thus the TF code

strict: true
# enable debug logging
debug: true
install:
  # Why doesn't this work consistently?
  device: "/dev/sda" 
  auto: true
  poweroff: false
  reboot: true
  # Turn on debug logging in Kairos
  # View cloud_init logs in /run/immucore/
  # grub_options:
  #   extra_cmdline: "rd.immucore.debug"
users:
  # The kairos user is configured in the target nodes terraform
  - name: "kairos-auroraboot"
    passwd: "${password}"
    %{ if length(ssh_keys) > 0 } 
    ssh_authorized_keys:
      %{ for key in ssh_keys }
      - ${key}
      %{ endfor }
    %{ endif }
# Boot stages in Kairos: https://kairos.io/docs/architecture/cloud-init/#boot-stages
write_files:
  # Set the qualys_https_proxy to wall
  - encoding: b64
    content: <redacted>
    path: /etc/sysconfig/qualys-cloud-agent
    permissions: "0444"
stages:
  boot:
    - systemd_firstboot:
        keymap: us
    - name: "Environment Variables"
      environment:
        HTTP_PROXY: "http://wall.ad.selinc.com:8080"
        HTTPS_PROXY: "http://wall.ad.selinc.com:8080"
        http_proxy: "http://wall.ad.selinc.com:8080"
        https_proxy: "http://wall.ad.selinc.com:8080"
        NO_PROXY: "localhost,localaddress,svc.cluster.local,host.docker.internal,kubernetes.docker.internal,.svc.cluster.local,cluster.local,.cluster.local,default.svc,docker.sel.inc,sel.inc,.sel.inc,ad.selinc.com,.ad.selinc.com,metro.ad.selinc.com,.metro.ad.selinc.com,bitbucket.metro.ad.selinc.com,artifactory.metro.ad.selinc.com,*.ad.selinc.com,10.43.0.1,127.0.0.1,127.0.0.0,0.0.0.0,127.0.0.0/8,10.0.0.0/8,10.*.*.*,10.*,172.16.0.0/12,192.168.0.0/16,169.254.169.254"
        no_proxy: "localhost,localaddress,svc.cluster.local,host.docker.internal,kubernetes.docker.internal,.svc.cluster.local,cluster.local,.cluster.local,default.svc,docker.sel.inc,sel.inc,.sel.inc,ad.selinc.com,.ad.selinc.com,metro.ad.selinc.com,.metro.ad.selinc.com,bitbucket.metro.ad.selinc.com,artifactory.metro.ad.selinc.com,*.ad.selinc.com,10.43.0.1,127.0.0.1,127.0.0.0,0.0.0.0,127.0.0.0/8,10.0.0.0/8,10.*.*.*,10.*,172.16.0.0/12,192.168.0.0/16,169.254.169.254"
    - name: "Setup services"
      systemctl:
        disable:
          - dnf-makecache
    - name: "Setup NTP"
      systemctl:
        enable:
          - systemd-timesyncd
      timesyncd:
        NTP: "ntp.ad.selinc.com ntp2.ad.selinc.com ntp3.ad.selinc.com"
        FallbackNTP: ""
  after-install-chroot:
    # Creates the data dir after install inside the final system chroot
    - name: "Create data dir"
      commands:
        # Pass the directory to mount the disk under as the third parameter to the script.
        - mount_disk.sh "make_directory" "sdb" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/mount_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - mount_disk.sh "make_directory" "sdc" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/mount_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
    # Formats the disk ONLY after-install and just once.
    - name: "Format /dev/sdb"
      commands:
        - mount_disk.sh "format_disk" "sdb" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/mount_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - mount_disk.sh "format_disk" "sdc" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/mount_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
  # Creates the data dir after reset inside the final system chroot, just in case it's not there
  after-reset-chroot:
    - name: "Create data dir"
      commands:
        - mount_disk.sh "make_directory" "sdb" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/mount_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - mount_disk.sh "make_directory" "sdc" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/mount_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
  # Creates the data dir after upgrade inside the final system chroot, just in case it's not there
  after-upgrade-chroot:
    - name: "Create data dir"
      commands:
        - mount_disk.sh "make_directory" "sdb" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/mount_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - mount_disk.sh "make_directory" "sdc" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/mount_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
  initramfs:
    # Mounts the disk during initramfs on each boot, with RW. Extra options can be added to the mount here
    - name: "Mount /dev/sdb"
      commands:
        - mount_disk.sh "mount_disk" "sdb" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/mount_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - mount_disk.sh "mount_disk" "sdc" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/mount_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
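
A note on the exit-status check used in the commands above: in Bash, `$PIPESTATUS[0]` without braces expands to the first element of `PIPESTATUS` followed by the literal text `[0]`, so it is not an array lookup. Indexing the array needs `${PIPESTATUS[0]}`, which inside a Terraform template would have to be written `$${PIPESTATUS[0]}` so that Terraform does not treat it as an interpolation. Below is a minimal sketch of the pattern, assuming the stage commands run under Bash (not confirmed in this thread). The `run_logged` helper is hypothetical and not part of the original config.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original config): run a command,
# append its combined stdout/stderr to a log file, and return the
# command's own exit status rather than tee's.
run_logged() {
  local logfile="$1"
  shift
  "$@" 2>&1 | tee -a "$logfile"
  # PIPESTATUS holds the exit codes of the pipeline that just ran;
  # element 0 belongs to the command on the left-hand side of the pipe.
  return "${PIPESTATUS[0]}"
}
```

With such a helper, a stage command could read `run_logged /var/log/sel/mount_disk.log mount_disk.sh "mount_disk" "sdb" "/var/lib/rancher/rke2" || exit 1`.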

The mount_disk.sh file is:

#!/bin/bash
# Copyright (c) 2024 Schweitzer Engineering Laboratories, Inc.
# SEL Confidential
set -euo pipefail
IFS=$'\n\t'

# cSpell:ignore

# This script is run in the cloud_init.yaml and not in the Dockerfile so it must remain in the target container image.

make_directory() {
  local directory="${1-}"
  if [[ -d "$directory" ]]; then
    log "  The $directory directory already exists, skipping creation" "cyan"
  else
    log "  Creating the $directory directory" "green"
    mkdir -p "$directory"
  fi
}

# Format /dev/$disk if not already formatted.
format_disk() {
  local disk="${1-}"
  log "  Checking format status of /dev/$disk" "green"
  log "  Format disk troubleshooting output of \"lsblk -f /dev/$disk\"" "cyan"
  lsblk -f "/dev/$disk"

  if [[ ! $(lsblk -f "/dev/$disk") = *"ext4"* ]]; then
    log "  Formatting /dev/$disk" "green"
    mkfs.ext4 -L "SEL_$disk" "/dev/$disk"

    log "  Format disk troubleshooting output of \"lsblk -f /dev/$disk\" after format" "cyan"
    lsblk -f "/dev/$disk"
  else
    log "  /dev/$disk is already formatted" "cyan"
  fi

}

# Mount /dev/$disk to /rke and create the sub directories.
mount_disk() {
  local disk="${1-}"
  local directory="${2-}"
  local owner="${3-}"
  local extra_directories="${4-}"
  log "  Mount disk troubleshooting output of \"lsblk -f /dev/$disk\" before mount" "cyan"
  lsblk -f "/dev/$disk"

  log "  Mounting /dev/$disk to $directory" "green"
  mount -o rw "/dev/$disk" "$directory"

  log "  Mount disk troubleshooting output of \"lsblk -f /dev/$disk\" after mount" "cyan"
  lsblk -f "/dev/$disk"

  if [[ -n "$extra_directories" ]]; then
    IFS=","
    for new_directory in $extra_directories; do
      if [[ -d "$new_directory" ]]; then
        log "  The $new_directory directory already exists, skipping creation" "cyan"
      else
        log "  Creating the $new_directory directory" "green"
        mkdir -p "$new_directory"
      fi
    done
    IFS=$'\n\t'
  fi

  if [[ -n "$owner" ]]; then
    log "  Setting ${owner} as the owner of ${directory} recursively" "green"
    chown -R "${owner}:${owner}" "${directory}"
  fi
}

main() {
  source "/usr/bin/lib/sh/log.sh"
  local option="${1-}"
  local disk="${2-}"
  local directory="${3-}"
  local owner="${4-}"
  local extra_directories="${5-}"

  log "Running mount_disk.sh with option $option for disk $disk in directory $directory" "blue"
  case "$option" in
    "make_directory")
      make_directory "$directory"
      ;;
    "format_disk")
      format_disk "$disk"
      ;;
    "mount_disk")
      mount_disk "$disk" "$directory" "$owner" "$extra_directories"
      ;;
  esac
}

# Run main
if ! (return 0 2> /dev/null); then
  (main "$@")
fi

The log output of mount_disk.sh on the working nodes is

2024-02-12 06:38:52 Running mount_disk.sh with option make_directory for disk sdb in directory /var/lib/rancher/rke2
2024-02-12 06:38:52   The /var/lib/rancher/rke2 directory already exists, skipping creation
2024-02-12 06:38:52 Running mount_disk.sh with option make_directory for disk sdc in directory /var/lib/rancher/longhorn
2024-02-12 06:38:52   Creating the /var/lib/rancher/longhorn directory
2024-02-12 06:38:52 Running mount_disk.sh with option format_disk for disk sdb in directory /var/lib/rancher/rke2
2024-02-12 06:38:52   Checking format status of /dev/sdb
2024-02-12 06:38:52   Format disk troubleshooting output of "lsblk -f /dev/sdb"
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sdb
2024-02-12 06:38:52   Formatting /dev/sdb
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 10485760 4k blocks and 2621440 inodes
Filesystem UUID: 01dccb6a-c5f8-4c96-a6aa-867af7bae5fe
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624

Allocating group tables: done
Writing inode tables: done
Creating journal (65536 blocks): done
Writing superblocks and filesystem accounting information: done

2024-02-12 06:38:52   Format disk troubleshooting output of "lsblk -f /dev/sdb" after format
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sdb
2024-02-12 06:38:52 Running mount_disk.sh with option format_disk for disk sdc in directory /var/lib/rancher/longhorn
2024-02-12 06:38:52   Checking format status of /dev/sdc
2024-02-12 06:38:52   Format disk troubleshooting output of "lsblk -f /dev/sdc"
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sdc
2024-02-12 06:38:52   Formatting /dev/sdc
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 20971520 4k blocks and 5242880 inodes
Filesystem UUID: acb01df5-75c2-404b-9552-314c1e999bd5
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done

2024-02-12 06:38:52   Format disk troubleshooting output of "lsblk -f /dev/sdc" after format
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sdc
2024-02-12 06:39:36 Running mount_disk.sh with option mount_disk for disk sdb in directory /var/lib/rancher/rke2
2024-02-12 06:39:36   Mount disk troubleshooting output of "lsblk -f /dev/sdb" before mount
NAME FSTYPE FSVER LABEL   UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdb  ext4   1.0   SEL_sdb 01dccb6a-c5f8-4c96-a6aa-867af7bae5fe
2024-02-12 06:39:36   Mounting /dev/sdb to /var/lib/rancher/rke2
2024-02-12 06:39:36   Mount disk troubleshooting output of "lsblk -f /dev/sdb" after mount
NAME FSTYPE FSVER LABEL   UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdb  ext4   1.0   SEL_sdb 01dccb6a-c5f8-4c96-a6aa-867af7bae5fe   37.1G     0% /usr/local/.state/var-lib-rancher.bind/rke2
                                                                              /var/lib/rancher/rke2
2024-02-12 06:39:36 Running mount_disk.sh with option mount_disk for disk sdc in directory /var/lib/rancher/longhorn
2024-02-12 06:39:36   Mount disk troubleshooting output of "lsblk -f /dev/sdc" before mount
NAME FSTYPE FSVER LABEL   UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdc  ext4   1.0   SEL_sdc acb01df5-75c2-404b-9552-314c1e999bd5
2024-02-12 06:39:36   Mounting /dev/sdc to /var/lib/rancher/longhorn
2024-02-12 06:39:36   Mount disk troubleshooting output of "lsblk -f /dev/sdc" after mount
NAME FSTYPE FSVER LABEL   UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdc  ext4   1.0   SEL_sdc acb01df5-75c2-404b-9552-314c1e999bd5   74.2G     0% /usr/local/.state/var-lib-rancher.bind/longhorn
                                                                              /var/lib/rancher/longhorn
with lsblk output:
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0  1.7G  1 loop /
sda      8:0    0   80G  0 disk
├─sda1   8:1    0    1M  0 part
├─sda2   8:2    0   64M  0 part /oem
├─sda3   8:3    0  3.5G  0 part
├─sda4   8:4    0  5.9G  0 part /run/initramfs/cos-state
└─sda5   8:5    0 70.5G  0 part /etc/pki/tls/certs
                                /var/lib/wicked
                                /var/lib/snapd
                                /var/lib/rancher
                                /var/lib/longhorn
                                /var/lib/kubelet
                                /var/lib/extensions
                                /var/lib/dbus
                                /var/lib/containerd
                                /var/lib/cni
                                /var/lib/ca-certificates
                                /etc/zfs
                                /etc/systemd
                                /etc/sysconfig
                                /etc/ssh
                                /var/snap
                                /etc/runlevels
                                /etc/rancher
                                /etc/modprobe.d
                                /var/log
                                /usr/libexec
                                /etc/kubernetes
                                /etc/iscsi
                                /etc/cni
                                /root
                                /opt
                                /home
                                /usr/local
sdb      8:16   0   40G  0 disk /usr/local/.state/var-lib-rancher.bind/rke2
                                /var/lib/rancher/rke2
sdc      8:32   0   80G  0 disk /usr/local/.state/var-lib-rancher.bind/longhorn
                                /var/lib/rancher/longhorn

The log output of mount_disk.sh on the broken node is

2024-02-12 06:38:55 Running mount_disk.sh with option make_directory for disk sdb in directory /var/lib/rancher/rke2
2024-02-12 06:38:55   The /var/lib/rancher/rke2 directory already exists, skipping creation
2024-02-12 06:38:55 Running mount_disk.sh with option make_directory for disk sdc in directory /var/lib/rancher/longhorn
2024-02-12 06:38:55   Creating the /var/lib/rancher/longhorn directory
2024-02-12 06:38:55 Running mount_disk.sh with option format_disk for disk sdb in directory /var/lib/rancher/rke2
2024-02-12 06:38:55   Checking format status of /dev/sdb
2024-02-12 06:38:55   Format disk troubleshooting output of "lsblk -f /dev/sdb"
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sdb
2024-02-12 06:38:55   Formatting /dev/sdb
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 10485760 4k blocks and 2621440 inodes
Filesystem UUID: e20995ea-c144-4353-8ce8-0c29ee6c442c
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624

Allocating group tables: done
Writing inode tables: done
Creating journal (65536 blocks): done
Writing superblocks and filesystem accounting information: done

2024-02-12 06:38:55   Format disk troubleshooting output of "lsblk -f /dev/sdb" after format
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sdb
2024-02-12 06:38:55 Running mount_disk.sh with option format_disk for disk sdc in directory /var/lib/rancher/longhorn
2024-02-12 06:38:55   Checking format status of /dev/sdc
2024-02-12 06:38:55   Format disk troubleshooting output of "lsblk -f /dev/sdc"
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sdc
2024-02-12 06:38:55   Formatting /dev/sdc
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 20971520 4k blocks and 5242880 inodes
Filesystem UUID: 7a6abc47-98fb-4605-a05a-8bd2c94c42fc
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done

2024-02-12 06:38:56   Format disk troubleshooting output of "lsblk -f /dev/sdc" after format
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sdc
2024-02-12 06:39:41 Running mount_disk.sh with option mount_disk for disk sdb in directory /var/lib/rancher/rke2
2024-02-12 06:39:41   Mount disk troubleshooting output of "lsblk -f /dev/sdb" before mount
NAME   FSTYPE FSVER LABEL          UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdb
├─sdb1
├─sdb2 ext4   1.0   COS_OEM        6e7b6008-65fa-4358-9d68-0a528bae0492   50.2M     0% /oem
├─sdb3 ext4   1.0   COS_RECOVERY   aedf054e-bef6-4ee4-8e06-75b9a98da84a
├─sdb4 ext4   1.0   COS_STATE      b2c98f4a-3ba3-4afb-8947-0bec22eb9b27
└─sdb5 ext4   1.0   COS_PERSISTENT 1de0d1f2-2c33-4c5f-85e5-53ef9442531e   65.1G     0% /var/lib/wicked
                            /var/lib/snapd
                            /var/lib/rancher
                            /var/lib/longhorn
                            /var/lib/kubelet
                            /var/lib/extensions
                            /var/lib/dbus
                            /var/lib/containerd
                            /var/lib/cni
                            /var/lib/ca-certificates
                            /etc/zfs
                            /etc/systemd
                            /etc/sysconfig
                            /etc/ssh
                            /var/snap
                            /etc/runlevels
                            /etc/rancher
                            /etc/modprobe.d
                            /var/log
                            /usr/libexec
                            /etc/kubernetes
                            /etc/iscsi
                            /etc/cni
                            /root
                            /opt
                            /home
                            /usr/local
2024-02-12 06:39:41   Mounting /dev/sdb to /var/lib/rancher/rke2
mount: /var/lib/rancher/rke2: /dev/sdb already mounted or mount point busy.
2024-02-12 06:39:41 Running mount_disk.sh with option mount_disk for disk sdc in directory /var/lib/rancher/longhorn
2024-02-12 06:39:41   Mount disk troubleshooting output of "lsblk -f /dev/sdc" before mount
NAME FSTYPE FSVER LABEL   UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdc  ext4   1.0   SEL_sdc 7a6abc47-98fb-4605-a05a-8bd2c94c42fc
2024-02-12 06:39:41   Mounting /dev/sdc to /var/lib/rancher/longhorn
2024-02-12 06:39:41   Mount disk troubleshooting output of "lsblk -f /dev/sdc" after mount
NAME FSTYPE FSVER LABEL   UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdc  ext4   1.0   SEL_sdc 7a6abc47-98fb-4605-a05a-8bd2c94c42fc   74.2G     0% /usr/local/.state/var-lib-rancher.bind/longhorn
                                                                              /var/lib/rancher/longhorn

This feels like a race condition.

What is the point of

install:
  device: "/dev/sda"

If it's going to ignore it?

NOTE: All nodes are created with Terraform and should be exactly the same.

Any help?

To Reproduce: See the config above.

Expected behavior: All nodes should use the volume specified in the 'cloud_init.yaml' file.

sarg3nt commented 7 months ago

I should add that these machines are being created from AuroraBoot, not sure that matters.

jimmykarily commented 7 months ago

Can you run lsblk on /dev/sda on the second (failed) node? Also, since you have debug: true in the config, can you attach the installation logs of both machines?

sarg3nt commented 7 months ago

Hi @jimmykarily, I think I might have figured this out, at least partly. After building a few more clusters I noticed more oddities: my labels SEL_disk1 and SEL_disk2 were sometimes applied to the opposite devices from the ones intended. It turns out that wasn't the real problem. The problem is that /dev/sda, /dev/sdb, etc. are no longer guaranteed to be assigned to the first, second, etc. devices in the SCSI chain. They were never actually guaranteed, but were fairly consistent until recent kernels switched to asynchronous device scanning. Now device names can be assigned to arbitrary hardware IDs, so /dev/sda is not always going to be the first "disk" in the system. See these posts for more: https://access.redhat.com/solutions/3962551 which references https://www.spinics.net/lists/linux-scsi/msg166873.html (you'll have to click through the replies to get the full picture).

Maybe you already knew all of this, but it's news to me. :)
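
To see how the kernel names map onto the stable paths on a given boot, the by-path symlinks can be resolved. This is a minimal sketch: the `list_by_path` function name is hypothetical, and the optional directory argument exists only so it can be tried against a test directory.

```shell
#!/bin/sh
# Hypothetical helper for illustration: print "stable-path -> resolved-device"
# for every symlink in a directory (default: /dev/disk/by-path).
list_by_path() {
  dir="${1:-/dev/disk/by-path}"
  for link in "$dir"/*; do
    # Skip anything that is not a symlink (including an unmatched glob).
    [ -L "$link" ] || continue
    printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
  done
}
```

Running `list_by_path` on two nodes built from the same template shows whether the same by-path entry resolves to a different /dev/sdX name on each.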

So, I changed my mount_disk.sh code to this.

#!/bin/bash
# Copyright (c) 2024 Schweitzer Engineering Laboratories, Inc.
# SEL Confidential
set -euo pipefail
IFS=$'\n\t'

# cSpell:ignore

# This script is run in the cloud_init.yaml and not in the Dockerfile so it must remain in the target container image.

make_directory() {
  local directory="${1-}"
  if [[ -d "$directory" ]]; then
    log "  The $directory directory already exists, skipping creation" "cyan"
  else
    log "  Creating the $directory directory" "green"
    mkdir -p "$directory"
  fi
}

# Format /dev/$disk if not already formatted.
format_disk() {
  local disk="${1-}"
  log "  Status before format" "cyan"
  ls -l "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
  lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"

  if [[ ! $(lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0") = *"ext4"* ]]; then
    log "  Formatting disk ${disk}" "green"
    mkfs.ext4 -L "SEL_${disk}" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"

    log "  Status after format" "cyan"
    ls -l "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
    lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
  else
    log "  Disk $disk is already formatted" "cyan"
  fi
}

# Mount /dev/$disk to /rke and create the sub directories.
mount_disk() {
  local disk="${1-}"
  local directory="${2-}"
  local owner="${3-}"
  local extra_directories="${4-}"

  log "  Status before mount" "cyan"
  ls -l "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
  lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"

  log "  Mounting disk $disk to $directory" "green"
  mount -o rw --source "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0" "$directory"

  log "  Status after mount" "cyan"
  ls -l "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
  lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"

  if [[ -n "$extra_directories" ]]; then
    IFS=","
    for new_directory in $extra_directories; do
      if [[ -d "$new_directory" ]]; then
        log "  The $new_directory directory already exists, skipping creation" "cyan"
      else
        log "  Creating the $new_directory directory" "green"
        mkdir -p "$new_directory"
      fi
    done
    IFS=$'\n\t'
  fi

  if [[ -n "$owner" ]]; then
    log "  Setting ${owner} as the owner of ${directory} recursively" "green"
    chown -R "${owner}:${owner}" "${directory}"
  fi
}

main() {
  source "/usr/bin/lib/sh/log.sh"
  local option="${1-}"
  local disk="${2-}"
  local directory="${3-}"
  local owner="${4-}"
  local extra_directories="${5-}"

  log "Running mount_disk.sh with option $option for disk $disk in directory $directory" "blue"
  case "$option" in
    "make_directory")
      make_directory "$directory"
      ;;
    "format_disk")
      format_disk "$disk"
      ;;
    "mount_disk")
      mount_disk "$disk" "$directory" "$owner" "$extra_directories"
      ;;
  esac
}

# Run main
if ! (return 0 2> /dev/null); then
  (main "$@")
fi

As you can see, I'm now referencing the /dev/disk/by-path/pci-0000:03:00.0-scsi..... hardware path, which seems to be stable for our vSphere setup.

So now I should be able to control disks 1 and 2 with confidence, but I don't know whether that will fully solve the problem of Kairos not using disk 0. I think you said that if the devices have labels it should work fine, but I'm not sure how that is supposed to work, since the format step happens later in the install cycle. Or am I wrong there? That is, I'm formatting the disk, and thus assigning the label, in the after-install-chroot stage; is that before Kairos picks a disk to install to? If not, how do we make sure it gets the first physical disk?

For reference the disk paths are

/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0 # First physical disk, Kairos should use this.
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0 # Second physical disk, I mount this to /var/lib/rancher/rke2
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0 # Third physical disk, I mount this to /var/lib/rancher/longhorn
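
Since only the SCSI target number differs between these paths, the path can be built from that number. This sketch assumes the controller prefix `pci-0000:03:00.0-scsi-0:0` shown above, which is specific to this vSphere setup and will differ on other hardware. `by_path_for_target` and `BY_PATH_PREFIX` are hypothetical names.

```shell
#!/bin/sh
# Controller prefix taken from the paths above; override it for other hardware.
BY_PATH_PREFIX="${BY_PATH_PREFIX:-/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0}"

# Hypothetical helper: print the by-path device for a SCSI target number
# (0, 1, 2, ...). Returns 1 if the argument is not a non-negative integer,
# which catches a kernel name such as "sdb" being passed by mistake.
by_path_for_target() {
  case "$1" in
    '' | *[!0-9]*)
      echo "invalid SCSI target: '$1'" >&2
      return 1
      ;;
  esac
  printf '%s:%s:0\n' "$BY_PATH_PREFIX" "$1"
}
```

For example, `mount -o rw "$(by_path_for_target 1)" /var/lib/rancher/rke2` would mount the second physical disk regardless of which /dev/sdX name it received.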

Ohhh, I just realized that

install:
  device: "/dev/sda"

Should accept

install:
  device: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"

Yes/No?

I ran out of time tonight so I'll work on testing this tomorrow and let you know what I find.

sarg3nt commented 7 months ago

To answer the question as to whether AuroraBoot will allow

install:
  device: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"

The answer is no. It fails validation.

Kairos Version: 9-core-amd64-generic-v2.4.3
2024-02-14 17:51:45   Target OSs /etc/systemd/system/cloud_init.yaml does not pass validation. Quitting.
2024-02-14 17:51:45   jsonschema: '/install/device' does not validate with file:///schema.json#/properties/install/$ref/properties/device/pattern: does not match pattern '^(auto|/|(/[a-zA-Z0-9_-]+)+)$'
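
The pattern in that error only allows letters, digits, `_` and `-` in each path segment, so the `:` and `.` characters in a by-path name are rejected. Here is a quick check of the pattern copied from the log above, using `grep -E`. The `matches_device_pattern` helper is hypothetical, and `grep -E` only approximates the regex engine used by the JSON Schema validator.

```shell
#!/bin/sh
# Pattern copied verbatim from the validation error above.
DEVICE_PATTERN='^(auto|/|(/[a-zA-Z0-9_-]+)+)$'

# Hypothetical helper: succeed if the candidate device string matches the pattern.
matches_device_pattern() {
  printf '%s\n' "$1" | grep -Eq -- "$DEVICE_PATTERN"
}

for candidate in auto /dev/sda "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"; do
  if matches_device_pattern "$candidate"; then
    echo "accepted: $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```

Only the by-path entry is rejected, which matches the behaviour reported by the validator.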

I set strict: false, but it still runs validation and refuses to proceed.

Is this fixable?

sarg3nt commented 7 months ago

I've built two more test clusters and so far all the right physical disks ended up attached to the right directories: Kairos on disk 0, RKE2 on disk 1, and Longhorn on disk 2. I'll keep testing and let you know if it fails. sda, sdb, and sdc are moving around, so we can't trust those names to point at the correct disks any more, but once you know that I think it's OK.

I think it would still be nice to get

install:
  device: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"

working too though.

Telling Kairos to install to /dev/sda is virtually useless now.

sarg3nt commented 7 months ago

My recommendations for a "fix" are the following:

Support device: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" in the cloud_init.yaml file.
Update the docs to tell users not to use /dev/sdX, and explain how to find the path to their physical device and how to use it.

sarg3nt commented 7 months ago

@jimmykarily any thoughts on the above?

jimmykarily commented 7 months ago

@jimmykarily any thoughts on the above?

Choosing disks by label/id/path/etc is not yet supported (it has been discussed before). What you are describing is a valid use case and I think the only workaround for now would be to use no-format (docs):

  # no-format: true skips any disk partitioning and formatting
  # If set to true installation procedure will error out if expected
  # partitions are not already present within the disk.
  no-format: true

and do the partitioning completely manually using some script in a cloud config. @kairos-io/maintainers what would be the right stage to do the partitioning?

sarg3nt commented 7 months ago

@jimmykarily I have an update on this and it is really, really weird. I'll try to keep it short.

We have three volumes: boot at 80 GB, rke2 at 40 GB, and longhorn at 80 GB.
I wanted to run some large volume tests on our Longhorn deployment, so I set that disk's size to a few hundred GB.
The system net-booted, Kairos installed, rebooted, then installed again, and continued that install loop indefinitely.
I followed the docs here: https://kairos.io/docs/reference/configuration/#layout to control which SCSI device Kairos installs to. It no longer loops, but now it installs, reboots, and immediately stops at a flashing cursor in the top-left corner of the console.
After a LOT of experimentation with a coworker, I finally realized that all other disks must be smaller than or equal to the boot disk that Kairos installs to. That is, if the Kairos install disk is 80 GB, then all other disks must be <= 80 GB. If any of them is 81 GB or larger, the above issues occur. Set the Kairos install disk to 150 GB and the other disks can be up to 150 GB.
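
The observation above can be turned into a pre-flight check. This sketch assumes input in the style of `lsblk -b -d -n -o NAME,SIZE` (one `name size-in-bytes` pair per line). The function name is hypothetical, and the rule it checks is only the behaviour observed in this thread, not a documented Kairos requirement.

```shell
#!/bin/sh
# Hypothetical helper: read "name size_bytes" lines on stdin and print every
# disk that is larger than the install disk named in $1.
# Exit status: 0 if none are larger, 1 if at least one is, 2 if the install
# disk is not present in the input.
disks_larger_than_install() {
  awk -v install="$1" '
    { size[$1] = $2 }
    END {
      if (!(install in size)) {
        print "install disk not found: " install > "/dev/stderr"
        exit 2
      }
      bad = 0
      for (d in size) {
        if (d != install && size[d] + 0 > size[install] + 0) {
          print d
          bad = 1
        }
      }
      exit bad
    }'
}
```

A possible invocation is `lsblk -b -d -n -o NAME,SIZE | disks_larger_than_install sda`, passing whichever kernel name the install disk currently resolves to.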

I'm flummoxed. Is this a requirement of some kind or a bug?

sarg3nt commented 7 months ago

@jimmykarily I've stripped down our config as much as possible, including not mounting disks 2 and 3 and keeping as little config in the cloud_init.yaml files as possible, and I still get the results above. If the second or third disk is bigger than the Kairos install disk, Kairos will install, reboot, then hang at startup. This is with building the boot disk manually. If I don't do that and let it auto-assign and build itself, it gets into an infinite install loop. Since this is a separate issue from this request, I'll open a new ticket as a defect.

sarg3nt commented 7 months ago

I created the new issue after doing some more investigation. https://github.com/kairos-io/kairos/issues/2281

kairos-io / kairos

Kairos not installing on correct device #2243