kairos-io / kairos

:penguin: The immutable Linux meta-distribution for edge Kubernetes.
https://kairos.io
Apache License 2.0

`install.auto: false` is not respected when booting with Auroraboot #2516

Open jimmykarily opened 1 month ago

jimmykarily commented 1 month ago

Moved to a new ticket from this one: https://github.com/kairos-io/kairos/issues/2281#issuecomment-2078014965

We need to find out whether auto: false is respected, both in general and when booting with auroraboot. If it is, let's find out what went wrong in the original report.

ci-robbot commented 1 month ago

Hello,

I am a bot, an experiment of @mudler and @jimmykarily. Your issue titled "install.auto: false is not respected when booting with Auroraboot" (#2516) has been noted. Please provide more information and follow the project's guidelines by adding the missing details.

To better assist you, please ensure that the issue includes:

  1. A clear description of the issue.
  2. Steps to reproduce (if it's a bug).
  3. The versions of the relevant artifacts being used.

Once these requirements are met, the issue can be properly triaged and addressed. Thank you for your understanding and cooperation.

sarg3nt commented 4 weeks ago

This may also solve: https://github.com/kairos-io/kairos/issues/2030

sarg3nt commented 4 weeks ago

An update from my side of things. While troubleshooting another issue, I realized that the problem might be that the file supplied to the node directly from vSphere customization did not have auto: false set, so I added it, but it didn't change anything.

As a reminder, we have a cloud_init.yaml file being served from AuroraBoot, and I can't find any evidence that any of the config in that file, regardless of stage, is being run. However, we also have a "per node" config being added from the vSphere guestinfo.userdata in Terraform, and it looks like everything in it, regardless of stage, is running. Adding auto: false to that file as well as the one from AuroraBoot did not keep that from happening.

Terraform for adding the custom config to a vSphere VM

  extra_config = {
    "guestinfo.userdata"          = data.template_cloudinit_config.agent[count.index].rendered
    "guestinfo.userdata.encoding" = "gzip+base64"
  }
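To check whether a setting such as auto: false actually made it into the rendered userdata, the value can be decoded by hand. The sketch below assumes that "gzip+base64" means the payload was gzip-compressed and then base64-encoded, as the encoding name suggests. The function names and the sample config are illustrative only and are not part of Kairos or Terraform.

```python
import base64
import gzip


def decode_userdata(encoded: str) -> str:
    """Decode a gzip+base64 payload back to readable text."""
    return gzip.decompress(base64.b64decode(encoded)).decode("utf-8")


def encode_userdata(text: str) -> str:
    """Inverse of decode_userdata; used here only to build a sample value."""
    return base64.b64encode(gzip.compress(text.encode("utf-8"))).decode("ascii")


# Hypothetical sample config standing in for the real rendered userdata.
sample = "#cloud-config\n\ninstall:\n  auto: false\n"
encoded = encode_userdata(sample)

# Decoding the value shows exactly which keys the node will receive.
print(decode_userdata(encoded))
```

Running decode_userdata on the real rendered value would show which keys the per-node config contributes.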

initramfs_stage.log

[root@lpul-vault-k8s-server-0 immucore]# cat initramfs_stage.log
2024-05-02T19:48:59Z INF Running stage: initramfs.before

2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /oem/userdata ]: exit status 1)' stage name: Pull data from provider
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e /sbin/openrc ]: exit status 1)' stage name: Blacklist bpfilter on Alpine ( bug: https://github.com/kairos-io/kairos/issues/277 )
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run ! [[ -f /etc/hosts ]] || ! [[ $(grep '127.0.0.1' /etc/hosts) ]]
: exit status 1)' stage name: Make sure hosts file is present and includes a record for 127.0.0.1
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /oem/userdata ]: exit status 1)' stage name:
2024-05-02T19:48:59Z INF Done executing stage 'initramfs.before'

2024-05-02T19:48:59Z INF Running stage: initramfs

2024-05-02T19:48:59Z INF Processing stage step 'Enable systemd-network config files for DHCP'. ( commands: 1, files: 2, ... )
2024-05-02T19:48:59Z INF Processing stage step ''. ( commands: 1, files: 0, ... )
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sbin/openrc" ]
: exit status 1)' stage name: Create OpenRC services
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.remote_recovery_mode" /proc/cmdline && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name: Starts kairos-recovery and generate a temporary pass
2024-05-02T19:48:59Z INF Processing stage step 'systemd-sysext initramfs settings'. ( commands: 0, files: 0, ... )
2024-05-02T19:48:59Z INF Processing stage step 'Create journalctl /var/log/journal dir'. ( commands: 0, files: 0, ... )
2024-05-02T19:48:59Z ERR Failed to connect system bus: No such file or directory
: failed to run networkctl reload: exit status 1
2024-05-02T19:48:59Z ERR 1 error occurred:
        * failed to run networkctl reload: exit status 1

2024-05-02T19:48:59Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos-agent.service → /etc/systemd/system/kairos-agent.service.

2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sbin/openrc" ]
: exit status 1)' stage name: Enable OpenRC services
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ ! -f "/run/cos/live_mode" ]: exit status 1)' stage name:
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ -s /usr/local/etc/machine-id ]: exit status 1)' stage name: Restore /etc/machine-id for systemd systems
2024-05-02T19:48:59Z INF Processing stage step 'Disable NetworkManager and wicked'. ( commands: 0, files: 0, ... )
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.remote_recovery_mode" /proc/cmdline && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Starts kairos-recovery for openRC based systems
2024-05-02T19:48:59Z INF Processing stage step ''. ( commands: 0, files: 2, ... )
2024-05-02T19:48:59Z ERR 2 errors occurred:
        * failed to run systemctl disable NetworkManager: exit status 1
        * failed to run systemctl disable wicked: exit status 1

2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Restore /etc/machine-id for openrc systems
2024-05-02T19:48:59Z INF Processing stage step 'Enable systemd-network and systemd-resolved'. ( commands: 0, files: 0, ... )
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "kairos.reset" /proc/cmdline || [ -f /run/cos/autoreset_mode ]) && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name: Starts kairos-reset for systemd based systems
2024-05-02T19:48:59Z INF Processing stage step 'Default systemd config'. ( commands: 1, files: 0, ... )
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -qv "interactive-install" /proc/cmdline || grep -qv "install-mode-interactive" /proc/cmdline) && \
[ -f /run/cos/live_mode ] && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name: Autologin on livecd for OpenRC
2024-05-02T19:48:59Z INF Command output: Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/multi-user.target.

2024-05-02T19:48:59Z ERR 5 errors occurred:
        * failed to run systemctl enable systemd-timesyncd: exit status 1
        * failed to run systemctl enable nohang: exit status 1
        * failed to run systemctl enable nohang-desktop: exit status 1
        * failed to run systemctl enable fail2ban: exit status 1
        * failed to run systemctl enable logrotate.timer: exit status 1

2024-05-02T19:48:59Z INF Processing stage step 'Generate host keys'. ( commands: 1, files: 0, ... )
2024-05-02T19:48:59Z INF Processing stage step 'Link /etc/resolv.conf to systemd resolv.conf'. ( commands: 2, files: 0, ... )
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.reset" /proc/cmdline && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Starts kairos-reset for openRC-based systems
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run cat /proc/cmdline | grep "selinux=1"
: exit status 1)' stage name: Relabelling
2024-05-02T19:48:59Z INF Command output:
2024-05-02T19:48:59Z INF Command output:
2024-05-02T19:49:00Z INF Command output: ssh-keygen: generating new host keys: RSA DSA ECDSA ED25519

2024-05-02T19:49:00Z INF Processing stage step 'Create systemd services'. ( commands: 0, files: 5, ... )
2024-05-02T19:49:00Z INF Processing stage step ''. ( commands: 5, files: 0, ... )
2024-05-02T19:49:00Z INF Command output: Removed "/etc/systemd/system/getty.target.wants/getty@tty1.service".

2024-05-02T19:49:00Z INF Command output: Running in chroot, ignoring command 'stop'

2024-05-02T19:49:00Z INF Command output: Created symlink /etc/systemd/system/getty@tty1.service → /dev/null.

2024-05-02T19:49:00Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos.service → /etc/systemd/system/kairos.service.

2024-05-02T19:49:00Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos-webui.service → /etc/systemd/system/kairos-webui.service.

2024-05-02T19:49:00Z INF Processing stage step 'Enable systemd services'. ( commands: 4, files: 0, ... )
2024-05-02T19:49:00Z INF Command output:
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "install-mode" /proc/cmdline || grep -q "nodepair.enable" /proc/cmdline ) && \
([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name:
2024-05-02T19:49:00Z INF Command output:
2024-05-02T19:49:00Z INF Command output:
2024-05-02T19:49:00Z INF Command output:
2024-05-02T19:49:00Z INF Processing stage step 'Setup groups'. ( commands: 0, files: 0, ... )
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "interactive-install" /proc/cmdline || grep -q "install-mode-interactive" /proc/cmdline) && \
([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name:
2024-05-02T19:49:00Z INF Processing stage step 'Setup users'. ( commands: 0, files: 0, ... )
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "interactive-install" /proc/cmdline || grep -q "install-mode-interactive" /proc/cmdline) && \
([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name:
2024-05-02T19:49:00Z INF Processing stage step 'Set user password if running in live or uki'. ( commands: 0, files: 0, ... )
2024-05-02T19:49:00Z INF Processing stage step 'Setup sudo'. ( commands: 1, files: 1, ... )
2024-05-02T19:49:00Z INF Command output: Locking password for user root.
passwd: Success

2024-05-02T19:49:00Z INF Processing stage step 'Ensure runtime permission'. ( commands: 2, files: 0, ... )
2024-05-02T19:49:00Z INF Command output:
2024-05-02T19:49:00Z INF Command output:
2024-05-02T19:49:00Z INF Processing stage step ''. ( commands: 0, files: 0, ... )
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e "/usr/local/cloud-config" ]: exit status 1)' stage name: Ensure runtime permission
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sys/firmware/devicetree/base/model" ] && grep -i jetson "/sys/firmware/devicetree/base/model"
: exit status 1)' stage name: Create files
2024-05-02T19:49:00Z INF Processing stage step ''. ( commands: 0, files: 0, ... )
2024-05-02T19:49:00Z INF Processing stage step 'Set hostname'. ( commands: 0, files: 0, ... )
2024-05-02T19:49:00Z INF Processing stage step 'Run commands'. ( commands: 1, files: 0, ... )
2024-05-02T19:49:00Z INF Command output: 2024-05-02 19:49:00 Add DHCP ClientIdentifier=mac to network config if not already present.
2024-05-02 19:49:00   Adding line [DHCP] to file /etc/systemd/network/20-dhcp.network
2024-05-02 19:49:00   Adding line ClientIdentifier=mac to file /etc/systemd/network/20-dhcp.network
2024-05-02 19:49:00   Adding line [DHCP] to file /etc/systemd/network/20-dhcp-legacy.network
2024-05-02 19:49:00   Adding line ClientIdentifier=mac to file /etc/systemd/network/20-dhcp-legacy.network
2024-05-02 19:49:00 Add ll to the root and Kairos .bashrc if not already present.
2024-05-02 19:49:00   Adding line alias ll="ls -alh" to file /root/.bashrc
2024-05-02 19:49:00   Creating new file /home/kairos/.bashrc with line alias ll="ls -alh"
2024-05-02 19:49:00   Creating new file /home/kairos/.profile with line alias ll="ls -alh"
2024-05-02 19:49:00 Add rke2 bin to the path.
2024-05-02 19:49:00   Adding line export PATH="${PATH}:/var/lib/rancher/rke2/bin/" to file /root/.bashrc
2024-05-02 19:49:00   Adding line export PATH="${PATH}:/var/lib/rancher/rke2/bin/" to file /home/kairos/.bashrc
2024-05-02 19:49:00   Adding line export PATH="${PATH}:/var/lib/rancher/rke2/bin/" to file /home/kairos/.profile
/bin/sh: line 1: [[: 0[0]: syntax error: invalid arithmetic operator (error token is "[0]")

2024-05-02T19:49:00Z INF Done executing stage 'initramfs'

2024-05-02T19:49:00Z INF Running stage: initramfs.after

2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e /sbin/openrc ]: exit status 1)' stage name: Enable serial login for alpine
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [[ $(kairos-agent state get kairos.flavor) =~ ^ubuntu ]]: exit status 1)' stage name: setupcon initramfs.after ubuntu
2024-05-02T19:49:00Z INF Done executing stage 'initramfs.after'

2024-05-02T19:49:00Z INF Running stage: initramfs.before

2024-05-02T19:49:00Z INF Done executing stage 'initramfs.before'

2024-05-02T19:49:00Z INF Running stage: initramfs

2024-05-02T19:49:00Z INF Done executing stage 'initramfs'

2024-05-02T19:49:00Z INF Running stage: initramfs.after

2024-05-02T19:49:00Z INF Done executing stage 'initramfs.after'
jimmykarily commented 2 days ago

With install.auto: true the installation starts automatically (netbooted a VM with virt-manager):

~/workspace/kairos/kairos (master)*$ cat config.yaml 
#cloud-config

users:
  - name: kairos
    passwd: kairos

install:
  auto: true

debug: true
~/workspace/kairos/kairos (master)*$ docker run --rm -ti -v /tmp/build -v /var/run/docker.sock:/var/run/docker.sock -v "$PWD"/config.yaml:/config.yaml --net host quay.io/kairos/auroraboot --set "container_image=docker://quay.io/kairos/debian:bookworm-slim-core-amd64-generic-v3.0.4-73-g8ddb9092-dirty" --cloud-config /config.yaml

With install.auto: false the installation doesn't start.

It seems to work correctly. I'm not sure why I said the installation in your case had indeed started. I suspect I was confused: the installation was triggered with kairos-agent manual-install. I don't see any other logs indicating that it started when it shouldn't have.

Regarding config options and such: @sarg3nt, if you think the configuration doesn't get merged properly, set debug: true in the config and run the installation with kairos-agent manual-install, saving the logs (like you did in the original issue). The applicable config is then printed in the logs, so you can tell which options made it into the final config and which did not.
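A saved log can get long, so a small script can help pull out just the merged config. This is a minimal sketch, assuming the dump starts with a line containing "Config: collector.Config{" (as in the debug output quoted in this thread) and that braces inside string values are balanced. The function name and the sample log are hypothetical.

```python
def extract_config_dump(log_text, marker="Config: collector.Config{"):
    """Return the lines of the first config dump in log_text, or [] if none."""
    dump = []
    depth = 0
    started = False
    for line in log_text.splitlines():
        if not started:
            if marker not in line:
                continue
            started = True
        dump.append(line.rstrip())
        # Track brace nesting; the dump ends when the opening brace is closed.
        depth += line.count("{") - line.count("}")
        if depth <= 0:
            break
    return dump


# Hypothetical log excerpt, shaped like the debug output of the agent.
sample_log = """DBG something before
Config: collector.Config{
    "debug": true,
    "install": collector.Config{
      "auto": false,
    },
  },
DBG something after
"""

dump = extract_config_dump(sample_log)
print("\n".join(dump))
```

Searching the extracted lines for "auto" then shows which value of install.auto ended up in the final config.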

sarg3nt commented 2 days ago

@jimmykarily I think we might need some clarification here. What is happening during a manual install is that the config being sent by AuroraBoot is NOT auto running (as it should), but the config being sent by vSphere via custom data IS running when I think it should not be, since this is a manual install.

jimmykarily commented 2 days ago

Indeed there is some confusion (either on my side or yours :) ). Let me try to clarify.

The configs are not install recipes that are run (or not) as a whole. The yaml keys in each and every config are merged with those from all other configs before the kairos-agent starts the installation. There is a component, the config collector, which collects configs from various locations.

All these configs are merged into one config, which is used to install Kairos. In your case, the config from Auroraboot and the config from vSphere will both be merged, potentially overwriting each other's keys if they both specify the same keys.

To demonstrate the above, I started Auroraboot with this config:

#cloud-config

users:
  - name: kairos
    passwd: kairos

install:
  auto: false

stages:
  dimitris-stage:
    - name: "Dimitris stage"
      commands:
        - echo "dimitris"

debug: true

and I netbooted a VM. From within the VM (the installation didn't automatically start, because install.auto is false), I created this config file:

root@localhost:/home/kairos# cat c.yaml 
#cloud-config

stages:
  local-config-stage:
    - name: "Local config stage"
      commands:
        - echo "from the local config"

When I run the installation with this command:

kairos-agent --debug manual-install c.yaml 2>&1 | tee out.log

the output log prints the final config, in which you can find these lines:

Config: collector.Config{
    "config_url": "http://192.168.122.1:8090/_/file?name=other-1",
    "debug": true,
    "install": collector.Config{
      "auto": false,
      "poweroff": false,
      "reboot": false,
    },
    "stages": collector.Config{
      "dimitris-stage": []interface {}{
        collector.Config{
          "commands": []interface {}{
            "echo \"dimitris\"",
          },
          "name": "Dimitris stage",
        },
      },
      "local-config-stage": []interface {}{
        collector.Config{
          "commands": []interface {}{
            "echo \"from the local config\"",
          },
          "name": "Local config stage",
        },
      },
    },
    "users": []interface {}{
      collector.Config{
        "name": "kairos",
        "passwd": "kairos",
      },
    },
  },

See how both dimitris-stage (from auroraboot) and local-config-stage (from the local config file) are in the final config? If both configs defined the same key (e.g. debug), whichever was merged last would determine the value of that key (don't rely on this, there is no guaranteed order!).
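The merging behaviour can be illustrated with a small, generic deep merge. This is only a sketch of the idea, not the config collector's actual code: the merge function is hypothetical, it simply lets the later config win on conflicting keys, and it replaces lists wholesale, which may differ from what Kairos really does.

```python
import copy


def merge(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Stand-ins for the two configs used in the demonstration above.
auroraboot = {
    "install": {"auto": False},
    "stages": {"dimitris-stage": [{"name": "Dimitris stage"}]},
    "debug": True,
}
local = {
    "stages": {"local-config-stage": [{"name": "Local config stage"}]},
}

merged = merge(auroraboot, local)
print(sorted(merged["stages"]))  # both stages are present

# Two sources that disagree on install.auto: the result depends on merge order.
source_a = {"install": {"auto": False}}
source_b = {"install": {"auto": True}}
print(merge(source_a, source_b)["install"]["auto"])
print(merge(source_b, source_a)["install"]["auto"])
```

Distinct keys from both sources survive the merge, while a key set in both places keeps only one value, and which one depends on the order the sources are processed.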

So to summarize: if you have multiple sources from which configs are supplied, expect them all to be merged before the installation starts. Setting install.auto: false in one file and install.auto: true in another will result in only one of the two values ending up in the final config (no guaranteed order). The install.auto key doesn't refer to the file itself; it's an instruction to the kairos-agent on whether to start the installation automatically or not. This is possible because the agent runs in the background as a service, so it can start the installation on its own.

I hope this helps. Let me know if I'm not understanding the issue and explaining the wrong things.