lima-vm / lima

Linux virtual machines, with a focus on running containers
https://lima-vm.io/
Apache License 2.0

limactl start assumes that /bin/bash is present on host #2110

Open a-h opened 6 months ago

a-h commented 6 months ago

Description

I'm creating a NixOS template for Lima. NixOS doesn't follow the Linux FHS, so it doesn't have bash available at /bin/bash.

This is fine, because scripts can locate bash with a #!/usr/bin/env bash shebang instead. That way, you get the version of bash that's installed in the current environment, rather than assuming bash exists in a specific location.
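As a rough illustration of the difference, here is a minimal sketch that reports whether the fixed path /bin/bash exists and where a PATH lookup (which is what /usr/bin/env bash relies on) finds bash. The function name bash_locations is made up for this example and is not part of Lima.

```shell
#!/usr/bin/env bash
# Report whether the fixed path /bin/bash exists, and where bash is found
# by a PATH lookup, which is what `/usr/bin/env bash` relies on.
# bash_locations is a hypothetical helper for illustration only.
bash_locations() {
  if [ -x /bin/bash ]; then
    echo "fixed: /bin/bash"
  else
    echo "fixed: missing"
  fi
  echo "path: $(command -v bash || echo missing)"
}

bash_locations
```

On a system without /bin/bash, the first line would read "fixed: missing" while the second would still point at whichever bash is on the PATH.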

The issue is down to this: https://github.com/lima-vm/lima/blob/f3dc6ed97aa8f69821ecbe1c2b29988c97e67eb7/pkg/hostagent/requirements.go#L97-L102

At the top of the script is the shebang, which points directly at /bin/bash.

When starting a Lima VM, these scripts are executed, which I could see once I enabled verbose logging:

INFO[0053] [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
DEBU[0053] [hostagent] executing script "ssh"
DEBU[0053] [hostagent] executing ssh for script "ssh": /usr/bin/ssh [ssh -F /dev/null -o IdentityFile="/Users/adrian/.lima/_config/user" -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NoHostAuthenticationForLocalhost=yes -o GSSAPIAuthentication=no -o PreferredAuthentications=publickey -o Compression=no -o BatchMode=yes -o IdentitiesOnly=yes -o Ciphers="^aes128-gcm@openssh.com,aes256-gcm@openssh.com" -o User=adrian -o ControlMaster=auto -o ControlPath="/Users/adrian/.lima/nix-node/ssh.sock" -o ControlPersist=yes -p 63145 127.0.0.1 -- /bin/bash]
DEBU[0053] [hostagent] stdout="", stderr="bash: line 1: /bin/bash: No such file or directory\n", err=failed to execute script "ssh": stdout="", stderr="bash: line 1: /bin/bash: No such file or directory\n": exit status 127

From the logs, it's clear that it's trying to ssh and run /bin/bash, which doesn't exist on my system.

Looking into the reason why, I found that the sshocker package parses the shebang and attempts to use it: https://github.com/lima-vm/sshocker/blob/024e386607793c4d16867fe7c7ccc5fd38346330/pkg/ssh/ssh.go#L92C22-L92C44
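To make that behaviour concrete, here is a minimal sketch of the idea: read the first line of a script and, if it is a shebang, use whatever follows "#!" as the interpreter command. This is not sshocker's actual code (that is written in Go); the helper name shebang_interpreter is an assumption made for this example.

```shell
# Print the interpreter named on the first line of a script, if that
# line is a shebang. Hypothetical helper, not sshocker's actual code.
shebang_interpreter() {
  first_line=$(printf '%s\n' "$1" | head -n 1)
  case "$first_line" in
    '#!'*) printf '%s\n' "${first_line#??}" ;;  # strip the leading "#!"
    *) return 1 ;;                              # no shebang on the first line
  esac
}

# A script starting with #!/bin/bash yields the hard-coded path...
shebang_interpreter '#!/bin/bash
echo hello'

# ...while #!/usr/bin/env bash yields a command that resolves bash via PATH.
shebang_interpreter '#!/usr/bin/env bash
echo hello'
```

With the first form, the remote command becomes /bin/bash, which fails when that path doesn't exist in the guest. With the second, the remote command becomes /usr/bin/env bash.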

I think that updating the shebangs to #!/usr/bin/env bash will work more reliably on platforms that don't support FHS.

Any interest in a PR on that?

In the meantime, I'm patching my NixOS system to have a symlink with the following NixOS configuration, which is getting me to stage 2.

system.activationScripts.binbash = {
  deps = [ "binsh" ];
  text = ''
    # -sfn replaces an existing link, so repeated activations don't fail.
    ln -sfn /bin/sh /bin/bash
  '';
};

AkihiroSuda commented 6 months ago

I think that updating the shebangs to #!/usr/bin/env bash will work more reliably on platforms that don't support FHS.

SGTM

afbjorklund commented 6 months ago

We already did something similar for FreeBSD, which (optionally) has /usr/local/bin/bash but only ships /bin/sh in the base system.

https://github.com/lima-vm/lima/pull/1509/commits/5756e4cac9f607111cc10d7f6b1f2c7f0bc2c983

I am not sure whether /run is also going to be a problem, or whether NixOS follows systemd conventions even though it doesn't follow the Linux FHS.

https://github.com/lima-vm/lima/pull/1509/commits/9d7f541278d0e22a10f99dbe2a3d8d3b3784a6cd


EDIT: From PR

a-h commented 6 months ago

Thanks @afbjorklund! Those pointers helped a lot.

Commit https://github.com/lima-vm/lima/commit/5756e4cac9f607111cc10d7f6b1f2c7f0bc2c983 doesn't seem to have made it into the main branch. I'm not sure where it ended up, but it looks the same as what I was thinking.

I didn't really understand how Lima configured the host VMs, but now I've worked through the problems, I do.

NixOS uses systemd, so that will work OK, but I now understand that Lima creates an ISO file containing userdata: https://github.com/lima-vm/lima/blob/f3dc6ed97aa8f69821ecbe1c2b29988c97e67eb7/pkg/hostagent/hostagent.go#L134

I also understand that the ISO containing the userdata (https://github.com/lima-vm/lima/blob/master/pkg/cidata/cidata.TEMPLATE.d/user-data) gets mounted by running a script. The boot commands set in the user data can then use the files on the userdata ISO to configure the rest of the VM. That's why the next requirement in the hostagent requirements.go file simply waits for the /run/lima-ssh-ready file to exist: once cloud-init has picked up the user data, it should be working away in the background, installing the guest agent etc.
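The "wait for a marker file" step can be sketched roughly as follows. This is only an illustration of the polling idea, not the script Lima actually runs; the function name wait_for_file and its attempt/delay parameters are assumptions made for this example.

```shell
# Poll until a marker file exists, or give up after a number of attempts.
# Hypothetical helper: wait_for_file PATH [ATTEMPTS] [DELAY_SECONDS]
wait_for_file() {
  path=$1
  attempts=${2:-60}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ -e "$path" ]; then
      return 0            # marker file appeared
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1                # gave up: the file never appeared
}

# Example use inside the guest:
#   wait_for_file /run/lima-ssh-ready 60 1 || echo "cloud-init never signalled readiness"
```

If nothing in the guest ever creates the marker file, a loop like this can only time out, which matches the hang described here.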

Of course, for NixOS, this won't happen, because NixOS doesn't have cloud-init enabled out of the box, which is why stage 2 just hung for me.

To try to work around this, I created a custom config for NixOS, and built an ISO from it.

In Nix, you create a configuration.nix and run nix run github:nix-community/nixos-generators -- -f iso -c configuration.nix to create a custom ISO that can be run as a VM. So, I enabled sshd and cloud-init, created a new user (adrian), and gave Lima SSH access to it. I then set up a link from /bin/bash to /bin/sh so that the scripts ran.

{ config, pkgs, ... }: {
  # Enable the OpenSSH server.
  services.sshd.enable = true;
  # Enable cloud-init, since Lima uses this to configure the instance.
  services.cloud-init.enable = true;
  users.users = {
    adrian = {
      isNormalUser = true;
      openssh.authorizedKeys.keys = [
        # This key comes from /Users/adrian/.lima/_config/user.pub
        # The Lima home directory can be found programmatically with `limactl info | jq -r ".limaHome"`
        "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIChdmNxNN+sP9c/i3WYeG8cosR4x3krQYchRIZoEv8Mf adrian@adrian-2.local"
      ];
    };
  };
  system.activationScripts.binbash = {
    deps = [ "binsh" ];
    text = ''
      # -sfn replaces an existing link, so repeated activations don't fail.
      ln -sfn /bin/sh /bin/bash
    '';
  };
}

But... it didn't work, because the CIDATA scripts assume a lot about the environment that they're going to be operating in.

Jan 03 10:48:50 lima-nix-visor-node cloud-init[12319]: + LIMA_CIDATA_MNT=/mnt/lima-cidata
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12319]: + LIMA_CIDATA_DEV=/dev/disk/by-label/cidata
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12319]: + mkdir -p -m 700 /mnt/lima-cidata
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12319]: + mount -o ro,mode=0700,dmode=0700,overriderockperm,exec,uid=0 /dev/disk/by-label/cidata /mnt/lima-cidata
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12319]: + export LIMA_CIDATA_MNT
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12319]: + exec /mnt/lima-cidata/boot.sh
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12319]: LIMA| Executing /mnt/lima-cidata/boot/00-modprobe.sh
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Loading kernel module "fuse"
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12329]: modprobe: can't change directory to '/lib/modules': No such file or directory
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Faild to load "fuse" (negligible if it is built-in the kernel)
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Loading kernel module "tun"
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12330]: modprobe: can't change directory to '/lib/modules': No such file or directory
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Faild to load "tun" (negligible if it is built-in the kernel)
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Loading kernel module "tap"
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12331]: modprobe: can't change directory to '/lib/modules': No such file or directory
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Faild to load "tap" (negligible if it is built-in the kernel)
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Loading kernel module "bridge"
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12332]: modprobe: can't change directory to '/lib/modules': No such file or directory
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Faild to load "bridge" (negligible if it is built-in the kernel)
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Loading kernel module "veth"
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12333]: modprobe: can't change directory to '/lib/modules': No such file or directory
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Faild to load "veth" (negligible if it is built-in the kernel)
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Loading kernel module "ip_tables"
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12334]: modprobe: can't change directory to '/lib/modules': No such file or directory
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Faild to load "ip_tables" (negligible if it is built-in the kernel)
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Loading kernel module "ip6_tables"
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12335]: modprobe: can't change directory to '/lib/modules': No such file or directory
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Faild to load "ip6_tables" (negligible if it is built-in the kernel)
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Loading kernel module "iptable_nat"
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12336]: modprobe: can't change directory to '/lib/modules': No such file or directory
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Faild to load "iptable_nat" (negligible if it is built-in the kernel)
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Loading kernel module "ip6table_nat"
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12337]: modprobe: can't change directory to '/lib/modules': No such file or directory
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Faild to load "ip6table_nat" (negligible if it is built-in the kernel)
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Loading kernel module "iptable_filter"
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12338]: modprobe: can't change directory to '/lib/modules': No such file or directory
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Faild to load "iptable_filter" (negligible if it is built-in the kernel)
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Loading kernel module "ip6table_filter"
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12339]: modprobe: can't change directory to '/lib/modules': No such file or directory
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Faild to load "ip6table_filter" (negligible if it is built-in the kernel)
Jan 03 10:48:50 lima-nix-visor-node cloud-init[12328]: Loading kernel module "nf_tables"

There are also various failures logged about files and directories not existing:

Jan 03 10:48:51 lima-nix-visor-node cloud-init[12319]: LIMA| Executing /mnt/lima-cidata/boot/20-rootless-base.sh
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12371]: + command -v systemctl
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12371]: + for f in .profile .bashrc .zshrc
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12371]: + grep -q '# Lima BEGIN' /home/adrian.linux/.profile
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12372]: grep: /home/adrian.linux/.profile: No such file or directory
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12371]: + cat
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12373]: /mnt/lima-cidata/boot/20-rootless-base.sh: line 10: /home/adrian.linux/.profile: No such file or directory
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12319]: LIMA| WARNING: Failed to execute /mnt/lima-cidata/boot/20-rootless-base.sh

And it tries to install the guest agent and fails for similar reasons.

Jan 03 10:48:51 lima-nix-visor-node cloud-init[12319]: LIMA| Executing /mnt/lima-cidata/boot/25-guestagent-base.sh
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12374]: + '[' reverse-sshfs = reverse-sshfs ']'
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12375]: ++ seq 0 0
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12374]: + for f in $(seq 0 $((LIMA_CIDATA_MOUNTS - 1)))
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12374]: + mountpointvar=LIMA_CIDATA_MOUNTS_0_MOUNTPOINT
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12376]: ++ eval echo '$LIMA_CIDATA_MOUNTS_0_MOUNTPOINT'
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12376]: +++ echo /tmp/lima
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12374]: + mountpoint=/tmp/lima
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12374]: + mkdir -p /tmp/lima
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12378]: ++ id -g adrian
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12374]: + gid=100
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12374]: + chown 501:100 /tmp/lima
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12374]: + install -m 755 /mnt/lima-cidata/lima-guestagent /usr/local/bin/lima-guestagent
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12380]: install: can't create '/usr/local/bin/lima-guestagent': No such file or directory
Jan 03 10:48:51 lima-nix-visor-node cloud-init[12319]: LIMA| WARNING: Failed to execute /mnt/lima-cidata/boot/25-guestagent-base.sh
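The install failure above comes from the target directory /usr/local/bin not existing on NixOS. As a minimal sketch (not Lima's actual boot script), an install step that takes a configurable prefix and creates the bin directory first could look like this. The function name install_with_prefix and the example prefix are assumptions for illustration.

```shell
# Install an executable into PREFIX/bin, creating the directory if needed.
# Hypothetical helper: install_with_prefix SOURCE_FILE PREFIX
install_with_prefix() {
  src=$1
  prefix=$2
  # Create PREFIX/bin first; a bare `install FILE DEST` fails if it is missing.
  install -d -m 755 "$prefix/bin" &&
    install -m 755 "$src" "$prefix/bin/$(basename "$src")"
}

# Example use (the prefix here is an arbitrary example, not a Lima default):
#   install_with_prefix /mnt/lima-cidata/lima-guestagent /opt/lima
```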

Given the complexity of the scripts, I think it would be quite hard to debug them all on NixOS and then test that nothing has broken on all the other operating systems too, mostly because of how long it takes to go through a run/check cycle. I'm not sure if there are automated tests for each of the VM host types etc.

To run NixOS in Lima, it probably makes the most sense to make a configuration.nix that installs all the Lima requirements (including the guest agent), configures any port forwarding rules, sets up the appropriate users etc. and use Lima in "plain" mode, so I'll probably play around with that. However, I was hoping to use mounts and port forwarding.

{ config, pkgs, ... }: {
  # Enable the OpenSSH server.
  services.sshd.enable = true;
  # Enable cloud-init, since Lima uses this to configure the instance.
  services.cloud-init.enable = true;
  # Configure packages required by Lima.
  environment.systemPackages = [
    pkgs.sshfs
  ];
  environment.etc = {
    "fuse.conf" = {
      text = ''
        user_allow_other
        mount_max = 1000
      '';
      # World-readable but not world-writable.
      mode = "0644";
    };
  };
  users.users = {
    adrian = {
      isNormalUser = true;
      openssh.authorizedKeys.keys = [
        # This key comes from /Users/adrian/.lima/_config/user.pub
        # The Lima home directory can be found programmatically with `limactl info | jq -r ".limaHome"`
        "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIChdmNxNN+sP9c/i3WYeG8cosR4x3krQYchRIZoEv8Mf adrian@adrian-2.local"
      ];
    };
  };
  system.activationScripts.binbash = {
    deps = [ "binsh" ];
    text = ''
      # -sfn replaces an existing link, so repeated activations don't fail.
      ln -sfn /bin/sh /bin/bash
    '';
  };
}

So, this issue is totally off track, and I guess I don't really care about /bin/bash any more, since the rest of the stack won't follow, so ... maybe I should close it?

afbjorklund commented 6 months ago

We can change from /bin/bash to /usr/bin/env bash for the agents anyway; it shouldn't hurt anything.

afbjorklund commented 6 months ago

@a-h : if you are making a NixOS template there was some previous discussion:

There is a new guestInstallPrefix setting that you can use instead of /usr/local, which should avoid this error: install: can't create '/usr/local/bin/lima-guestagent': No such file or directory

But your modprobe should probably be able to find the kernel modules... (needs to be patched to look in /run/current-system/kernel-modules)
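One way such a patch could work, sketched under assumptions: probe a list of candidate roots and use the first one that contains lib/modules. The helper name find_module_root is made up for this example, and the commented modprobe call assumes kmod's -d/--dirname option and that NixOS keeps modules under /run/current-system/kernel-modules/lib/modules. Neither has been checked against the modprobe in this image.

```shell
# Print the first candidate root that contains a lib/modules directory.
# Hypothetical helper: find_module_root ROOT...
find_module_root() {
  for root in "$@"; do
    if [ -d "$root/lib/modules" ]; then
      printf '%s\n' "$root"
      return 0
    fi
  done
  return 1   # none of the candidates has lib/modules
}

# Example use (assumes kmod's modprobe, where -d sets the root for /lib/modules):
#   root=$(find_module_root / /run/current-system/kernel-modules) &&
#     modprobe -d "$root" fuse
```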

patryk4815 commented 1 week ago

@a-h did you check? https://github.com/lima-vm/lima/discussions/430#discussioncomment-2645108