rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0

qemu fails to start when more than 8 CPUs are set #2190

Open johnklehm opened 2 years ago

johnklehm commented 2 years ago

Actual Behavior

If I request 9 cores in the Kubernetes settings menu, Rancher Desktop fails to start the cluster. The cause shows up in ha.stderr.log:

{"level":"debug","msg":"qemu[stderr]: qemu-system-aarch64: Number of SMP CPUs requested (9) exceeds max CPUs supported by machine 'mach-virt' (8)","time":"2022-05-06T21:09:57-05:00"}
{"error":"exit status 1","level":"info","msg":"QEMU has exited","time":"2022-05-06T21:09:57-05:00"}

Hoping we can get the CPU slider to restrict the number of CPUs to the lesser of the available core count and the maximum supported by qemu.

Steps to Reproduce

  1. Install Rancher Desktop 1.3.0 on an M1 Pro
  2. Select 9 CPUs in the Kubernetes Settings tab of the Rancher Desktop Preferences menu
  3. When Rancher Desktop restarts k8s to use the new settings, you'll see the failure messages posted below.

Result

Kubernetes Error
Rancher Desktop 1.3.0 - darwin (x64)
Error Starting Kubernetes
Error: /Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/bin/limactl exited with code 1
Last command run:
/Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/bin/limactl start --tty=false 0

Context:
Starting virtual machine

Some recent logfile lines:
time="2022-05-06T21:27:50-05:00" level=info msg="[hostagent] Waiting for the essential requirement 1 of 5: \"ssh\""
time="2022-05-06T21:27:50-05:00" level=info msg="[hostagent] QEMU has exited"
time="2022-05-06T21:27:50-05:00" level=fatal msg="exiting, status={Running:false Degraded:false Exiting:true Errors:[] SSHLocalPort:0} (hint: see \"/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/ha.stderr.log\")"
2022-05-07T02:27:50.996Z: + limactl start --tty=false 0
2022-05-07T02:27:50.997Z: Error: /Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/bin/limactl exited with code 1
2022-05-07T02:27:51.001Z: Error starting lima: Error: /Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/bin/limactl exited with code 1
    at ChildProcess.<anonymous> (/Applications/Rancher Desktop.app/Contents/Resources/app.asar/dist/app/background.js:17:141690)
    at ChildProcess.emit (node:events:390:28)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12)

Full ha.stderr.log:

➜  ~ less "/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/ha.stderr.log"

{"level":"warning","msg":"This version of QEMU might not be able to boot recent Linux guests on M1 macOS hosts.Reinstall QEMU with the following commits (included in QEMU 7.0.0):\n- https://github.com/qemu/qemu/commit/ad99f64f \"hvf: arm: Use macros for sysreg shift/masking\"\n- https://github.com/qemu/qemu/commit/7f6c295c \"hvf: arm: Handle unknown ID registers as RES0\"\nSee https://github.com/Homebrew/homebrew-core/pull/96743 for the further information.","time":"2022-05-06T21:09:57-05:00"}
{"level":"warning","msg":"field `firmware.legacyBIOS` is not supported for architecture \"aarch64\", ignoring","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"firmware candidates = [/Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/share/qemu/edk2-aarch64-code.fd /usr/share/AAVMF/AAVMF_CODE.fd /usr/share/qemu-efi-aarch64/QEMU_EFI.fd]","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"OpenSSH version 8.6.1 detected","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"AES accelerator seems available, prioritizing aes128-gcm@openssh.com and aes256-gcm@openssh.com","time":"2022-05-06T21:09:57-05:00"}
{"level":"info","msg":"Starting QEMU (hint: to watch the boot progress, see \"/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/serial.log\")","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"qCmd.Args: [/Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/bin/qemu-system-aarch64 -m 10240 -cpu host -machine virt,accel=hvf,highmem=off -smp 9,sockets=1,cores=9,threads=1 -drive if=pflash,format=raw,readonly=on,file=/Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/share/qemu/edk2-aarch64-code.fd -boot order=d,splash-time=0,menu=on -drive file=/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/basedisk,media=cdrom,readonly=on -drive file=/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/diffdisk,if=virtio -cdrom /Users/jklehm/Library/Application Support/rancher-desktop/lima/0/cidata.iso -netdev user,id=net0,net=192.168.5.0/24,dhcpstart=192.168.5.15,hostfwd=tcp:127.0.0.1:56224-:22 -device virtio-net-pci,netdev=net0,mac=52:55:55:90:40:85 -netdev vde,id=net1,sock=/private/var/run/rancher-desktop-shared.ctl -device virtio-net-pci,netdev=net1,mac=52:55:55:5d:29:f1 -netdev vde,id=net2,sock=/private/var/run/rancher-desktop-bridged_en0.ctl -device virtio-net-pci,netdev=net2,mac=52:55:55:e7:ac:38 -device virtio-rng-pci -display none -vga none -device ramfb -device qemu-xhci,id=usb-bus -device usb-kbd,bus=usb-bus.0 -device usb-mouse,bus=usb-bus.0 -parallel none -chardev socket,id=char-serial,path=/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/serial.sock,server=on,wait=off,logfile=/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/serial.log -serial chardev:char-serial -chardev socket,id=char-qmp,path=/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/qmp.sock,server=on,wait=off -qmp chardev:char-qmp -name lima-0 -pidfile /Users/jklehm/Library/Application Support/rancher-desktop/lima/0/qemu.pid]","time":"2022-05-06T21:09:57-05:00"}
{"level":"info","msg":"Waiting for the essential requirement 1 of 5: \"ssh\"","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"executing script \"ssh\"","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"executing ssh for script \"ssh\": /usr/bin/ssh [ssh -F /dev/null -o IdentityFile=\"/Users/jklehm/Library/Application Support/rancher-desktop/lima/_config/user\" -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NoHostAuthenticationForLocalhost=yes -o GSSAPIAuthentication=no -o PreferredAuthentications=publickey -o Compression=no -o BatchMode=yes -o IdentitiesOnly=yes -o Ciphers=\"^aes128-gcm@openssh.com,aes256-gcm@openssh.com\" -o User=jklehm -o ControlMaster=auto -o ControlPath=\"/Users/jklehm/Library/Application Support/rancher-desktop/lima/0/ssh.sock\" -o ControlPersist=5m -p 56224 127.0.0.1 -- /bin/bash]","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"qemu[stderr]: qemu-system-aarch64: Number of SMP CPUs requested (9) exceeds max CPUs supported by machine 'mach-virt' (8)","time":"2022-05-06T21:09:57-05:00"}
{"error":"exit status 1","level":"info","msg":"QEMU has exited","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"stdout=\"\", stderr=\"ssh: connect to host 127.0.0.1 port 56224: Connection refused\\r\\n\", err=failed to execute script \"ssh\": stdout=\"\", stderr=\"ssh: connect to host 127.0.0.1 port 56224: Connection refused\\r\\n\": exit status 255","time":"2022-05-06T21:09:57-05:00"}
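
The relevant line is easy to miss among the other entries. As a rough sketch (not part of Rancher Desktop or Lima), the following Python scans JSON-lines log text like the above for the qemu SMP error and extracts the requested count, machine name, and limit. The function name `find_smp_limit_error` is hypothetical, and the regular expression assumes the exact wording of the message quoted in this report; other qemu versions may phrase it differently.

```python
import json
import re

# Matches the qemu message quoted in this report. Assumption: the wording
# is stable; other qemu versions may phrase the error differently.
SMP_ERROR = re.compile(
    r"Number of SMP CPUs requested \((\d+)\) exceeds max CPUs "
    r"supported by machine '([^']+)' \((\d+)\)"
)


def find_smp_limit_error(log_text):
    """Return (requested, machine, maximum) for the first match, else None."""
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        match = SMP_ERROR.search(str(entry.get("msg", "")))
        if match:
            return int(match.group(1)), match.group(2), int(match.group(3))
    return None


# Three lines copied from the log above.
sample = r'''{"level":"info","msg":"Waiting for the essential requirement 1 of 5: \"ssh\"","time":"2022-05-06T21:09:57-05:00"}
{"level":"debug","msg":"qemu[stderr]: qemu-system-aarch64: Number of SMP CPUs requested (9) exceeds max CPUs supported by machine 'mach-virt' (8)","time":"2022-05-06T21:09:57-05:00"}
{"error":"exit status 1","level":"info","msg":"QEMU has exited","time":"2022-05-06T21:09:57-05:00"}'''

print(find_smp_limit_error(sample))  # (9, 'mach-virt', 8)
```

To run it against the real file, read ha.stderr.log into a string and pass that string to the function.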

Expected Behavior

For the cluster to restart with the settings I specified. For the GUI to only allow me to set a valid configuration.
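
A minimal sketch of the requested behavior, assuming the limit of 8 reported by qemu in the log above. It is written in Python for illustration only and does not reflect how Rancher Desktop implements its settings; the names `QEMU_HVF_AARCH64_MAX_CPUS`, `max_selectable_cpus`, and `clamp_requested_cpus` are hypothetical.

```python
import os

# Assumption: 8 is the limit qemu reported in this issue for machine
# 'mach-virt'. It is hard-coded here, not queried from qemu.
QEMU_HVF_AARCH64_MAX_CPUS = 8


def max_selectable_cpus(host_cpus, vm_cpu_limit=QEMU_HVF_AARCH64_MAX_CPUS):
    """Upper bound for a CPU slider: the lesser of host cores and the VM limit."""
    if host_cpus < 1:
        raise ValueError("host_cpus must be at least 1")
    return min(host_cpus, vm_cpu_limit)


def clamp_requested_cpus(requested, host_cpus,
                         vm_cpu_limit=QEMU_HVF_AARCH64_MAX_CPUS):
    """Force a requested CPU count into the range 1..max_selectable_cpus."""
    return max(1, min(requested, max_selectable_cpus(host_cpus, vm_cpu_limit)))


if __name__ == "__main__":
    host = os.cpu_count() or 1  # os.cpu_count() may return None
    print(f"slider range: 1..{max_selectable_cpus(host)}")
```

With the 10-core M1 Pro from this report, `max_selectable_cpus(10)` gives 8, so a request for 9 CPUs would be clamped to 8 rather than passed through to qemu.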

Additional Information

M1 Pro (10 cores)

Rancher Desktop Version

1.3.0

Rancher Desktop K8s Version

1.23.6

Which container runtime are you using?

containerd (nerdctl)

What operating system are you using?

macOS

Operating System / Build Version

ProductName: macOS ProductVersion: 12.3.1 BuildVersion: 21E258

What CPU architecture are you using?

arm64 (Apple Silicon)

Linux only: what package format did you use to install Rancher Desktop?

No response

Windows User Only

No response

jandubois commented 2 years ago

It looks like this is a limitation of qemu when using the hvf (Apple's Hypervisor Framework) accelerator to run at native speed.

There is a lot of background information at https://github.com/utmapp/UTM/issues/3180

TL;DR:

Tasks:

mayrbenjamin92 commented 2 years ago

Good to know what the root cause is. It does mean, though, that on a machine such as a Mac Studio with 20 cores, assigning e.g. 14 cores to containerized workloads does not work. Is there any other solution for this?

jandubois commented 2 years ago

> Is there any other solution for this?

Unfortunately I don't see one right now. I'm not following the qemu mailing list, but it looks like the discussions about these things are somewhat contentious. 😞

I hope that one day we can take a look at using Apple's Virtualization framework as a configurable alternative to qemu, but I have no idea how much work this will be.

mayrbenjamin92 commented 2 years ago

I just downloaded UTM and started an emulated x86_64 Linux VM with 32 GB of memory and 12 cores. So far so good.

gaktive commented 2 years ago

@rak-phillip we should have a separate ticket for a hard limit in the UI, with a tooltip explaining it. Something like "We know you have 24 CPUs, but based on the VM limitations, it'll be set to 8."
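
As a rough illustration of the tooltip idea (Python for illustration only, not the Rancher Desktop UI code), a helper could return a notice only when the host has more CPUs than the VM can use. The function name `cpu_limit_notice`, the default limit of 8, and the message wording are all assumptions.

```python
def cpu_limit_notice(host_cpus, vm_cpu_limit=8):
    """Return tooltip text when the host has more CPUs than the VM can use.

    Returns None when no notice is needed. The default limit of 8 is the
    value qemu reported in this issue; it is an assumption, not a constant
    read from qemu.
    """
    if host_cpus <= vm_cpu_limit:
        return None
    return (
        f"This machine has {host_cpus} CPUs, but because of VM limitations "
        f"at most {vm_cpu_limit} can be assigned."
    )


print(cpu_limit_notice(24))
print(cpu_limit_notice(8))
```

Returning None for hosts at or below the limit lets the UI skip the tooltip entirely on machines the restriction does not affect.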

agraf commented 1 year ago

The QEMU issue tracking GICv3 support, which would enable -smp > 8, is https://gitlab.com/qemu-project/qemu/-/issues/743. I would appreciate Tested-by / Reviewed-by tags on the mailing list to push it forward :).