lima-vm / lima

Linux virtual machines, with a focus on running containers
https://lima-vm.io/
Apache License 2.0

[kernel v6.2 regression, affects Ubuntu 23.04 and Fedora 38] `limactl start --set '.vmType = "vz"'` crashes: "usernet unable to resolve IP for SSH forwarding" #1577

Closed: AkihiroSuda closed this issue 1 year ago

AkihiroSuda commented 1 year ago

EDIT: This seems to be a regression in kernel 6.2, reported at https://bugzilla.kernel.org/show_bug.cgi?id=217485. Kernel 6.3 seems to be bootable (via GRUB).

`template://experimental/vz` still works.


(Slightly off-topic: a patch for loading kernels >= 6.2 using VZLinuxBootLoader, without GRUB)

- [X] Submitted a kernel patch: https://lore.kernel.org/linux-efi/CAG8fp8Te=oT1JJhTpOZvgWJrgcTq2DXan8UOVZ=KYCYNa8cKog@mail.gmail.com/
- [X] Wait for the kernel patch to be merged: https://github.com/torvalds/linux/commit/36e4fc57fc1619f462e669e939209c45763bc8f5 (the commit message was modified and is slightly inaccurate)

balajiv113 commented 1 year ago

Checking now. I believe it's related to the preset port in the default template.

balajiv113 commented 1 year ago

The default template itself is not loading. I tried this Ubuntu version with the vz template, and it worked there.

So it's something related to a config set in the default template. Will try to close this by tomorrow.

AkihiroSuda commented 1 year ago

The issue seems specific to Ubuntu 23.04. 22.10 works 🤔

balajiv113 commented 1 year ago

It's related to Linux kernel 6.2: https://github.com/utmapp/UTM/issues/5138

balajiv113 commented 1 year ago

I can confirm it's working on M1 (macOS 13.3). Looks like only Intel Macs are affected.

AkihiroSuda commented 1 year ago

Seems to be a regression in the merge commit https://github.com/torvalds/linux/commit/888bc86e7cca29de20223ee46e9b770ced2c038e (Merge branch 'acpica')

```
$ git log v6.1-rc8..888bc86e --oneline --graph
*   888bc86e7cca (HEAD) Merge branch 'acpica'
|\  
| * 470188b09e92 ACPICA: Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage()
| * 404ec60438ad ACPICA: Fix error code path in acpi_ds_call_control_method()
| * 2b6bab689172 ACPICA: Update version to 20221020
| * 4f4356e6b4f2 ACPICA: Add utcksum.o to the acpidump Makefile
| * f6fc0bf2be79 Revert "LoongArch: Provisionally add ACPICA data structures"
| * 51aad1a6723b ACPICA: Finish support for the CDAT table
| * 3f062a516a63 ACPICA: IORT: Update for revision E.e
| * f350c68e3cd5 ACPICA: Add CXL 3.0 structures (CXIMS & RDPAS) to the CEDT table
| * 183f0a09d32c ACPICA: Improve warning message for "invalid ACPI name"
| * ee64b827a9af ACPICA: Add support for FFH Opregion special context data
| * e92e4a451c0c ACPICA: Add a couple of new UUIDs to the known UUID list
| * 407144ebd445 ACPICA: iASL: Add CCEL table to both compiler/disassembler
| * 8ff2906513f5 ACPICA: Do not touch VGA memory when EBDA < 1ki_b
| * 4fe54f509304 ACPICA: Check that EBDA pointer is in valid memory
| * 5c62d5aab875 ACPICA: Events: Support fixed PCIe wake event
| * 60f2096b59bc ACPICA: MADT: Add loong_arch-specific APICs support
| * 5620fe641620 ACPICA: Make acpi_ex_load_op() match upstream
* 57336224da83 ACPI: thermal: Adjust critical.flags.valid check
```

Will try to bisect further
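A bisection like this is a binary search: each build-and-boot test halves the remaining candidates, so the 17 commits on the 'acpica' branch need at most five boots. The POSIX shell sketch below only replays that search over the commit list above; nothing is built or booted. The `is_bad` predicate is a hypothetical stand-in for "build this commit and boot it", hard-coded to the culprit reported in the next comment, and the `git bisect start` line in the comments assumes that 57336224da83 (the merge's first parent) is good.

```shell
#!/bin/sh
# Toy replay of a bisection over the 'acpica' branch; nothing is built or booted.
# On a real kernel tree, git drives the same search, roughly:
#   git bisect start 888bc86e7cca 57336224da83   # <bad> <good>, assuming the
#                                                # merge's first parent is good
#   ...build and boot each proposed commit, then `git bisect good` or `bad`...

# Branch commits from the `git log v6.1-rc8..888bc86e` output, oldest first.
commits="5620fe641620 60f2096b59bc 5c62d5aab875 4fe54f509304 8ff2906513f5
407144ebd445 e92e4a451c0c ee64b827a9af 183f0a09d32c f350c68e3cd5
3f062a516a63 51aad1a6723b f6fc0bf2be79 4f4356e6b4f2 2b6bab689172
404ec60438ad 470188b09e92"

# nth I WORD...: print the I-th word (0-based) of the remaining arguments.
nth() { i=$1; shift; shift "$i"; printf '%s\n' "$1"; }

# count WORD...: print the number of words.
count() { echo "$#"; }

# Hypothetical stand-in for "build commit I and try to boot it": hard-coded so
# that index 2 (5c62d5aab875, the culprit reported later) and every later
# commit on the branch are bad.
is_bad() { [ "$1" -ge 2 ]; }

# Binary search for the first bad index in [0, n); index n stands for the
# merge commit 888bc86e7cca, which is already known to be bad.
n=$(count $commits)
lo=0
hi=$n
steps=0
while [ "$lo" -lt "$hi" ]; do
  mid=$(( (lo + hi) / 2 ))
  steps=$(( steps + 1 ))
  if is_bad "$mid"; then
    hi=$mid              # first bad commit is at mid or earlier
  else
    lo=$(( mid + 1 ))    # first bad commit is after mid
  fi
done

if [ "$lo" -lt "$n" ]; then
  first_bad=$(nth "$lo" $commits)
else
  first_bad=888bc86e7cca   # only the merge itself is bad
fi
echo "first bad commit: $first_bad after $steps boot tests"
```

Run with any POSIX shell, this prints `first bad commit: 5c62d5aab875 after 4 boot tests`. On the real tree every step is a full kernel build plus a boot attempt, which is why halving the range each time matters.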

AkihiroSuda commented 1 year ago

Turned out to be a regression in https://github.com/torvalds/linux/commit/5c62d5aab8752e5ee7bfbe75ed6060db1c787f98 ("ACPICA: Events: Support fixed PCIe wake event", ported from https://github.com/acpica/acpica/commit/32d875705c8ee8f99fd8b78dbed48633486a7640).

This commit was introduced in v6.2-rc1, and apparently reverted in v6.3 (https://github.com/torvalds/linux/commit/8e41e0a575664d26bb87e012c39435c4c3914ed9). However, v6.3 and the latest v6.4-rc3 still don't boot 🤔

balajiv113 commented 1 year ago

> However, v6.3 and the latest v6.4-rc3 still don't boot

This worked for me. I did the following,

Note: I used a raw-format disk even with QEMU so that it can boot on vz as well.

AkihiroSuda commented 1 year ago

@balajiv113 Can you try the vanilla defconfig?

My test steps are:

```sh
#!/busybox sh
set -eux
/busybox mkdir -p /etc /proc /root /bin /sbin /sys /usr/bin /usr/sbin
/busybox mount -t proc proc /proc
/busybox mount -t sysfs sys /sys
/busybox mdev -s
/busybox --install
exec sh
```


- Run `VMLINUZ_PATH=~/tmp/bzImage INITRD_PATH=~/tmp/initrd.img DISKIMG_PATH=/dev/null ./virtualization` with https://github.com/Code-Hex/vz/blob/v3.0.6/example/linux/main.go

My host is a 2020 MacBook Pro (`Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz`) running macOS 13.4.

AkihiroSuda commented 1 year ago

> Update kernel to v6.4-rc3 (Steps: download all deb files from https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.4-rc3/amd64/ and install them using dpkg)

`vmlinuz-6.4.0-060400rc3-generic` in https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.4-rc3/amd64/linux-image-unsigned-6.4.0-060400rc3-generic_6.4.0-060400rc3.202305212230_amd64.deb doesn't work for me either, though I only tried it with https://github.com/Code-Hex/vz/blob/v3.0.6/example/linux/main.go.

AkihiroSuda commented 1 year ago

Reported to the kernel bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217485

AkihiroSuda commented 1 year ago

(Cross-posting: https://bugzilla.kernel.org/show_bug.cgi?id=217485#c6)

Turned out that this is a mixture of an ACPICA issue and an EFISTUB issue.

Kernel v6.2 can boot by reverting both of the following commits:

- torvalds/linux@5c62d5a
- torvalds/linux@e346bebb

Kernel v6.3 can boot by reverting only torvalds/linux@e346bebb, as torvalds/linux@5c62d5a has already been reverted in torvalds/linux@8e41e0a575664d26bb87e012c39435c4c3914ed9. The same applies to v6.4-rc3.

Note that in my test I let Virtualization.framework load bzImage directly, without GRUB (akin to `qemu-system-x86_64 -kernel bzImage`). Apparently, reverting torvalds/linux@e346bebb is not necessary when bzImage is loaded via GRUB. (So Lima can just boot unmodified v6.3 and v6.4-rc3: https://github.com/lima-vm/lima/issues/1577#issuecomment-1562649337)

AkihiroSuda commented 1 year ago

This seems resolved on macOS 13.5 🎉
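The findings in this thread can be condensed into a rough check of whether a host/guest combination matches the reported failure: an Intel host (M1 was fine), macOS older than 13.5, and a guest kernel that is either 6.2.x (which failed even via GRUB) or 6.3+ loaded directly by VZLinuxBootLoader without GRUB. The POSIX shell sketch below is a hypothetical heuristic, not part of Lima or an official compatibility matrix; `likely_affected` and `version_lt` are names made up here, and version comparison assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical heuristic distilled from this thread; NOT an official Lima or
# Apple compatibility check.

# version_lt A B: succeed if version A sorts strictly before version B.
# Assumes a `sort` implementation with -V (version sort), e.g. GNU coreutils.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# likely_affected HOST_ARCH MACOS_VERSION GUEST_KERNEL BOOT_METHOD
#   HOST_ARCH:     `uname -m` on the Mac host (x86_64 or arm64)
#   MACOS_VERSION: `sw_vers -productVersion` on the host, e.g. 13.4
#   GUEST_KERNEL:  `uname -r` in the guest, e.g. 6.2.0-20-generic
#   BOOT_METHOD:   "grub", or "direct" for VZLinuxBootLoader loading bzImage
# Exit status 0 means the combination matches the failures reported here.
likely_affected() {
  host_arch=$1 macos=$2 kernel=$3 boot=$4
  [ "$host_arch" = "x86_64" ] || return 1   # M1 (arm64) was not affected
  version_lt "$macos" "13.5" || return 1    # reported resolved in macOS 13.5
  version_lt "$kernel" "6.2" && return 1    # pre-6.2 kernels (e.g. 22.10) boot
  version_lt "$kernel" "6.3" && return 0    # 6.2.x failed even via GRUB
  [ "$boot" = "direct" ]                    # 6.3+ failed only without GRUB
}
```

For example, `likely_affected x86_64 13.4 6.2.0-20-generic grub` succeeds, matching the original report of an Ubuntu 23.04 guest in Lima on an Intel Mac, while the same guest on macOS 13.5 or on an M1 host does not.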

mlavi commented 1 year ago

@AkihiroSuda Thanks for your work on this! Could you tell us what changed or was fixed in macOS 13.5?

AkihiroSuda commented 1 year ago

> @AkihiroSuda Thanks for your work on this! Could you tell us what changed or was fixed in macOS 13.5?

I don't know; I can't find any reference in https://developer.apple.com/documentation/macos-release-notes/macos-13_5-release-notes. But apparently Apple updated Virtualization.framework in macOS 13.5 to support Linux 6.2 on Intel.