utmapp / UTM

Virtual machines for iOS and macOS
https://getutm.app
Apache License 2.0

Apple Virtual machine keeps setting itself as read-only #4840

Open Git-North opened 1 year ago

Git-North commented 1 year ago

UTM version: 4.1.2 (Beta). Ubuntu version: 23.04 (Lunar Lobster). Apple Virtualization with Rosetta 2 enabled.

None of the disks are set as "read only" inside UTM. The VM sometimes works for seconds, sometimes for minutes, but the failure always happens. The errors usually say something like "error: read-only file system", but the exact message varies from command to command.
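
A quick way to confirm the forced remount and locate the triggering error from inside the guest is sketched below (device names and exact messages vary per VM and filesystem):

findmnt -no OPTIONS /                                          # "ro" here means the root fs is mounted read-only
sudo dmesg | grep -iE 'remount|read-only|error' | tail -n 20   # recent kernel errors around the remount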

ktprograms commented 1 year ago

Do you cleanly shut down the VM every time?

Git-North commented 1 year ago

I always use the "shutdown now" command in the terminal

ktprograms commented 1 year ago

What kinds of commands are you running when it happens?

Git-North commented 1 year ago

It happens after I finish the setup process (installing from the ISO and rebooting). I first got the error when I was using nala, a package manager for Ubuntu. I got it when using pacstall as well, and again when I tried a different VM. Finally I experimented: after setting up a fresh VM and waiting a few minutes, I got the error while running "mkdir", with no commands run prior.

ktprograms commented 1 year ago

Is there anything interesting/weird in the dmesg?

Git-North commented 1 year ago

My VM no longer seems to boot. I will report the results after I create another one (I will most likely get the same results, since this is my 7th VM at this point).

timnoack commented 1 year ago

I had / have the same problem. The kernel log shows that the kernel notices an incorrect inode checksum on /dev/vda2 (the ext4 system partition) and therefore remounts the file system read-only. Shortly after that, the kernel oopses (Debian with kernel 5.10.158-2):

Jan 12 12:03:57 debian kernel: Internal error: Oops - BUG: 0 [#1] SMP
Jan 12 12:03:57 debian kernel: Modules linked in: uinput rfkill joydev binfmt_misc nls_ascii nls_cp437 vfat fat aes_ce_blk crypto_simd hid_generic cryptd aes_ce_>
Jan 12 12:03:57 debian kernel: CPU: 5 PID: 244 Comm: hwrng Not tainted 5.10.0-20-arm64 #1 Debian 5.10.158-2
Jan 12 12:03:57 debian kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 1916.60.2.0.0 11/04/2022
Jan 12 12:03:57 debian kernel: pstate: 00400005 (nzcv daif +PAN -UAO -TCO BTYPE=--)
Jan 12 12:03:57 debian kernel: pc : do_undefinstr+0x2e0/0x2f0
Jan 12 12:03:57 debian kernel: lr : do_undefinstr+0x180/0x2f0
Jan 12 12:03:57 debian kernel: sp : ffff800013143c40
Jan 12 12:03:57 debian kernel: x29: ffff800013143c40 x28: ffff0000c0a59e80 
Jan 12 12:03:57 debian kernel: x27: ffff0000c105bd48 x26: ffff800011a7dea8 
Jan 12 12:03:57 debian kernel: x25: 0000000000000000 x24: ffff800008db8320 
Jan 12 12:03:57 debian kernel: x23: 0000000060c00005 x22: ffff800008db64b4 
Jan 12 12:03:57 debian kernel: x21: ffff800013143e10 x20: 0000000000000000 
Jan 12 12:03:57 debian kernel: x19: ffff800013143cc0 x18: 0000000000000000 
Jan 12 12:03:57 debian kernel: x17: 0000000000000000 x16: 0000000000000000 
Jan 12 12:03:57 debian kernel: x15: ffff800008db64b4 x14: 0000000000000000 
Jan 12 12:03:57 debian kernel: x13: 0000000000000000 x12: 0000000000000000 
Jan 12 12:03:57 debian kernel: x11: 0000000000000000 x10: fcb411eb11aa9ff1 
Jan 12 12:03:57 debian kernel: x9 : ffff8000103949e0 x8 : ffff0000c0a5ad98 
Jan 12 12:03:57 debian kernel: x7 : ffff80021da4e000 x6 : 0000000000000000 
Jan 12 12:03:57 debian kernel: x5 : ffff800011825f80 x4 : 00000000d503403f 
Jan 12 12:03:57 debian kernel: x3 : 0000000000000000 x2 : ffff800011a740f0 
Jan 12 12:03:57 debian kernel: x1 : 0000000000000000 x0 : 0000000060c00005 
Jan 12 12:03:57 debian kernel: Call trace:
Jan 12 12:03:57 debian kernel:  do_undefinstr+0x2e0/0x2f0
Jan 12 12:03:57 debian kernel:  el1_undef+0x2c/0x4c
Jan 12 12:03:57 debian kernel:  el1_sync_handler+0x8c/0xd0
Jan 12 12:03:57 debian kernel:  el1_sync+0x88/0x140
Jan 12 12:03:57 debian kernel:  hwrng_fillfn+0x130/0x1e0 [rng_core]
Jan 12 12:03:57 debian kernel:  kthread+0x12c/0x130
Jan 12 12:03:57 debian kernel:  ret_from_fork+0x10/0x30
Jan 12 12:03:57 debian kernel: Code: 33103e80 2a0003f4 17ffffa6 f90013f5 (d
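
Since ext4 records its error state in the superblock, the remount behavior and the first/last recorded errors can also be inspected directly; a minimal check, assuming /dev/vda2 as in the log above:

sudo tune2fs -l /dev/vda2 | grep -iE 'state|error'   # filesystem state, errors behavior, error count/timestamps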

Another time, the kernel oopsed with this:

Jan 12 12:03:57 debian kernel: WARNING: CPU: 5 PID: 0 at kernel/rcu/tree.c:624 rcu_eqs_enter.constprop.0+0x74/0x7c
Jan 12 12:03:57 debian kernel: Modules linked in: uinput rfkill joydev binfmt_misc nls_ascii nls_cp437 vfat fat aes_ce_blk crypto_simd hid_generic cryptd aes_ce_>
Jan 12 12:03:57 debian kernel: CPU: 5 PID: 0 Comm: swapper/5 Tainted: G      D           5.10.0-20-arm64 #1 Debian 5.10.158-2
Jan 12 12:03:57 debian kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 1916.60.2.0.0 11/04/2022
Jan 12 12:03:57 debian kernel: pstate: 20c003c5 (nzCv DAIF +PAN +UAO -TCO BTYPE=--)
Jan 12 12:03:57 debian kernel: pc : rcu_eqs_enter.constprop.0+0x74/0x7c
Jan 12 12:03:57 debian kernel: lr : rcu_idle_enter+0x18/0x24
Jan 12 12:03:57 debian kernel: sp : ffff800011bc3f20
Jan 12 12:03:57 debian kernel: x29: ffff800011bc3f20 x28: 0000000000000000 
Jan 12 12:03:57 debian kernel: x27: 0000000000000000 x26: ffff0000c028bd00 
Jan 12 12:03:57 debian kernel: x25: 0000000000000000 x24: 0000000000000000 
Jan 12 12:03:57 debian kernel: x23: ffff80001181a1bc x22: ffff800011426bb0 
Jan 12 12:03:57 debian kernel: x21: ffff80001181a180 x20: 0000000000000005 
Jan 12 12:03:57 debian kernel: x19: ffff800011412008 x18: 00000000fffffff5 
Jan 12 12:03:57 debian kernel: x17: 0000000000000308 x16: 0000000000000040 
Jan 12 12:03:57 debian kernel: x15: 0000000000000000 x14: 0000000000000000 
Jan 12 12:03:57 debian kernel: x13: 0000000000000001 x12: 0000000000000040 
Jan 12 12:03:57 debian kernel: x11: ffff0000c0402238 x10: 593e9d5df37fe0d6 
Jan 12 12:03:57 debian kernel: x9 : ffff800010bb50c0 x8 : ffff0000c028cc18 
Jan 12 12:03:57 debian kernel: x7 : ffff000229b2eac0 x6 : 000000010d7b1d55 
Jan 12 12:03:57 debian kernel: x5 : 00ffffffffffffff x4 : ffff80021da4e000 
Jan 12 12:03:57 debian kernel: x3 : 4000000000000002 x2 : 4000000000000000 
Jan 12 12:03:57 debian kernel: x1 : ffff800011428f80 x0 : ffff00022ee76f80 
Jan 12 12:03:57 debian kernel: Call trace:
Jan 12 12:03:57 debian kernel:  rcu_eqs_enter.constprop.0+0x74/0x7c
Jan 12 12:03:57 debian kernel:  rcu_idle_enter+0x18/0x24
Jan 12 12:03:57 debian kernel:  default_idle_call+0x40/0x178
Jan 12 12:03:57 debian kernel:  do_idle+0x238/0x2b0
Jan 12 12:03:57 debian kernel:  cpu_startup_entry+0x2c/0x9c
Jan 12 12:03:57 debian kernel:  secondary_start_kernel+0x144/0x180

I then installed Fedora 37 with kernel 6.0.7, which also gave me a read-only file system and this log:

[  325.566567] SELinux:  Context system_u:object_r:cert_t:s0 is not valid (left unmapped).
[  335.408511] BTRFS error (device vda3): parent transid verify failed on logical 254394368 mirror 1 wanted 20 found 0
[  335.413248] BTRFS info (device vda3): read error corrected: ino 0 off 254394368 (dev /dev/vda3 sector 513248)
[  335.418330] BTRFS info (device vda3): read error corrected: ino 0 off 254398464 (dev /dev/vda3 sector 513256)
[  335.418375] BTRFS info (device vda3): read error corrected: ino 0 off 254402560 (dev /dev/vda3 sector 513264)
[  335.418419] BTRFS info (device vda3): read error corrected: ino 0 off 254406656 (dev /dev/vda3 sector 513272)
[  335.924646] BTRFS error (device vda3): parent transid verify failed on logical 60178432 mirror 1 wanted 13 found 0
[  335.930552] BTRFS info (device vda3): read error corrected: ino 0 off 60178432 (dev /dev/vda3 sector 133920)
[  335.935140] BTRFS info (device vda3): read error corrected: ino 0 off 60182528 (dev /dev/vda3 sector 133928)
[  335.935332] BTRFS info (device vda3): read error corrected: ino 0 off 60186624 (dev /dev/vda3 sector 133936)
[  335.935486] BTRFS info (device vda3): read error corrected: ino 0 off 60190720 (dev /dev/vda3 sector 133944)
[  337.205193] BTRFS warning (device vda3): csum failed root 257 ino 34956 off 0 csum 0xf5f4f143 expected csum 0x00000000 mirror 1
[  337.205209] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[  337.210375] BTRFS warning (device vda3): csum failed root 257 ino 34956 off 0 csum 0xf5f4f143 expected csum 0x00000000 mirror 1
[  337.210379] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[  337.393847] SELinux:  Context system_u:object_r:file_context_t:s0 is not valid (left unmapped).
[  337.845752] SELinux:  Context system_u:object_r:var_lib_nfs_t:s0 is not valid (left unmapped).
[  373.307836] BTRFS warning (device vda3): csum failed root 257 ino 34956 off 0 csum 0xf5f4f143 expected csum 0x00000000 mirror 1
[  373.307846] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[  373.769193] BTRFS error (device vda3): parent transid verify failed on logical 64929792 mirror 1 wanted 13 found 0
[  373.769820] BTRFS info (device vda3): read error corrected: ino 0 off 64929792 (dev /dev/vda3 sector 143200)
[  373.769863] BTRFS info (device vda3): read error corrected: ino 0 off 64933888 (dev /dev/vda3 sector 143208)
[  373.769897] BTRFS info (device vda3): read error corrected: ino 0 off 64937984 (dev /dev/vda3 sector 143216)
[  373.769931] BTRFS info (device vda3): read error corrected: ino 0 off 64942080 (dev /dev/vda3 sector 143224)
[  373.935038] BTRFS warning (device vda3): csum failed root 257 ino 33540 off 0 csum 0xdd812b50 expected csum 0x00000000 mirror 1
[  373.935045] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
[  373.935168] BTRFS warning (device vda3): csum failed root 257 ino 33540 off 0 csum 0xdd812b50 expected csum 0x00000000 mirror 1
[  373.935170] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[  373.935265] BTRFS warning (device vda3): csum failed root 257 ino 33540 off 0 csum 0xdd812b50 expected csum 0x00000000 mirror 1
[  373.935266] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
[  378.938443] BTRFS critical (device vda3): leaf free space ret -4225, leaf data size 16283, used 20508 nritems 169
[  378.938506] BTRFS critical (device vda3): leaf free space ret -4225, leaf data size 16283, used 20508 nritems 169
[  378.938509] BTRFS critical (device vda3): leaf free space ret -4225, leaf data size 16283, used 20508 nritems 169
[  378.938510] BTRFS critical (device vda3): leaf free space ret -4225, leaf data size 16283, used 20508 nritems 169
[  379.073738] BTRFS warning (device vda3): csum failed root 257 ino 33540 off 0 csum 0xdd812b50 expected csum 0x00000000 mirror 1
[  379.073748] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
[  379.073917] BTRFS warning (device vda3): csum failed root 257 ino 33540 off 0 csum 0xdd812b50 expected csum 0x00000000 mirror 1
[  379.073920] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
[  379.297718] systemd[1]: systemd 251.7-611.fc37 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP -GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
[  379.297873] systemd[1]: Detected virtualization apple.
[  379.297876] systemd[1]: Detected architecture arm64.
[  379.391880] systemd[1]: bpf-lsm: Failed to link program; assuming BPF LSM is not available
[  379.422898] systemd-sysv-generator[4243]: SysV service '/etc/rc.d/init.d/livesys' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
[  379.422920] systemd-sysv-generator[4243]: SysV service '/etc/rc.d/init.d/livesys-late' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
[  379.429646] systemd-gpt-auto-generator[4234]: Failed to dissect: Permission denied
[  379.431593] systemd[4220]: /usr/lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 1.
[  379.794222] ------------[ cut here ]------------
[  379.794226] BTRFS: Transaction aborted (error -2)
[  379.794268] WARNING: CPU: 7 PID: 3504 at fs/btrfs/inode.c:9568 btrfs_rename+0x810/0x8d0
[  379.794293] Modules linked in: tls snd_seq_dummy snd_hrtimer uinput nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr sunrpc vfat fat virtio_snd snd_seq snd_seq_device joydev snd_pcm snd_timer virtio_balloon snd virtiofs soundcore virtio_console zram crct10dif_ce polyval_ce polyval_generic ghash_ce sha3_ce virtio_net sha512_ce sha512_arm64 virtio_gpu net_failover failover virtio_blk virtio_dma_buf apple_mfi_fastcharge ip6_tables ip_tables fuse
[  379.794422] CPU: 7 PID: 3504 Comm: dnf Not tainted 6.0.7-301.fc37.aarch64 #1
[  379.794424] Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 1916.60.2.0.0 11/04/2022
[  379.794425] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  379.794427] pc : btrfs_rename+0x810/0x8d0
[  379.794429] lr : btrfs_rename+0x810/0x8d0
[  379.794430] sp : ffff80000f6fb910
[  379.794431] x29: ffff80000f6fb910 x28: ffff0000c840a300 x27: ffff0000c840a0d0
[  379.794432] x26: ffff0000861635c0 x25: ffff0000c8c1f760 x24: ffff0000c871ca90
[  379.794434] x23: ffff0000c8c1f760 x22: 0000000000028ff2 x21: ffff0000c8c1f530
[  379.794435] x20: ffff00008613a780 x19: ffff0000c8585600 x18: 00000000fffffffe
[  379.794437] x17: 00a16c4317540000 x16: 0000000009c40000 x15: ffff80000f6fb518
[  379.794438] x14: 0000000000000001 x13: 29322d20726f7272 x12: 652820646574726f
[  379.794439] x11: 00000000ffffdfff x10: ffff80000aaf0590 x9 : ffff8000082e56e0
[  379.794452] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 00000000000affa8
[  379.794453] x5 : 0000000000001fff x4 : 0000000000000001 x3 : ffff80000a2a6008
[  379.794455] x2 : 0000000000000001 x1 : ffff00013559c400 x0 : 0000000000000025
[  379.794456] Call trace:
[  379.794457]  btrfs_rename+0x810/0x8d0
[  379.794459]  btrfs_rename2+0x30/0x80
[  379.794460]  vfs_rename+0x338/0x8a0
[  379.794468]  do_renameat2+0x42c/0x484
[  379.794469]  __arm64_sys_renameat+0x60/0x80
[  379.794471]  invoke_syscall+0x78/0x100
[  379.794477]  el0_svc_common.constprop.0+0x4c/0xf4
[  379.794478]  do_el0_svc+0x34/0x4c
[  379.794479]  el0_svc+0x34/0x10c
[  379.794513]  el0t_64_sync_handler+0xf4/0x120
[  379.794514]  el0t_64_sync+0x190/0x194
[  379.794520] ---[ end trace 0000000000000000 ]---

I then rebooted the host system and installed Debian. I was also able to upgrade to kernel Linux debian 6.1.0-1-arm64 #1 SMP Debian 6.1.4-1 (2023-01-07) aarch64 GNU/Linux without a problem (this was not possible before the reboot). I do get some freezes now and then which require a hard exit of the VM and UTM, but bear in mind that I am using a kernel from the unstable channel, so this might not have anything to do with UTM or Virtualization.framework at all. The kernel log is completely clean. I attached some logs from these crashes. Overall a very frustrating experience. But as far as I can see, UTM is really only a lightweight frontend for Virtualization.framework, so I guess there isn't much you can do about this? I really like the UTM project regardless; keep up the good work!

com.apple.Virtualization.VirtualMachine_2023-01-13-113856_MacBook-Pro-von-Tim.log com.apple.Virtualization.VirtualMachine_2023-01-13-114201_MacBook-Pro-von-Tim.wakeups_resource.log UTM_2023-01-13-114630_MacBook-Pro-von-Tim.cpu_resource.log

Git-North commented 1 year ago

Any progress on this?

timnoack commented 1 year ago

I don't think this has anything to do with UTM. I see the same VirtIO FS corruption problems in everything that uses the Apple Virtualization framework (e.g. Docker). I opened a ticket in Feedback Assistant, but Apple asks for a reliable way to reproduce the issue, which I have not found yet.

I don't know how many people use UTM with the Apple Virtualization framework enabled, but I know that a ton of people use Docker on macOS, which sporadically crashes for me for the same reason. Since no one else there seems to have the same problem, my current guess is that it's either a faulty macOS installation or faulty hardware. I have no third-party kernel extensions loaded, and programs running in EL0 should not be able to interfere with the hypervisor.

Did you try reinstalling macOS?

maximvl commented 1 year ago

I have the same issue with a Fedora 38 ARM guest: it works for some time, then the fs becomes read-only and I have to restart it. I will try to get more logs later.

One type of message I see looks like this:

> git commit -m "message"
An unexpected error has occurred: OSError: [Errno 5] Input/output error

# and in dmesg:
[ 2006.845686] BTRFS critical (device vda3): corrupted leaf, root=7 block=0 owner mismatch, have 0 expect 7
maximvl commented 1 year ago

Here is one more case when FS becomes read-only:

[  829.279825] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725624] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725806] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725838] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725853] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725864] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725875] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725899] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725915] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725931] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725943] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725953] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725968] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725983] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  829.725998] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  856.928118] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  856.928181] BTRFS critical (device vda3): corrupted leaf, root=257 block=0 owner mismatch, have 0 expect [256, 18446744073709551360]
[  863.839380] BTRFS warning (device vda3): csum failed root 257 ino 162865 off 0 csum 0xc06d8a81 expected csum 0x00000000 mirror 1
[  863.839386] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 399, gen 0
[  863.839393] BTRFS warning (device vda3): csum failed root 257 ino 162865 off 4096 csum 0x8e9c436a expected csum 0x00000000 mirror 1
[  863.839394] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 400, gen 0
[  863.839544] BTRFS warning (device vda3): csum failed root 257 ino 162865 off 0 csum 0xc06d8a81 expected csum 0x00000000 mirror 1
[  863.839546] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 401, gen 0
[  863.839942] BTRFS warning (device vda3): csum failed root 257 ino 162880 off 0 csum 0x3dfb4247 expected csum 0x00000000 mirror 1
[  863.839944] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 402, gen 0
[  863.839950] BTRFS warning (device vda3): csum failed root 257 ino 162880 off 4096 csum 0x800bb04e expected csum 0x00000000 mirror 1
[  863.839951] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 403, gen 0
[  863.840049] BTRFS warning (device vda3): csum failed root 257 ino 162880 off 0 csum 0x3dfb4247 expected csum 0x00000000 mirror 1
[  863.840053] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 404, gen 0
[  863.841823] BTRFS warning (device vda3): csum failed root 257 ino 162879 off 0 csum 0x9379162b expected csum 0x00000000 mirror 1
[  863.841825] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 405, gen 0
[  863.841831] BTRFS warning (device vda3): csum failed root 257 ino 162879 off 4096 csum 0x6a9e7a52 expected csum 0x00000000 mirror 1
[  863.841832] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 406, gen 0
[  863.841913] BTRFS warning (device vda3): csum failed root 257 ino 162879 off 0 csum 0x9379162b expected csum 0x00000000 mirror 1
[  863.841916] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 407, gen 0
[  863.843813] BTRFS warning (device vda3): csum failed root 257 ino 162893 off 0 csum 0x07fad04f expected csum 0x00000000 mirror 1
[  863.843816] BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 408, gen 0
[  865.182127] BTRFS critical (device vda3): corrupted leaf, root=7 block=0 owner mismatch, have 0 expect 7
[  865.182135] ------------[ cut here ]------------
[  865.182135] BTRFS: Transaction aborted (error -117)
[  865.182191] WARNING: CPU: 3 PID: 470 at fs/btrfs/inode.c:3343 btrfs_finish_ordered_io+0x9b8/0x9c0
[  865.182257] Modules linked in: snd_seq_dummy snd_hrtimer uinput xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat br_netfilter bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill overlay ip_set nf_tables nfnetlink qrtr sunrpc binfmt_misc vfat fat virtio_snd snd_seq snd_seq_device snd_pcm snd_timer virtio_console snd soundcore virtio_balloon virtiofs joydev loop crct10dif_ce polyval_ce polyval_generic ghash_ce sha3_ce virtio_net sha512_ce net_failover sha512_arm64 failover virtio_gpu virtio_blk virtio_dma_buf apple_mfi_fastcharge ip6_tables ip_tables fuse
[  865.182449] CPU: 3 PID: 470 Comm: kworker/u12:6 Not tainted 6.2.15-300.fc38.aarch64 #1
[  865.182456] Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 1916.80.2.0.0 12/19/2022
[  865.182457] Workqueue: btrfs-endio-write btrfs_work_helper
[  865.182460] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  865.182467] pc : btrfs_finish_ordered_io+0x9b8/0x9c0
[  865.182469] lr : btrfs_finish_ordered_io+0x9b8/0x9c0
[  865.182470] sp : ffff80000d01bc60
[  865.182471] x29: ffff80000d01bc60 x28: 00000000ffffff8b x27: ffff0000c9802000
[  865.182472] x26: ffff0000c1ba5ea0 x25: ffff000109db6c80 x24: ffff0000c9802800
[  865.182474] x23: 0000000000001000 x22: 0000000000000000 x21: ffff0002f1c0bae8
[  865.182475] x20: 0000000000000fff x19: ffff00042cb74a50 x18: 00000000fffffffe
[  865.182476] x17: 6e776f20303d6b63 x16: 6f6c6220373d746f x15: ffff80000d01b830
[  865.182477] x14: 0000000000000001 x13: 293731312d20726f x12: 7272652820646574
[  865.182478] x11: 00000000ffffdfff x10: ffff80000aa502a0 x9 : ffff800008137b40
[  865.182479] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 00000000000affa8
[  865.182480] x5 : 0000000000001fff x4 : 0000000000000002 x3 : ffff80000a1f3008
[  865.182482] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c86d0000
[  865.182483] Call trace:
[  865.182483]  btrfs_finish_ordered_io+0x9b8/0x9c0
[  865.182485]  finish_ordered_fn+0x1c/0x30
[  865.182487]  btrfs_work_helper+0xe0/0x270
[  865.182488]  process_one_work+0x1e4/0x480
[  865.182507]  worker_thread+0x74/0x40c
[  865.182508]  kthread+0xe8/0xf4
[  865.182509]  ret_from_fork+0x10/0x20
[  865.182511] ---[ end trace 0000000000000000 ]---
[  865.182512] BTRFS: error (device vda3: state A) in btrfs_finish_ordered_io:3343: errno=-117 Filesystem corrupted
[  865.182514] BTRFS info (device vda3: state EA): forced readonly

And I can't even shut down properly at this point:

~ ⟩ shutdown now                                                                                                            
exec: Failed to execute process '/usr/sbin/shutdown', unknown error number 117
athre0z commented 1 year ago

Same issue here with a Debian Testing aarch64 guest on the Apple HV. It consistently triggers multiple times a day during regular use. Sometimes I end up with a corrupted disk and need to run fsck from the initramfs shell before I can continue booting; it then resolves various filesystem inconsistencies.

maximvl commented 1 year ago

> Same issue here with a Debian Testing aarch64 guest on the Apple HV. It consistently triggers multiple times a day during regular use. Sometimes I end up with a corrupted disk and need to run fsck from the initramfs shell before I can continue booting; it then resolves various filesystem inconsistencies.

@athre0z My Fedora fs got corrupted to the point where it couldn't boot the graphical interface or install packages. I was able to recover my data through the terminal and a shared directory, so be careful.

vlad-rw commented 1 year ago

@pisker is probably right; I just saw the same btrfs error in a Lima VM using vz on Ventura 13.4.

athre0z commented 1 year ago

FWIW, I'm under the impression that it got a lot better with 13.4: I was easily running into ~4 crashes on any given workday previously, whereas now it's more like one crash every two days. That said, with this kind of spurious bug it's also perfectly possible that it's just chance, or the result of a slightly altered workload.

Personally I don't really care about FS corruption: everything of value is on a share anyway, and if my VM dies I can spin up a fresh one in 30 minutes. I still prefer the crashy Apple HV with the lightning-fast virtfs share over the QEMU HV with the horrible 9p network share.

gnattu commented 1 year ago

I also encountered this using the Apple Virtualization framework. I'm using the VM as a homelab server (basically a container runner) with a bridged interface. I chose the Apple Virtualization framework over QEMU because it has better vNIC performance on my 10G NIC. I'm running Rocky Linux 9, where the default filesystem is XFS. When this bug happens, the console log says Corruption of in-memory data detected. Shutting down filesystem. Unlike the other filesystems, which switch to read-only mode, XFS is forced to shut down the filesystem, which is equivalent to unmounting the root partition. But the good part is that the corruption seems to be only in memory and not on disk: I ran xfs_repair after the fs shutdown and no errors were found.
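
For reference, that kind of on-disk check can be repeated non-destructively from a rescue boot with the filesystem unmounted; a sketch, where /dev/vda3 is an example device node:

xfs_repair -n /dev/vda3   # -n: no-modify mode, report problems without touching the disk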

The frequency of this bug is low for me, perhaps 1-2 times a month, but it is still annoying to manually reboot the VM once I find my containers are down. So my workaround is a simple Rust program that checks every 10 s whether the root is available and, if not, force-reboots the machine. Here's the code:

use std::fs;
use std::io::Result;
use std::io::Write;
use std::thread;
use std::time::Duration;

/// Force an immediate reboot via the kernel's magic SysRq interface.
/// Writing "b" to /proc/sysrq-trigger reboots at once, without syncing
/// or unmounting filesystems.
/// Reference: https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html
pub fn force_reboot() -> Result<()> {
    let mut file = fs::File::create("/proc/sysrq-trigger")?;
    file.write_all(b"b")?;
    Ok(())
}

fn main() {
    loop {
        // Probe the root filesystem; after an XFS shutdown even reads
        // fail, so read_dir returns an error.
        if fs::read_dir("/").is_err() {
            // Nothing more we can do if the reboot write itself fails.
            let _ = force_reboot();
        }
        thread::sleep(Duration::from_secs(10));
    }
}
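
The same forced reboot can be triggered by hand from a root shell to verify the mechanism; per the kernel sysrq documentation, "b" reboots immediately without syncing or unmounting:

echo b > /proc/sysrq-trigger   # as root; immediate reboot, no sync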

Hope this helps someone who also runs a server and has the same problem.

planetf1 commented 1 year ago

Similar here, this time with a Fedora 39 beta guest. Using Apple Virtualization I get these memory corruption failures (and then other I/O failures). Fedora (albeit 38) is running fine under QEMU with UTM.

I'd also previously hit a constant exception with a RHEL 9.x guest; again, only with Apple Virtualization.

planetf1 commented 1 year ago

Here's an example:

Screenshot 2023-09-24 at 07 29 14
kurgannet commented 11 months ago

I'm experiencing the same issue, this time while trying to install Kali 2023.3 on the Apple Virtualization framework:

Screenshot 2023-10-04 at 10 56 10

However, I have Kali installed inside Docker, running with the Apple Virtualization framework, with no problem at all. Albeit no GUI...

kurgannet commented 11 months ago

Another one while installing Debian 12.1. In fact, I cannot install a Linux distro at all; I always get the kernel panic.

Screenshot 2023-10-04 at 18 20 49
phaer commented 11 months ago

I think by now there are enough reports to assume that there's really a bug somewhere, and that all kinds of distros are affected (NixOS here ;). So adding yet another report seems to have diminishing returns in terms of information gained?

From the reports here, it seems the oldest kernel explicitly mentioned was 5.10.158. I have encountered it multiple times myself on 6.1.* (6.1.54 at the moment), using both XFS and ext4.

kurgannet commented 11 months ago

I have tried the same on a Mac mini M2 running Sonoma and have no issues so far. I have tested Debian 12.1 and Kali 2023.3: no kernel panics in 1 day.

On a MacBook Pro 16 (M1 Pro, 16 GB) I cannot even install those guest OSes; I get these errors constantly, within minutes. I mean, I ALWAYS get these errors, and I am unable to finish an install (which should take less than half an hour). The last one is:

Screenshot 2023-10-05 at 12 31 08

I am not sure if this is related to disk access. The first error reads "Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020".

Any clues?

rainwoodman commented 11 months ago

I can report that on macOS 13.6, on an M2 Pro Mac mini with 16 GB, the filesystem error occurs frequently.

wrmack commented 11 months ago

My experience was the same until I upgraded the Linux kernel after reading this: https://www.techradar.com/news/linux-kernel-62-is-here-and-it-now-has-mainline-support-for-apple-m1-chips. On M1 with Sonoma, using Ubuntu 23.10 (Mantic) with Linux kernel 6.5, it is reasonably stable. Unfortunately, once there has been one disk corruption it is difficult to know whether subsequent issues are entirely fresh or a consequence of the first one. I regularly boot into an attached ISO and run fsck while the main VM disk is unmounted. journalctl reports NULL pointer and ext4 issues from time to time, but it is usable. You can get the latest Ubuntu from https://cdimage.ubuntu.com

lfdla commented 11 months ago

Is it just me, or does deactivating ballooning solve the problem? I deactivated it two weeks ago, and no problems since on my side.

wdormann commented 11 months ago

I came across this bug while searching for a BTRFS error I was encountering in one of my VMs. I'm not running UTM at the moment, but rather Parallels; still, the underlying Apple Virtualization framework is being used, as UTM can do.

I have a hunch that this isn't anything BTRFS-specific, though; rather, BTRFS is the most likely to notice and warn you at the point where any corruption happens, compared to some other filesystems. Unfortunately, I haven't been able to find a way to trigger the bug consistently. And unlike what was suggested earlier, the corruption wasn't simply in memory: the filesystem corruption was still there after a reboot.

If there is indeed a silent data-corruption bug in the Apple Virtualization framework, that sounds quite bad, as it will affect everything that uses it. On the other hand, if there's a Linux bug in its handling of the virtual hardware provided by the framework, well, they've got some work to do. I've tested a 6.1.57 VM, and the corruption still happened.

Screenshot 2023-10-18 at 4 59 59 PM
kurgannet commented 11 months ago

In my experience this issue is mostly related to "something" that leads to a kernel oops and then a filesystem error. ext4 is also affected.

I have tried "every" virtualization platform that relies on AVF, and the bug is always there. Docker seems quite stable because of the Linux kernel it uses (a 5.* version, not 6.*). It seems to me that they tried and tested kernels until they found a stable one.

In the end, it looks like a combination of AVF + Linux kernel version, so the solution may be on Apple's or Linux's side... or both!

wdormann commented 11 months ago

For what it's worth, I've recreated the Linux filesystem corruption in 3 different platform configurations:

1. As seen in the screenshot above, a Gentoo Linux VM where the disk is presented as sda, so something SCSI/SATA I believe.
2. Ubuntu 23.10 stock with QEMU virtualization and NVMe storage:

Screenshot 2023-10-24 at 12 18 23 AM

3. Ubuntu 23.10 with Apple Virtualization, the latest mainline kernel, and virtio storage.

Screenshot 2023-10-24 at 12 17 20 AM

In all three scenarios the corruption was generated under load (repeatedly compiling a large project, Qt6).

Reproduction under a synthetic CPU/RAM/disk benchmark does not seem as readily possible.
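
For anyone trying to reproduce this, the load described above boils down to a sustained parallel build loop along these lines (a sketch; it assumes a checked-out, configured source tree):

while true; do
    make -j"$(nproc)" && make clean || break   # stop once commands start failing (e.g. read-only fs)
done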

wrmack commented 11 months ago

Just in case you haven't come across it, Asahi Linux is working on the Linux kernel specifically for Apple silicon. The specific features they are working on are listed here. Their work finds its way into the mainline kernel. They also provide a downloadable dual-boot solution.

wdormann commented 11 months ago

For what it's worth, I tried a similar exercise of compiling Qt6 in Linux, Windows, and macOS VMs. The Windows and macOS VMs each went a full 24 hours without any corruption. Every Linux VM I've tried lasted maybe 10 minutes before corruption occurred.

I only have the ability to test within VMs, so I cannot eliminate the hypervisor layer at this point. But as far as I can tell so far, aarch64 Linux has problems.

(Screenshots: windows_qt6_build, macos_qt6_build)
wdormann commented 11 months ago

For what it's worth, at least in my case I've narrowed the problem down: the will-corrupt-soon VMs all lived on an external SSD, which happens to be formatted as ExFAT. I have yet to reproduce filesystem corruption on my Mac's internal SSD.

Interestingly, the Mac at no point reports problems at the hardware layer, and at no point does it report filesystem corruption on the device itself. Then again, I'm not convinced that ExFAT would ever really know if it was corrupted in some subtle way.

So for anyone experiencing corruption in Linux VMs, take particular note of where the VMs live. If they are not on your Mac's internal storage, you might be in a similar boat. Yes, silent corruption on an external storage device is sort of terrifying, but I suppose it's more believable than a mystery problem with the virtualization framework or Linux itself.
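
On the macOS host it only takes a moment to check where a VM lives and what backs it. A sketch: the Containers path below is UTM's usual default document location, and the volume name is a placeholder:

ls ~/Library/Containers/com.utmapp.UTM/Data/Documents/                              # UTM VM bundles
diskutil info /Volumes/ExternalSSD | grep -E 'File System Personality|Device Node'  # backing filesystem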

wrmack commented 11 months ago

Thanks for that point. Mine is on an external SSD (Samsung T7) as well. The VM, using Apple Virtualization, which provides Ubuntu on an ext4 file system, sits on the SSD's filesystem, which I reformatted to APFS at the start. I also use a Windows 11 VM under QEMU virtualization on the same external drive and have had no problems. I think that if Ubuntu freezes during a write to the VM filesystem, for example due to a kernel NULL reference, that could corrupt the VM filesystem's directory tables etc. Hopefully corruption is contained to the VM filesystem and the hardware layer is fine. If I use macOS Disk Utility to repair the SSD, it reports no errors. I don't have sufficient space on my MacBook to move the VMs to it.

kurgannet commented 11 months ago

In my case it happens regardless of the drive storing the VM: in all of them the error is present. For example, I am even unable to install Kali Linux; the Linux kernel always panics (with the subsequent corruption) while copying files, configuring, or whatever... I can't install. I may have tried around 30 times...

kurgannet commented 11 months ago

I forgot: Sonoma, Ventura, or Monterey... always the same :(

wdormann commented 11 months ago

@kurgannet In the case of your M1 with the internal drive, are you running the VM off the primary OS's APFS volume, or has an additional APFS volume been created?

kurgannet commented 10 months ago

I have tried both (same APFS volume, different APFS volume) with the same results (unable to install). Even after a clean, full macOS reinstall (from a USB drive, formatting the internal drive to get a "brand new macOS"), the issue is present.

Moreover: I have tried this example Xcode project (https://developer.apple.com/documentation/virtualization/running_gui_linux_in_a_virtual_machine_on_a_mac) and it has the same issues. It's pretty clear to me that the issue is not UTM-related but Apple + Linux related, but I haven't found any other discussion forum. The UTM community may be more successful if they raise this issue with the Linux kernel team or Apple.

The only safe way I've found to use the Apple Virtualization framework with Linux is through Docker (which creates a Linux VM), but it uses a tried and tested 5.* Linux kernel, not 6.*.

So, TLDR: AVF + recent (6.*) Linux kernel = kernel panic -> filesystem corruption.

wdormann commented 10 months ago

Yeah, I've tried to narrow the problem down to something reproducible: internal SSD vs. external, OS APFS volume vs. added APFS volume, guest OS LVM vs. normal partition, 5.x kernel vs. 6.x kernel, my laptop vs. my coworkers' laptops, etc. Sadly, I've yet to find a formula that reliably reproduces the problem. I feel like there's some sort of race condition where the latency or other aspects of the backing storage might be involved. But based on the participants in this ticket and in https://github.com/lima-vm/lima/issues/1957, it sure seems like there is indeed a problem. Based on your (lack of) luck, you may be in for a bad time.

Unfortunately, I do not have a performant ARM system other than my Mac, so I can't run any bare-metal tests that completely eliminate the Apple Virtualization framework and/or Apple hardware from the variables.

rainwoodman commented 10 months ago

A while ago, in comment https://github.com/utmapp/UTM/issues/4840#issuecomment-1764436352, disabling ballooning was mentioned as one of the cures. Just want to bump it up a bit, as apparently no recent experiments have explored that possibility.

lfdla commented 10 months ago

@rainwoodman Quick update on my side: no more kernel panics / disk corruption since I disabled ballooning a month ago. It might not be the root cause, but at least it had a positive effect. Can someone try it out? @wdormann @kurgannet

kurgannet commented 10 months ago

I have tried disabling ballooning since it was suggested. Moreover, I have tried disabling every other feature, but the result is always the same.

wdormann commented 10 months ago

@lfdla No, disabling memory ballooning didn't prevent corruption for me. That is, if I take a VM that's known to corrupt itself, revert it to a clean snapshot, and then adjust the VM settings to a maximum balloon of 0%, the VM still corrupts itself.

Screenshot 2023-10-31 at 2 13 05 PM
AkihiroSuda commented 10 months ago

> Moreover: I have tried this example Xcode project (https://developer.apple.com/documentation/virtualization/running_gui_linux_in_a_virtual_machine_on_a_mac) and it has the same issues. It's pretty clear to me that the issue is not UTM-related but Apple + Linux related, but I haven't found any other discussion forum. The UTM community may be more successful if they raise this issue with the Linux kernel team or Apple.

Has anybody already reported this issue to Apple? Probably via https://www.apple.com/feedback/macos.html

> So, TLDR: AVF + recent (6.*) Linux kernel = kernel panic -> filesystem corruption.

AlmaLinux 9 (kernel 5.14) users are apparently affected by the same issue:

wdormann commented 10 months ago

Just an update on my latest tests:

I've taken one of my provably will-corrupt-itself VMs (which happened to be created on my external SSD) and created an identical clone of it on my main internal SSD. So by definition, there is absolutely nothing different about the VM itself other than the MAC address. The copy that lives on the external SSD corrupted its filesystem in less than an hour. The copy that lives on the internal SSD seems to be able to run indefinitely without corruption.

Screenshot 2023-11-01 at 9 26 41 AM

To ensure that there isn't bit rot on the external SSD, I ran both disk verification utilities (i.e. write a pattern, read it back, verify, repeat) and also ran an entire macOS Sonoma VM on the disk performing the same load (build Qt6, clean, repeat), and that VM ran flawlessly for 24 hours.

My hunch is that the Linux kernel has some sort of bug that presents itself under certain disk usage patterns, possibly exacerbated by some attributes (unknown to me) of the backing storage for the VM's disk. Or it's even possible that the Apple hypervisor framework's storage component has a bug, but only in the way a Linux VM exercises it. Or some combination of the two. I lack the brainpower/equipment/experience to even know how one would begin to determine which it is, though.

wrmack commented 10 months ago

Not sure if this adds anything, but before starting a session with the VM I boot into the installation ISO and, in a tty, run e2fsck manually on the unmounted VM disk: sudo e2fsck -fy /dev/xxx. It will find errors. If I keep running e2fsck it will still find errors, usually in the directory structure. Eventually there are no errors, though if I continue to run e2fsck it might find errors again. I then shut down and boot into the VM. If I am lucky I can go a whole day using it without problems.
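
Spelled out, that routine is roughly the following, run from the installer ISO with the VM disk unmounted (the device name is an example; check with lsblk first):

lsblk                        # identify the VM's root partition, e.g. /dev/vda2
sudo e2fsck -fy /dev/vda2    # -f: force a check even if marked clean, -y: answer yes to all repairs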

The inability of e2fsck to repair the filesystem in one pass seems consistent with wdormann's hunch.

I have an M1, using Apple Virtualization; the VM has 2 CPUs and 4 GB memory and lives on an external Samsung T7 SSD.

rainwoodman commented 10 months ago

Would the number of VM CPU cores be relevant? Assuming it is a race condition: if it goes away when the VM has a single core, that would point to the guest?


wdormann commented 10 months ago

No, the number of CPU cores doesn't seem to matter; it was pretty easy to reproduce the bug in a 1-core VM.

Screenshot 2023-11-01 at 5 41 44 PM
kurgannet commented 10 months ago

OK, Docker has updated its Linux kernel version since I last checked. I'm currently running Linux 6.4.16-linuxkit #1 SMP PREEMPT Tue Oct 10 20:38:06 UTC 2023 aarch64 GNU/Linux on the Apple Virtualisation framework, and no corruption, no kernel panic... no issues at all!


So... does Docker use a custom Linux kernel? Has anyone tried a UTM VM with the 6.4.16 kernel? It may be interesting to just copy the kernel into a UTM VM and see what happens...

marcan commented 10 months ago

We (Asahi) have found and fixed multiple critical Linux issues in general ARM64 code, of the kind that would only have a high likelihood of causing trouble on high-end CPUs with very wide instruction reordering, that is, Apple Silicon. If the people experiencing corruption are running old kernels, step 1 should be to upgrade to the latest kernel release and try again. If that was the cause, then that will fix it (which seems to line up with the few reports that recent kernels are not affected).

(Seriously, atomic ops were broken in ARM64 for 2 years until I ran into the problem on a Mac and fixed it. You can get away with a lot of subtly broken code with memory ordering/barrier problems when you run it on wimpy ARM64 CPUs, so nobody notices... until now.)

wdormann commented 10 months ago

@marcan I assume you mean running something newer than 6.5.0 if we want stability? If so, which 6.5.x? Or is 6.6 required?

(Screenshot: ubuntu_btrfs_qemu)
marcan commented 10 months ago

Ha, 6.5.0? That one in particular is completely broken; it needs this patch. If your package doesn't have it backported, there's your problem.

6.4 should be fine, as should 6.5.6 according to the changelog.

(Our Asahi tree is currently on 6.5.0 with that patch cherry-picked. And yes, that is the second time ARM64 atomics got broken!)
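
For anyone triaging their own guest against these reports, the quick check is the running kernel version; per the comments above, 6.4.x or 6.5.6 and later should carry the fix, though distro backports vary:

uname -r   # compare against 6.4.x / >= 6.5.6; a distro kernel may also have the patch backported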