vivekmiyani / OSX_GVT-D

Guide to pass iGPU to MacOS KVM guest.

Preventing gvt-d corruption on electron based programs #1

Closed albertstarfield closed 3 years ago

albertstarfield commented 3 years ago

Hello, I wanted to report that this issue can be mitigated by running QEMU with realtime priority and a nice value of -20 (if I understand it correctly, this works because its priority is then higher than most IRQs). For some reason this also prevents the random memory-corruption segfaults. Disabling transparent hugepages also helps mitigate the issue.

[Screenshot: Screen Shot 2021-06-02 at 09 39 36]
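A minimal sketch of how the priority change could be applied to an already running QEMU process. It assumes the standard util-linux `chrt` and `renice` tools; the helper name `priority_commands`, the realtime priority of 50, and the PID 1234 are illustrative assumptions, not values from this thread.

```python
import shlex
from typing import List


def priority_commands(pid: int, rt_priority: int = 50, nice: int = -20) -> List[List[str]]:
    """Build command lines that raise the priority of a running process.

    Assumption: rt_priority=50 is an arbitrary placeholder; the thread only
    says "realtime priority" and "-20 nice".
    """
    if not 1 <= rt_priority <= 99:
        raise ValueError("SCHED_FIFO priorities range from 1 to 99")
    if not -20 <= nice <= 19:
        raise ValueError("nice values range from -20 to 19")
    return [
        # chrt -f selects SCHED_FIFO, -p applies it to an existing PID
        ["chrt", "-f", "-p", str(rt_priority), str(pid)],
        # renice -n sets the nice value of an existing PID
        ["renice", "-n", str(nice), "-p", str(pid)],
    ]


# Print the commands instead of running them; they need root to take effect.
for command in priority_commands(1234):
    print(shlex.join(command))
```

Both commands need root privileges, and a realtime guest can starve the host, so it is probably worth trying this on a machine you can reset easily.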

vivekmiyani commented 3 years ago

That's awesome 🥇!

Would you like to contribute to the README or anything? That would be really helpful.

albertstarfield commented 3 years ago

Yes, I will, but it might take time because I have found more potential methods. Realtime priority is the first one, which I described in my first post. It works for a while, but then the guest crashes again even though there is no visible graphical corruption. The crash/panic was one of badtrap, kerneltrap, pmem, memory corruption, or page file error (these are the crashes I encountered; I don't know if there are more that correlate with the memory corruption). So I searched for this issue and found that the cause was i915ovmf: https://www.reddit.com/r/VFIO/comments/nrgqnm/succesful_catalina_intel_uhd_630_gvtd_passthrough/

—

The other methods (or maybe together they are a complete set of fixes?):

A. Patch framebuffer-fbmem and framebuffer-stolenmem using OpenCore-Configurator, with the Hackintool iASL DSL source as the patching guide.

Example (make sure to match your iGPU's capabilities to the closest supported framebuffer to reduce graphical glitches and corruption):

[Screenshot: Screen Shot 2021-06-05 at 14 29 44]

Result: It is very stable, and if any corruption occurs it stays contained to a graphical glitch in a single application. However, there is still a panic where the kernel seems to complain about a write to write-protected memory.
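For readers without the screenshot, here is a rough sketch of what such a framebuffer patch looks like as OpenCore DeviceProperties, built with Python's `plistlib`. The device path, the helper names, and both memory values are placeholders and are not taken from this thread; the stolen-memory property is spelled `framebuffer-stolenmem` in WhateverGreen, and the right values depend on the framebuffer you match against in Hackintool.

```python
import plistlib

# Assumption: the usual iGPU location; verify the path inside your own guest.
IGPU_PATH = "PciRoot(0x0)/Pci(0x2,0x0)"


def framebuffer_patch(stolenmem_hex: str, fbmem_hex: str) -> dict:
    """Return WhateverGreen framebuffer patch properties as raw data values."""
    return {
        "framebuffer-patch-enable": bytes.fromhex("01000000"),
        "framebuffer-stolenmem": bytes.fromhex(stolenmem_hex),
        "framebuffer-fbmem": bytes.fromhex(fbmem_hex),
    }


def device_properties_plist(props: dict) -> bytes:
    """Wrap the properties in a DeviceProperties -> Add section and serialize."""
    config = {"DeviceProperties": {"Add": {IGPU_PATH: props}}}
    return plistlib.dumps(config)


# Placeholder values only; take the real ones from your matched framebuffer.
print(device_properties_plist(framebuffer_patch("00003001", "00009000")).decode())
```

The printed fragment is only the DeviceProperties part; it would have to be merged into an existing config.plist by hand or with OpenCore-Configurator.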

B. Disable the register write protection through the OpenCore-Configurator AppleXcpmCfgLock kernel quirk, and change the CPU configuration (on the QEMU side, to a newer model but not the host model).

[Screenshot: Screen Shot 2021-06-05 at 14 36 56]

Result: Corruption is reduced, but the same applications still crash with segfaults.

C. Disable transparent hugepages on Linux and add -realtime mlock=on, -no-hpet, -rtc driftfix=slew

This is basically to make sure that the guest's pages aren't modified by Linux or swapped out of RAM.

[Screenshots: Screen Shot 2021-06-05 at 14 40 04, Screen Shot 2021-06-05 at 14 41 02]

However, I'm not sure whether -rtc driftfix=slew and -no-hpet contribute to the stability of the guest OS.

Result: The crashes are reduced. However, when a crash or panic does happen, it still complains about pagefile stuff and bad tailq.
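A small sketch for checking the transparent hugepage state and assembling the extra QEMU arguments from method C. The sysfs path is the standard Linux one; the helper names are made up for illustration. The memory-lock option is spelled `-realtime mlock=on` in older QEMU releases and, as far as I know, `-overcommit mem-lock=on` in newer ones, so the `modern_syntax` switch is an assumption to adapt to your QEMU version.

```python
from typing import List

# Writing "never" to this file (as root) disables transparent hugepages.
THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(sysfs_text: str) -> str:
    """Return the bracketed (active) mode from text like 'always [madvise] never'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {sysfs_text!r}")


def stability_args(modern_syntax: bool = False) -> List[str]:
    """Return the QEMU arguments from method C as an argv fragment."""
    if modern_syntax:
        mlock = ["-overcommit", "mem-lock=on"]  # assumption: newer QEMU spelling
    else:
        mlock = ["-realtime", "mlock=on"]
    return mlock + ["-no-hpet", "-rtc", "driftfix=slew"]


print(active_thp_mode("always [madvise] never"))
print(" ".join(stability_args()))
```

On a real host the input to `active_thp_mode` would be the contents of `THP_ENABLED_PATH`; the goal for this method is to see `never` as the active mode.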

D. Increase the allocated RAM

I'm not sure why this really helps mitigate the issue (maybe it has something to do with the i915ovmf ROM memory allocation), but increasing the RAM allocation drastically reduces the crashes/panics.

[Screenshot: Screen Shot 2021-06-05 at 14 45 06]

This is what I saw in the experiment:
If you allocate 8192 MiB of RAM, it crashes around every 5 minutes.
If you allocate 12288 MiB of RAM, it crashes around every 3 to 6 hours.
If you allocate 13824 MiB of RAM, I'm not sure, because my host ran out of RAM and killed the VM, lol.

The crash is the same, complaining about the pagefile and bad tailq.
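To put numbers on the effect, the two measured data points can be compared directly. This is only arithmetic on the figures reported above (two observations, so no trend should be read into it); the names are illustrative.

```python
# MiB of guest RAM -> (shortest, longest) observed minutes between crashes
observations = {
    8192: (5, 5),        # "around every 5 mins"
    12288: (180, 360),   # "around every 3 to 6 hours"
}


def improvement(base_mib: int, new_mib: int):
    """Return (RAM ratio, lowest uptime ratio, highest uptime ratio)."""
    base_lo, base_hi = observations[base_mib]
    new_lo, new_hi = observations[new_mib]
    return new_mib / base_mib, new_lo / base_hi, new_hi / base_lo


ram_ratio, low, high = improvement(8192, 12288)
print(f"{ram_ratio}x the RAM gave {low:.0f}x to {high:.0f}x longer between crashes")
```

So 1.5 times the RAM bought somewhere between 36 and 72 times more uptime in this experiment.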

E. Observe the crashes and blacklist the memory addresses

This is still ongoing research. With 12 GiB of RAM allocated to the guest OS, I observe the kernel crashes/panics to see which memory addresses need to be blacklisted from use by the kernel.

Here are my memory address observations:

Crash 1

```
panic(cpu 6 caller 0xffffff8017aa50f4): "Too many alloc retries: 502, table:0xffffff8018a759f0, type:1, nelem:1"@/System/Volumes/Data/SWE/macOS/BuildRoots/e90674e518/Library/Caches/com.apple.xbs/Sources/xnu/xnu-7195.121.3/osfmk/kern/ltable.c:504
Backtrace (CPU 6), Frame : Return Address
0xffffffa08b793a60 : 0xffffff8017a8e0dd mach_kernel : _handle_debugger_trap + 0x3fd
0xffffffa08b793ab0 : 0xffffff8017bd4f33 mach_kernel : _kdp_i386_trap + 0x143
0xffffffa08b793af0 : 0xffffff8017bc552a mach_kernel : _kernel_trap + 0x55a
0xffffffa08b793b40 : 0xffffff8017a32a2f mach_kernel : _return_from_trap + 0xff
0xffffffa08b793b60 : 0xffffff8017a8d8fd mach_kernel : _DebuggerTrapWithState + 0xad
0xffffffa08b793c80 : 0xffffff8017a8dbf3 mach_kernel : _panic_trap_to_debugger + 0x273
0xffffffa08b793cf0 : 0xffffff801829d81a mach_kernel : _panic + 0x54
0xffffffa08b793d60 : 0xffffff8017aa50f4 mach_kernel : _ltable_alloc_elem + 0x414
0xffffffa08b793dd0 : 0xffffff8017ae57ac mach_kernel : _waitq_set_lazy_init_link + 0x4c
0xffffffa08b793e00 : 0xffffff801804fbab mach_kernel : _selprocess + 0x35b
0xffffffa08b793f00 : 0xffffff801804eed6 mach_kernel : _select_nocancel + 0xe6
0xffffffa08b793f40 : 0xffffff801813fc9e mach_kernel : _unix_syscall64 + 0x2ce
0xffffffa08b793fa0 : 0xffffff8017a331f6 mach_kernel : _hndl_unix_scall64 + 0x16
```

Crash 2

```
Backtrace (CPU 4), Frame : Return Address
0xffffff80021581e0 : 0xffffff800228e0dd mach_kernel : _handle_debugger_trap + 0x3fd
0xffffff8002158230 : 0xffffff80023d4f33 mach_kernel : _kdp_i386_trap + 0x143
0xffffff8002158270 : 0xffffff80023c552a mach_kernel : _kernel_trap + 0x55a
0xffffff80021582c0 : 0xffffff8002232a2f mach_kernel : _return_from_trap + 0xff
0xffffff80021582e0 : 0xffffff800228d8fd mach_kernel : _DebuggerTrapWithState + 0xad
0xffffff8002158400 : 0xffffff800228dbf3 mach_kernel : _panic_trap_to_debugger + 0x273
0xffffff8002158470 : 0xffffff8002a9d81a mach_kernel : _panic + 0x54
0xffffff80021584e0 : 0xffffff80023c58f6 mach_kernel : _sync_iss_to_iks + 0x2c6
0xffffff8002158660 : 0xffffff80023c55dd mach_kernel : _kernel_trap + 0x60d
0xffffff80021586b0 : 0xffffff8002232a2f mach_kernel : _return_from_trap + 0xff
0xffffff80021586d0 : 0xffffff8002346ef4 mach_kernel : _vm_map_store_entry_link_rb + 0x34
0xffffffb07f11b0b0 : 0xffffff8002345913 mach_kernel : _vm_map_store_entry_link + 0x63
0xffffffb07f11b100 : 0xffffff80023350f8 mach_kernel : _vm_map_entry_insert + 0x258
0xffffffb07f11b170 : 0xffffff800232ff10 mach_kernel : _vm_map_enter + 0x1280
0xffffffb07f11b370 : 0xffffff800235bbb8 mach_kernel : _vm_map_enter_upl + 0x3a8
0xffffffb07f11b450 : 0xffffff800256b235 mach_kernel : _decmpfs_pagein_compressed + 0x375
0xffffffb07f11b5b0 : 0xffffff8005407a1b com.apple.filesystems.apfs : _apfs_pagein_with_verification + 0x37a
0xffffffb07f11b6d0 : 0xffffff8005407536 com.apple.filesystems.apfs : _apfs_pagein + 0x741
0xffffffb07f11b7c0 : 0xffffff80028b77fe mach_kernel : _vnode_pagein + 0x6ae
0xffffffb07f11b8a0 : 0xffffff80023108f8 mach_kernel : _vnode_pager_cluster_read + 0x48
0xffffffb07f11b900 : 0xffffff80023206be mach_kernel : _vm_fault_page + 0x92e
0xffffffb07f11ba10 : 0xffffff8002374a1a mach_kernel : _shared_region_pager_data_request + 0x33a
0xffffffb07f11bb70 : 0xffffff80023206be mach_kernel : _vm_fault_page + 0x92e
0xffffffb07f11bc80 : 0xffffff8002325ed5 mach_kernel : _vm_pre_fault + 0x16c5
0xffffffb07f11bf00 : 0xffffff80023c5b40 mach_kernel : _user_trap + 0x1b0
0xffffffb07f11bfa0 : 0xffffff800223291f mach_kernel : _hndl_alltraps + 0xdf
```

Crash 3

```
Backtrace (CPU 3), Frame : Return Address
0xffffffa073d5b680 : 0xffffff800b08e0dd mach_kernel : _handle_debugger_trap + 0x3fd
0xffffffa073d5b6d0 : 0xffffff800b1d4f33 mach_kernel : _kdp_i386_trap + 0x143
0xffffffa073d5b710 : 0xffffff800b1c552a mach_kernel : _kernel_trap + 0x55a
0xffffffa073d5b760 : 0xffffff800b032a2f mach_kernel : _return_from_trap + 0xff
0xffffffa073d5b780 : 0xffffff800b08d8fd mach_kernel : _DebuggerTrapWithState + 0xad
0xffffffa073d5b8a0 : 0xffffff800b08dbf3 mach_kernel : _panic_trap_to_debugger + 0x273
0xffffffa073d5b910 : 0xffffff800b89d81a mach_kernel : _panic + 0x54
0xffffffa073d5b980 : 0xffffff800b1c58f6 mach_kernel : _sync_iss_to_iks + 0x2c6
0xffffffa073d5bb00 : 0xffffff800b1c55dd mach_kernel : _kernel_trap + 0x60d
0xffffffa073d5bb50 : 0xffffff800b032a2f mach_kernel : _return_from_trap + 0xff
0xffffffa073d5bb70 : 0xffffff800b640473 mach_kernel : _memorystatus_available_memory + 0x183
0xffffffa073d5bc70 : 0xffffff800b5fad0f mach_kernel : _kqueue_dealloc + 0x61f
0xffffffa073d5bcd0 : 0xffffff800b5fbca6 mach_kernel : _knotes_dealloc + 0x166
0xffffffa073d5bd20 : 0xffffff800b5f510f mach_kernel : _fdfree + 0x5f
0xffffffa073d5bd60 : 0xffffff800b6143af mach_kernel : _proc_exit + 0x23f
0xffffffa073d5be10 : 0xffffff800b0ce52e mach_kernel : _thread_terminate_self + 0x3be
0xffffffa073d5bea0 : 0xffffff800b0d2760 mach_kernel : _thread_apc_ast + 0x90
0xffffffa073d5bed0 : 0xffffff800b0855f3 mach_kernel : _ast_taken_user + 0x153
0xffffffa073d5bf00 : 0xffffff800b0329fb mach_kernel : _return_from_trap + 0xcb
```

Crash 4

```
Backtrace (CPU 0), Frame : Return Address
0xffffffa091c0baa0 : 0xffffff801ac8e0dd mach_kernel : _handle_debugger_trap + 0x3fd
0xffffffa091c0baf0 : 0xffffff801add4f33 mach_kernel : _kdp_i386_trap + 0x143
0xffffffa091c0bb30 : 0xffffff801adc552a mach_kernel : _kernel_trap + 0x55a
0xffffffa091c0bb80 : 0xffffff801ac32a2f mach_kernel : _return_from_trap + 0xff
0xffffffa091c0bba0 : 0xffffff801ac8d8fd mach_kernel : _DebuggerTrapWithState + 0xad
0xffffffa091c0bcc0 : 0xffffff801ac8dbf3 mach_kernel : _panic_trap_to_debugger + 0x273
0xffffffa091c0bd30 : 0xffffff801b49d81a mach_kernel : _panic + 0x54
0xffffffa091c0bda0 : 0xffffff801af16584 mach_kernel : _vnode_iterate + 0x424
0xffffffa091c0be50 : 0xffffff801de268d9 com.apple.filesystems.apfs : _apfs_vfsop_sync + 0x89
0xffffffa091c0bea0 : 0xffffff801af2a39b mach_kernel : _sync + 0x9b
0xffffffa091c0bec0 : 0xffffff801af1605c mach_kernel : _vfs_iterate + 0x21c
0xffffffa091c0bf30 : 0xffffff801af2a317 mach_kernel : _sync + 0x17
0xffffffa091c0bf40 : 0xffffff801b33fc9e mach_kernel : _unix_syscall64 + 0x2ce
0xffffffa091c0bfa0 : 0xffffff801ac331f6 mach_kernel : _hndl_unix_scall64 + 0x16
```

Result: Not yet applied, but from the look of the memory addresses there seems to be a pattern showing which ones i915ovmf allocates incorrectly.
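When comparing addresses across crashes, it may help to parse the frames and subtract each symbol's offset first. The sketch below does that for two frames copied from Crash 1 and Crash 2; the names (`Frame`, `parse_frames`, `start_deltas`) are illustrative. In these samples every shared symbol differs by the same constant between the two boots, which I assume reflects the kernel being loaded at a different base address each boot (the thread does not confirm this), so that shift is worth factoring out before looking for a pattern.

```python
import re
from typing import List, NamedTuple, Set

# Matches "<frame> : <return address> <image> : <symbol> + <offset>",
# tolerating the line wraps found in pasted panic logs.
FRAME_RE = re.compile(
    r"(0x[0-9a-f]+)\s+:\s+(0x[0-9a-f]+)\s+(\S+)\s+:\s+(\S+)\s+\+\s+(0x[0-9a-f]+)"
)


class Frame(NamedTuple):
    frame: int
    return_address: int
    image: str
    symbol: str
    offset: int

    @property
    def symbol_start(self) -> int:
        """Address where the symbol begins in this boot."""
        return self.return_address - self.offset


def parse_frames(text: str) -> List[Frame]:
    """Extract every backtrace frame found in a pasted panic log."""
    return [
        Frame(int(f, 16), int(r, 16), image, symbol, int(off, 16))
        for f, r, image, symbol, off in FRAME_RE.findall(text)
    ]


def start_deltas(a: List[Frame], b: List[Frame]) -> Set[int]:
    """Return the distinct address shifts (b minus a) over shared symbols."""
    starts_b = {f.symbol: f.symbol_start for f in b}
    return {starts_b[f.symbol] - f.symbol_start for f in a if f.symbol in starts_b}


CRASH_1_SAMPLE = """
0xffffffa08b793a60 : 0xffffff8017a8e0dd mach_kernel : _handle_debugger_trap + 0x3fd
0xffffffa08b793ab0 : 0xffffff8017bd4f33 mach_kernel : _kdp_i386_trap + 0x143
"""
CRASH_2_SAMPLE = """
0xffffff80021581e0 : 0xffffff800228e0dd mach_kernel : _handle_debugger_trap + 0x3fd
0xffffff8002158230 : 0xffffff80023d4f33 mach_kernel : _kdp_i386_trap + 0x143
"""

deltas = start_deltas(parse_frames(CRASH_1_SAMPLE), parse_frames(CRASH_2_SAMPLE))
print([hex(d) for d in deltas])
```

A single value in the result means the two backtraces are shifted copies of the same code addresses. Note that these are virtual return addresses, so they may not map directly onto the guest-physical ranges one would want to blacklist.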

vivekmiyani commented 3 years ago

Things are still a bit confusing to me. At the moment I am not using GVT-D because of these Chrome/Electron issues. But previously, when I was using GVT-D, I found a temporary workaround: using acrn-kernel, which let me open Chrome/Electron applications! You may want to try this kernel instead of mainline.

EDIT: They seem to have GVT-D specific patches in their kernel, but I don't know which commits add those patches.

albertstarfield commented 3 years ago

Ooh, interesting, I'll check it out.

vivekmiyani commented 3 years ago

Hey @WahyuSuryoSamudro, please check the updated Known issues & Fixes section in the README.