xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

Domain 11 (vcpu#1) crashed on cpu#0: (N5105) #565

Open danielbayley80 opened 2 years ago

danielbayley80 commented 2 years ago

Related forum thread: https://xcp-ng.org/forum/topic/6307/windows-10-vm-crashes-n5105?_=1662992688895

XCP-ng Host 8.2.1

When I run Blue Iris it crashes my VM. I am pretty sure it is when Blue Iris is probing for graphics hardware that the issue occurs. I'm not having any issues with other VMs. This is a Windows 10 VM. I've tried a clean install too. Windows will run, but BI kills the VM.

(XEN) [526158.312732] d11v1 Unknown Host LBR MSRs
(XEN) [526158.312738] domain_crash called from arch/x86/hvm/vmx/vmx.c#vmx_msr_write_intercept+0x4c2/0x510
(XEN) [526158.312740] Domain 11 (vcpu#1) crashed on cpu#0:
(XEN) [526158.312745] ----[ Xen-4.13.4-9.24.1  x86_64  debug=n   Tainted: M    ]----
(XEN) [526158.312746] CPU:    0
(XEN) [526158.312748] RIP:    0010:[<fffff80631bf8ad2>]
(XEN) [526158.312749] RFLAGS: 0000000000040002   CONTEXT: hvm guest (d11v1)
(XEN) [526158.312752] rax: 0000000000000001   rbx: 0000000000000000   rcx: 00000000000001d9
(XEN) [526158.312753] rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000280
(XEN) [526158.312755] rbp: ffffd5854226fc80   rsp: ffffd5854226fbf8   r8:  0000000000000001
(XEN) [526158.312757] r9:  ffffd5854226f358   r10: 0000000000000000   r11: ffff958afc6b1080
(XEN) [526158.312758] r12: 00000000018cb4e0   r13: 00007ffd01d34960   r14: 0000000000000280
(XEN) [526158.312760] r15: 0000000000000000   cr0: 0000000080050033   cr4: 0000000000350ef8
(XEN) [526158.312761] cr3: 000000002eefc000   cr2: 00000000003d125a
(XEN) [526158.312762] fsb: 0000000000000000   gsb: ffffa901f09d2000   gss: 00000000003c6000
(XEN) [526158.312764] ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
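
A note on reading the dump above: the crash is raised from vmx_msr_write_intercept, so the guest was most likely executing a WRMSR. For WRMSR, ECX holds the MSR index and EDX:EAX the value. The sketch below decodes the logged registers on that assumption. The constant names and bit meaning come from the Intel SDM rather than from this report, and the helper names are made up for illustration.

```python
import re

# Architectural constants from the Intel SDM (assumed, not part of the report).
MSR_IA32_DEBUGCTL = 0x1D9   # debug control MSR
DEBUGCTL_LBR = 1 << 0       # bit 0: enable last-branch recording (LBR)

# Register lines copied from the Xen crash log above.
LOG = """\
(XEN) [526158.312752] rax: 0000000000000001   rbx: 0000000000000000   rcx: 00000000000001d9
(XEN) [526158.312753] rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000280
"""


def parse_registers(log):
    """Return {register_name: int} for every 'name: 16-hex-digits' pair."""
    pairs = re.findall(r"\b([a-z0-9]+):\s+([0-9a-f]{16})\b", log)
    return {name: int(value, 16) for name, value in pairs}


def decode_wrmsr(regs):
    """Interpret the registers as WRMSR operands: index in ECX, value in EDX:EAX."""
    index = regs["rcx"] & 0xFFFFFFFF
    value = ((regs["rdx"] & 0xFFFFFFFF) << 32) | (regs["rax"] & 0xFFFFFFFF)
    return index, value


regs = parse_registers(LOG)
msr, value = decode_wrmsr(regs)
print(f"MSR index: {msr:#x}, value: {value:#x}")
print("Write to IA32_DEBUGCTL:", msr == MSR_IA32_DEBUGCTL)
print("LBR enable bit set:", bool(value & DEBUGCTL_LBR))
```

If that reading is right, the guest tried to turn on last-branch recording, and the "Unknown Host LBR MSRs" message suggests Xen had no LBR register layout for this host CPU model and crashed the domain instead of emulating the write. That would point at the hypervisor's CPU model support rather than at the graphics stack, but it is an interpretation, not a confirmed diagnosis.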

When I initially installed XCP-ng, I had issues with black screens when relinquishing VGA.

After some reading around, I happened upon:

"Custom XCP-ng 8.2.1 install ISO with Xen 64bit VGA Dom0 kernel fix (NUC11) and new ethernet drivers (NUC10/NUC11):" from here:

https://users.ntplx.net/~andrew/xcp/

This got me up and running, but I suspect the issue I am now facing somehow relates to this.

olivierlambert commented 2 years ago

Thanks for the report. We'll try to see if there's someone here or in the Xen Project able to make sense of that. Can you output the result of lscpu?

danielbayley80 commented 2 years ago

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 4
Core(s) per socket: 1
Socket(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 156
Model name: Intel(R) Celeron(R) N5105 @ 2.00GHz
Stepping: 0
CPU MHz: 1996.834
BogoMIPS: 3993.66
Hypervisor vendor: Xen
Virtualization type: para
L1d cache: 32K
L1i cache: 32K
L2 cache: 1536K
L3 cache: 4096K
Flags: fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid pni pclmulqdq monitor est ssse3 cx16 sse4_1 sse4_2 movbe popcnt aes xsave rdrand hypervisor lahf_lm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase erms rdseed clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 gfni rdpid arch_capabilities
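
A note on the output above: CPU-specific code in hypervisors and kernels usually identifies processors by family and model in hexadecimal, while lscpu prints them in decimal. The sketch below converts the values reported here so they can be searched for in source code or errata lists. The function names are hypothetical, and that this is why the output was requested is an assumption.

```python
# Relevant "Key: value" fields copied from the lscpu output above.
LSCPU = """\
Vendor ID: GenuineIntel
CPU family: 6
Model: 156
Model name: Intel(R) Celeron(R) N5105 @ 2.00GHz
Stepping: 0
"""


def parse_lscpu(text):
    """Split 'Key: value' lines into a dict, ignoring lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def cpu_signature(fields):
    """Format family/model/stepping in hex, the form used in most CPU tables."""
    family = int(fields["CPU family"])
    model = int(fields["Model"])
    stepping = int(fields["Stepping"])
    return f"family 0x{family:02x}, model 0x{model:02x}, stepping 0x{stepping:x}"


fields = parse_lscpu(LSCPU)
print(fields["Model name"])
print(cpu_signature(fields))
```

For this host the decimal model 156 becomes 0x9c, which is the value to look for when checking whether a given Xen version recognises the CPU.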

danielbayley80 commented 1 year ago

I have recently tried 8.3 on a new machine, this time a 12th Gen i7-1260P. The install process worked well and it detected the 2.5G ethernet etc., but I'm seeing the same issue on my Windows VM. I still think it is graphics related. When I free up my N5105, I'll put 8.3 on that too and see what happens.

danielbayley80 commented 1 year ago

image

danielbayley80 commented 1 year ago

Hmmm so I missed a few tricks.

1) i915 in this kernel does not support 12th gen, even when using i915.force_probe
2) even if/when it does, Intel have discontinued support for GVT-g in favor of SR-IOV
3) no drivers exist (as far as I can find) for SR-IOV. There is an experimental version here:

https://github.com/strongtz/i915-sriov-dkms

So going back to blacklisting and passing through might be the only solution in the short term.

Extremely disappointing!

danielbayley80 commented 1 year ago

A few days and many hours later...

I have been trying various combinations of hardware and configuration: BIOS vs UEFI, 64-bit vs 32-bit, N5105, i7-1260P, etc.

I believe I have narrowed this down to a problem on 64-bit. It seems to work in other configurations.

That does make me wonder where the line for this problem is.

The problem might be quite specific to Blue Iris and how it tries to interact with graphics hardware (assuming that is the real issue) and how Windows handles the query. That said, part of me still suspects there are issues between the hypervisor and newer processors, as I have seen other problems that I never saw on older hardware: one with managing power states, another with null kernel pointers, etc.