Closed by refutationalist 2 months ago
I'm going to create a really minimal Xen build to test with and see whether it's something we're doing or not.
It's also worth noting that these machines, when booted into a PVH dom0, only see the PCI-E lanes of the first CPU. That may be completely unrelated.
The situation as of https://github.com/refutationalist/saur/pull/28:
[ 140.772805] reboot: Restarting system
(XEN) Hardware Dom0 shutdown: rebooting machine
(XEN) ----[ Xen-4.18.2-arch x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<00000000ca5ec780>] 00000000ca5ec780
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor (d0v0)
(XEN) rax: 00000000000000a0 rbx: 0000000000000004 rcx: 0000000000000050
(XEN) rdx: 0000000000000001 rsi: 0000000000000000 rdi: 0000000000000003
(XEN) rbp: ffff83102fff7ce8 rsp: ffff83102fff7cc0 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000832 r11: 0000000000000835
(XEN) r12: 0000000000000000 r13: ffff830000000472 r14: 000000000000000a
(XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 0000000000172660
(XEN) cr3: 000000102ffd1000 cr2: 00000000ff610000
(XEN) fsb: 00007bad0a06d0c0 gsb: ffff888135a00000 gss: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen code around <00000000ca5ec780> (00000000ca5ec780):
(XEN) 05 00 00 45 3b c4 75 17 <a1> 00 00 61 ff 00 00 00 00 44 8b c0 41 83 e0 f0
(XEN) Xen stack trace from rsp=ffff83102fff7cc0:
(XEN) 0000000000000000 0000000000000000 0000000000000002 ffff831023900000
(XEN) 0000000000000001 0100005000011c00 0000030000a00001 00a08300038000a0
(XEN) 038000a000000280 ff041c0000a00f02 00a0000002000050 0100005000031c00
(XEN) 0000000000a00000 ffff83102fff7d80 0000000000000065 ffff83102fff7dd8
(XEN) ffff830000000472 ffff83102fff7d80 000000102ffd1000 ffff82d04029b9fc
(XEN) ffff82d040332800 0000000000000000 ffff83102fff7dc0 0000000000000000
(XEN) 000000100a494000 0000000000000000 000000000000000a 0000000000000046
(XEN) ffff82d040352a56 ffff82d040352b55 0000000000000000 0000000000000000
(XEN) ffff831023266000 ffff82d040352181 000000002fff7e20 000083102fff7de0
(XEN) 0000000000000000 0000000000000001 ffff831023266000 0000000000000001
(XEN) ffff8310232661f8 ffffc9004003bb78 0000000000000000 ffff82d04022be53
(XEN) ffff82d040208027 fffffffffffffff2 000000000000001d ffff831023254000
(XEN) ffffc9004003bccc ffff82d04024d66f 0000000140201247 ffff82d04020124d
(XEN) ffff82d040201247 ffff82d04020124d ffff83102fff7ef8 000000000000001d
(XEN) ffff82d040317aba ffff82d04020124d ffff82d040201247 ffff82d04020124d
(XEN) ffff82d040201247 ffff82d04020124d ffff82d040201247 ffff82d04020124d
(XEN) ffff831023254000 0000000000000000 0000000000000000 0000000000000000
(XEN) ffff83102fff7fff 0000000000000000 ffff82d0402012c1 00000000fee1dead
(XEN) ffffffff82e50ee0 0000000000000000 ffffc9004003bce8 0000000028121969
(XEN) 0000000000000004 0000000000000246 ffffc9004003bb78 ffffc9004003bb80
(XEN) Xen call trace:
(XEN) [<00000000ca5ec780>] R 00000000ca5ec780
(XEN) [<0000000000000000>] S 0000000000000000
(XEN) [<ffff82d04029b9fc>] S efi_reset_system+0x4c/0x90
(XEN) [<ffff82d040332800>] S io_apic.c#clear_IO_APIC_pin+0/0x110
(XEN) [<ffff82d040352a56>] S __stop_this_cpu+0x16/0x30
(XEN) [<ffff82d040352b55>] S smp_send_stop+0xc5/0xe0
(XEN) [<ffff82d040352181>] S machine_restart+0x161/0x290
(XEN) [<ffff82d04022be53>] S hwdom_shutdown+0x53/0xc0
(XEN) [<ffff82d040208027>] S domain.c#domain_shutdown.part.0+0x47/0x110
(XEN) [<ffff82d04024d66f>] S do_sched_op+0x38f/0x520
(XEN) [<ffff82d04020124d>] S lstar_enter+0xcd/0x150
(XEN) [<ffff82d040201247>] S lstar_enter+0xc7/0x150
(XEN) [<ffff82d04020124d>] S lstar_enter+0xcd/0x150
(XEN) [<ffff82d040317aba>] S pv_hypercall+0x4ea/0x580
(XEN) [<ffff82d04020124d>] S lstar_enter+0xcd/0x150
(XEN) [<ffff82d040201247>] S lstar_enter+0xc7/0x150
(XEN) [<ffff82d04020124d>] S lstar_enter+0xcd/0x150
(XEN) [<ffff82d040201247>] S lstar_enter+0xc7/0x150
(XEN) [<ffff82d04020124d>] S lstar_enter+0xcd/0x150
(XEN) [<ffff82d040201247>] S lstar_enter+0xc7/0x150
(XEN) [<ffff82d04020124d>] S lstar_enter+0xcd/0x150
(XEN) [<ffff82d0402012c1>] S lstar_enter+0x141/0x150
(XEN)
(XEN) Pagetable walk from 00000000ff610000:
(XEN) L4[0x000] = 000000102ffd0063 ffffffffffffffff
(XEN) L3[0x003] = 00000000c1a0f063 ffffffffffffffff
(XEN) L2[0x1fb] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 00000000ff610000
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
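For anyone reading the page-table walk in the log above: the indices Xen prints can be reproduced by hand from the faulting linear address. The sketch below assumes standard x86-64 4-level paging (9 index bits per level) and that bit 0 of each entry is the Present flag; the function names are made up for illustration and are not Xen code.

```python
def pagetable_indices(linear_address):
    """Split an x86-64 linear address into 4-level paging indices (9 bits each)."""
    return {
        "L4": (linear_address >> 39) & 0x1FF,
        "L3": (linear_address >> 30) & 0x1FF,
        "L2": (linear_address >> 21) & 0x1FF,
        "L1": (linear_address >> 12) & 0x1FF,
    }


def entry_present(entry):
    """Bit 0 of a page-table entry is the Present flag."""
    return bool(entry & 1)


# Values copied from the log above.
FAULT_ADDR = 0x00000000FF610000
WALK = {
    "L4": 0x000000102FFD0063,
    "L3": 0x00000000C1A0F063,
    "L2": 0x0000000000000000,
}

if __name__ == "__main__":
    idx = pagetable_indices(FAULT_ADDR)
    for level in ("L4", "L3", "L2"):
        state = "present" if entry_present(WALK[level]) else "NOT present"
        print(f"{level}[{idx[level]:#05x}] = {WALK[level]:016x} ({state})")
```

The computed indices (0x000, 0x003, 0x1fb) match the ones in the log, and the L2 entry is zero, so nothing is mapped at that address. That fits error_code=0000, which on x86 means a supervisor-mode read of a non-present page.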
reboot=acpi fixes it. I should spend some time with the documentation some day.
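A plausible reading of why this helps, offered as a guess rather than a diagnosis: the call trace in the log passes through efi_reset_system, and the faulting RIP is outside Xen's own text, which looks like a fault inside the firmware's EFI reset service. Selecting the ACPI reset method would sidestep that path. The sketch below only pulls symbol names out of a pasted trace so the EFI reset frame is easy to spot; the helper names are hypothetical, and frames marked S are stack-scan guesses by Xen, not guaranteed callers.

```python
import re

# Matches Xen call-trace lines such as:
#   (XEN) [<ffff82d04029b9fc>] S efi_reset_system+0x4c/0x90
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<kind>[A-Z])\s+"
    r"(?P<sym>\S+?)(?:\+(?:0x)?[0-9a-f]+/(?:0x)?[0-9a-f]+)?\s*$"
)


def call_trace_symbols(log_text):
    """Return (kind, symbol) pairs for every call-trace line in a Xen log."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("kind"), match.group("sym")))
    return frames


def went_through_efi_reset(log_text):
    """True if any frame in the trace names efi_reset_system."""
    return any(sym == "efi_reset_system" for _, sym in call_trace_symbols(log_text))


# A few lines copied from the trace above.
SAMPLE = """\
(XEN) [<00000000ca5ec780>] R 00000000ca5ec780
(XEN) [<ffff82d04029b9fc>] S efi_reset_system+0x4c/0x90
(XEN) [<ffff82d040332800>] S io_apic.c#clear_IO_APIC_pin+0/0x110
(XEN) [<ffff82d040352181>] S machine_restart+0x161/0x290
"""

if __name__ == "__main__":
    for kind, sym in call_trace_symbols(SAMPLE):
        print(kind, sym)
    print("EFI reset in trace:", went_through_efi_reset(SAMPLE))
```

The same check against a log captured after switching reset methods should no longer show efi_reset_system, assuming the option took effect.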
This is an HP Z840 with two E5-2670v3s. The crash has been seen both with and without domUs running. It may be an upstream issue.