solardiz closed this issue 2 years ago
@solardiz Can you generate a kdump file and compile LKRG as a debug build?
Basically, IDT is unreadable:
[ 2645.301096] [p_lkrg] <p_get_cpus> online[4] possible[4] present[4] active[4] nr_cpu_ids[4]
[ 2645.301127] [p_lkrg] <p_create_database> p_db.p_CPU_metadata_array[0xffff8800a7918a00] with requested size[448] = sizeof(p_CPU_metadata_hash_mem)[112] * p_db.p_cpu.p_nr_cpu_ids[4]
[ 2645.301333] [p_lkrg] <p_dump_IDT_MSR> CPU:[0] IDT => base[0xffff83035296f000] size[0x100] hash[0x0]
[ 2645.301363] [p_lkrg] Reading IDT 1 to verify data:
[ 2645.301388] BUG: unable to handle kernel paging request at ffff83035296f01c
[ 2645.301419] IP: [<ffffffffc013ae98>] p_dump_x86_metadata+0x4d8/0x510 [p_lkrg]
This is with computation of the IDT hash excluded (otherwise it'd crash right there) and `loglevel=6` (so that we'd get that printout of IDT base, etc., and a crash in the following debugging code). Without `loglevel=6`, it'd stay up for a little while, but LKRG would see a changing CPU metadata hash (apparently because the IDT base address is part of what's hashed and it somehow keeps changing?)
Apparently, the returned IDT base is in hypervisor address range, not guest kernel, so is understandably unreadable from the kernel.
https://googleprojectzero.blogspot.com/2017/04/pandavirtualization-exploiting-xen.html
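To make the address-range observation concrete, here is a minimal user-space sketch of the check. The bounds are the ones used in the fix proposed in this thread for 64-bit guests; the helper name `addr_in_xen_reserved_range` is made up for illustration and is not part of LKRG.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds as used in the fix proposed in this thread (64-bit guests only). */
#define XEN_RESERVED_START 0xffff800000000000ULL
#define XEN_RESERVED_END   0xffff87ffffffffffULL

/* Hypothetical helper, not an LKRG function: true if addr lies in the
 * range that the thread identifies as hypervisor address space. */
static bool addr_in_xen_reserved_range(uint64_t addr)
{
    return addr >= XEN_RESERVED_START && addr <= XEN_RESERVED_END;
}
```

With the values from the log above, the reported IDT base `0xffff83035296f000` falls inside this range, while the `p_CPU_metadata_array` address `0xffff8800a7918a00` does not.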
I'm not an expert on PV, but in full virtualization, `sidt` is intercepted and its result can be modified by the hypervisor. Hyper-V can 'fake' the result of `sidt` for UM (for security reasons, to 'hide' an info leak), but the kernel must always be able to correctly set its own IDT page and be able to migrate it. Even in Xen PV (based on the link you pasted) the kernel must be able to poke the IDT, so if this VA is shared with the hypervisor, it is still visible (and should be RO) for the kernel. In fact, for performance reasons it is very common to map the hypervisor page with the hypercall routine into the guest. I certainly think this should be investigated more. Unless the PV Linux image is modified in such a way that `sidt` is rewritten to something else. Can we check the kernel image to see if there are special modifications for it?
Intercepting `sidt` requires UMIP support (in CPU and Xen), which is possibly how/why LKRG just works on the newer system (Xen is documented to have added UMIP support between versions on these systems).
Apparently, in a PV kernel, IDT updates go via a hypercall. IDT address reads currently don't (no longer do?), but they don't appear to be followed by reads of the actual IDT (so LKRG differs from what Linux itself does there, unless I overlooked something). You can grep Linux kernel sources for `load_idt`, `store_idt`, `HYPERVISOR_set_trap_table`. BTW, in Linux around 4.10, there is:
./arch/x86/kernel/paravirt.c: .store_idt = native_store_idt,
./arch/x86/include/asm/paravirt.h:static inline void store_idt(struct desc_ptr *dtr)
./arch/x86/include/asm/paravirt.h: PVOP_VCALL1(pv_cpu_ops.store_idt, dtr);
./arch/x86/include/asm/paravirt_types.h: void (*store_idt)(struct desc_ptr *);
These mentions of `store_idt` in `*paravirt*` are gone in current Linux.
> the IDT base address is part of what's hashed and it somehow keeps changing
Confirmed - on the affected system, each logical CPU's view of IDT base alternates between all CPUs' addresses. This actually makes sense because the CPUs aren't bound to physical ones, and `sidt` isn't virtualized here.
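For reference, in 64-bit mode `sidt` stores a 10-byte descriptor: a 16-bit limit followed by a 64-bit base. The base field is the part that alternates here. The sketch below only decodes such a descriptor from a byte buffer in user space (it does not execute `sidt`); the struct and function names are illustrative, not LKRG's.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded form of the 10-byte operand that sidt stores in 64-bit mode. */
struct idt_desc {
    uint16_t limit; /* table size in bytes, minus one */
    uint64_t base;  /* linear address of the IDT */
};

/* Illustrative helper: decode the little-endian descriptor image
 * byte by byte, so the result does not depend on host endianness. */
static struct idt_desc decode_sidt_image(const uint8_t raw[10])
{
    struct idt_desc d;
    size_t i;

    d.limit = (uint16_t)(raw[0] | (raw[1] << 8));
    d.base = 0;
    for (i = 0; i < 8; i++)
        d.base |= (uint64_t)raw[2 + i] << (8 * i);
    return d;
}
```

Because the instruction reports whatever the current physical CPU has loaded, a vCPU that migrates between physical CPUs would decode a different `base` each time, which is consistent with the changing hash described above.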
Proposed fix (tested, works):
+++ b/src/modules/database/arch/x86/p_x86_metadata.c
@@ -126,8 +126,16 @@ void p_dump_x86_metadata(void *_p_arg) {
*/
p_arg[p_curr_cpu].p_size = P_X86_MAX_IDT;
+#if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PVH)
+ if ((unsigned long)p_arg[p_curr_cpu].p_base >= 0xffff800000000000ULL &&
+ (unsigned long)p_arg[p_curr_cpu].p_base <= 0xffff87ffffffffffULL) {
+ p_arg[p_curr_cpu].p_base = 0;
+ p_arg[p_curr_cpu].p_size = 0;
+ }
+#endif
+
p_arg[p_curr_cpu].p_hash = p_lkrg_fast_hash((unsigned char *)p_arg[p_curr_cpu].p_base,
- (unsigned int)sizeof(p_idt_descriptor) * P_X86_MAX_IDT);
+ sizeof(p_idt_descriptor) * p_arg[p_curr_cpu].p_size);
// DEBUG
#ifdef P_LKRG_DEBUG
@@ -135,6 +143,7 @@ void p_dump_x86_metadata(void *_p_arg) {
"<p_dump_IDT_MSR> CPU:[%d] IDT => base[0x%lx] size[0x%x] hash[0x%llx]\n",
p_arg[p_curr_cpu].p_cpu_id,p_arg[p_curr_cpu].p_base,p_arg[p_curr_cpu].p_size,p_arg[p_curr_cpu].p_hash);
+ if (p_arg[p_curr_cpu].p_size)
do {
p_idt_descriptor *p_test;
BTW, I still don't know whether this issue only affects old Xen or also recent Xen on old CPUs (without UMIP). This distinction would only matter for us to more specifically document the issue/fix. Maybe @adrelanos has comments to enable that?
Also, I think Xen still supports 32-bit PV guests, but my fix above is only for 64-bit. We'd need to test for a different address range on 32-bit, or maybe disable IDT checking unconditionally when `CONFIG_XEN_PVH` is set yet the kernel is 32-bit.
It looks like exactly what I expected: all 'sidt' instructions are modified in PV, but our LKRG is not aware of it. Based on: https://wiki.xenproject.org/wiki/X86_Paravirtualised_Memory_Management
A Xen guest cannot access the Interrupt Descriptor Table (IDT) directly. Instead Xen maintains the IDT used by the physical hardware and provides guests with a completely virtual IDT. A guest writes entries to its virtual IDT using the [HYPERVISOR_set_trap_table](http://xenbits.xen.org/docs/unstable/hypercall/include,public,arch-x86,xen.h.html#Func_HYPERVISOR_set_trap_table) hypercall. This has the following prototype:
...
The entries of the trap_info struct correspond to the fields of a native IDT entry and each will be validated by Xen before it is used. The hypercall takes an array of traps terminated by an entry where address is zero.
Moreover, the same story applies to GDT/LDT. In theory we could invoke a hypercall to get the necessary IDT information instead of not verifying it at all - but I'm not sure whether that would be too much at this stage.
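As a rough illustration of what verifying the virtual IDT could look like, the sketch below walks a zero-terminated trap table of the kind the quoted wiki text describes and hashes it. It assumes the caller already has a copy of the table that was passed to `HYPERVISOR_set_trap_table`; how LKRG would obtain that copy is not addressed here. `struct trap_info` is redeclared locally with the fields from Xen's public x86 header so the example builds without Xen headers, the helper names are hypothetical, and FNV-1a is only a stand-in for `p_lkrg_fast_hash`.

```c
#include <stddef.h>
#include <stdint.h>

/* Local redeclaration of Xen's trap_info (fields as in the public
 * arch-x86 header), so this sketch builds without Xen headers. */
struct trap_info {
    uint8_t       vector;  /* exception vector */
    uint8_t       flags;   /* privilege level and related flag bits */
    uint16_t      cs;      /* code selector */
    unsigned long address; /* handler address; 0 terminates the table */
};

/* Hypothetical helper: number of entries before the terminator. */
static size_t trap_table_len(const struct trap_info *t)
{
    size_t n = 0;

    while (t[n].address != 0)
        n++;
    return n;
}

/* Hypothetical helper: 64-bit FNV-1a over the fields of each entry.
 * Fields are hashed individually so struct padding never contributes. */
static uint64_t trap_table_hash(const struct trap_info *t)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    size_t i, j, k, n = trap_table_len(t);

    for (i = 0; i < n; i++) {
        uint64_t fields[4] = { t[i].vector, t[i].flags, t[i].cs, t[i].address };

        for (j = 0; j < 4; j++) {
            for (k = 0; k < 8; k++) {
                h ^= (fields[j] >> (8 * k)) & 0xff;
                h *= 0x100000001b3ULL;
            }
        }
    }
    return h;
}
```

A checker built this way would compare the stored hash against a fresh one and flag any difference; an unchanged table always hashes to the same value, regardless of which physical CPU the vCPU runs on.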
We might not care as the issue does not affect currently supported Qubes, but this might be our bug/shortcoming affecting other systems as well, so to have it documented for later:
Trying to load LKRG on (no longer supported) Qubes 3.x paravirtualized kernels in a Xen VM, based on Linux 4.9.x or 4.14.x, instantly crashes in `p_lkrg_fast_hash`. The backtraces vary - sometimes go from `init_module`, other times in interrupt context. The issue is present in current LKRG, but it's reproducible just the same even on LKRG 0.0 (although to build versions 0.4 and below for this test, I had to add a missing `#include` line - fixed in 0.5+).

On Qubes OS 4.1 with a 5.10.x based kernel, also running paravirtualized, there's no issue - LKRG loads and appears to work just fine (except that it fails to locate `lookup_fast`, so `LKRG won't enforce pCFI validation on 'lookup_fast'`).

For example, here's the log for current LKRG on a 4.9.x kernel: