intel / haxm

Intel® Hardware Accelerated Execution Manager (Intel® HAXM)
BSD 3-Clause "New" or "Revised" License

Guest debugging support #66

Closed AlexAltea closed 5 years ago

AlexAltea commented 6 years ago

Opening this issue as follow-up to the discussion in qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg00030.html

Summarizing: Guest debugging is extremely relevant to debugging bootloaders/microkernels, or in my case, kernels that do not include debugging backends. Aside from the intrinsic value of this feature, it's also really beneficial to developers/researchers, as neither WHPX (Windows) nor HVF (macOS) supports guest debugging, and while they remain closed-source this probably won't change.


This week I've started experimenting with guest debugging support on HAXM, but it's still too early to submit any patches (plus, this should probably be coordinated with QEMU somehow). These are the key ideas (inspired by KVM):

Current experiments can be found in https://github.com/AlexAltea/orbital-qemu/commit/77d61c71800106ca15d6eb63d29327e95cd546fd (QEMU) and https://github.com/AlexAltea/haxm/tree/debug (HAXM). I'm facing some issues still:

Disclaimer: I haven't much experience with debuggers, so any feedback will be helpful.

raphaelning commented 6 years ago

Thanks! This is a very nice feature to have, and your summary really helps me understand how breakpoints work.

I'm facing some issues still:

Hardware breakpoints have no effect. Software breakpoints trigger a triple fault, not a #BP.

#BP doesn't really cause a VM exit by default. There is a per-CPU VMCS exception bitmap that controls which CPU exceptions are handled by the host/hypervisor and which by the guest (see Intel SDM Vol. 3C: 25.2 Other Causes of VM Exits). So I believe you need to (conditionally) set the bits for EXC_DEBUG and EXC_BREAK_POINT [sic]:

https://github.com/intel/haxm/blob/e23a8dd04cc8c458517b7bd164b429706d7875d5/core/vcpu.c#L1187

and handle them properly in exit_exc_nmi() (core/vcpu.c).
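A minimal sketch of the "conditionally set the bits" step, assuming a 32-bit exception bitmap indexed by vector number. The vectors for #DB (1) and #BP (3) are architectural; the helper names and the `guest_debug` flag are hypothetical and are not HAXM's actual API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 exception vectors: #DB is vector 1, #BP is vector 3. */
enum { VECTOR_DB = 1, VECTOR_BP = 3 };

/* Bit for one exception vector in the 32-bit VMCS exception bitmap. */
uint32_t exception_bit(unsigned vector)
{
    assert(vector < 32);
    return UINT32_C(1) << vector;
}

/*
 * Hypothetical helper: `base` is the bitmap the hypervisor already uses,
 * `guest_debug` says whether a debugger asked to intercept breakpoints.
 * With debugging off the bitmap is returned unchanged, so the guest keeps
 * handling #DB and #BP itself.
 */
uint32_t debug_exception_bitmap(uint32_t base, bool guest_debug)
{
    if (!guest_debug)
        return base;
    return base | exception_bit(VECTOR_DB) | exception_bit(VECTOR_BP);
}
```

The result would then be written to the exception bitmap field of the VMCS; once the bits are set, both exceptions arrive as VM exits and need handling on the host side.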

In addition, if you search for "debug" in Intel SDM Vol. 3C, there are a lot more details about how hardware (VT) can help enable guest debugging support. 32.2: Virtualization Support for Debugging Facilities gives a good summary, and I've also made my own reading list:

  1. 24.4.2 Guest Non-Register State: The pending debug exceptions field in VMCS (GUEST_PENDING_DBE), plus 26.6.3 Delivery of Pending Debug Exceptions after VM Entry.
  2. 24.7.1 VM-Exit Controls: The save debug controls flag of the VM-exit controls field in VMCS (EXIT_CONTROL_SAVE_DEBUG_CONTROLS).
  3. 24.8.1 VM-Entry Controls: The load debug controls flag of the VM-entry controls field in VMCS (ENTRY_CONTROL_LOAD_DEBUG_CONTROLS).
  4. 27.2.1 Table 27-1: Exit Qualification for Debug Exceptions.

Ideally, we should support both the QEMU GDB (for debugging the guest itself) and GDB running in the guest (for debugging an app that runs in guest user space). I need to read more to understand how we can achieve this.

AlexAltea commented 6 years ago

Thank you, indeed I forgot to enable #DB and #BP in the exception bitmap. After doing so, software breakpoints work perfectly, although hardware breakpoints are still getting ignored.

This might be caused by wrong drN values: I've noticed the VMCS structure only offers a GUEST_DR7 member, but what about dr0, dr1, dr2, dr3, dr6? Do I need to load/save host/guest drN's manually? I've noticed the vcpu_set_regs handler allows updating such registers in vcpu->state->_dr*: https://github.com/intel/haxm/blob/e9f8c8908735f7b6e5ffa73ce643459ba5e8546b/core/vcpu.c#L3695-L3700

Similarly, exit_dr_access also allows changing the vcpu->state->_dr* registers.

However, I have not found anywhere in the codebase a mechanism to load such values into the guest debug registers... I'm assuming this is unimplemented: should we include a dr_dirty flag in vcpu_t and call set_dr* accordingly right before doing a VM entry?
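To make the dr_dirty idea concrete, here is a rough sketch under stated assumptions. The struct, the function names, and the `load_hw_dr` stand-in (which only records values instead of executing a privileged `mov` to DRn) are all hypothetical, not HAXM code. It also assumes nothing else clobbers the hardware registers between VM entries, which may not hold on a real host:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the debug part of the vCPU state. */
struct vcpu_debug_state {
    uint64_t dr[4];     /* guest DR0-DR3 as last requested by user space */
    bool     dr_dirty;  /* true when dr[] changed since the last load */
};

/* Stand-in for the privileged "mov to DRn" wrappers: only records values. */
uint64_t fake_hw_dr[4];

void load_hw_dr(unsigned i, uint64_t value)
{
    assert(i < 4);
    fake_hw_dr[i] = value;
}

/* Update path (ioctl or DR-access exit): remember the value, mark dirty. */
void vcpu_set_guest_dr(struct vcpu_debug_state *s, unsigned i, uint64_t value)
{
    assert(i < 4);
    if (s->dr[i] != value) {
        s->dr[i] = value;
        s->dr_dirty = true;
    }
}

/* Called right before VM entry; returns how many registers were loaded. */
unsigned vcpu_load_guest_dr(struct vcpu_debug_state *s)
{
    if (!s->dr_dirty)
        return 0;
    for (unsigned i = 0; i < 4; i++)
        load_hw_dr(i, s->dr[i]);
    s->dr_dirty = false;
    return 4;
}
```

DR7 is left out on purpose, since the VMCS already provides GUEST_DR7 for it.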

raphaelning commented 6 years ago

software breakpoints work perfectly

Great, congrats!

Here's my understanding of how hardware breakpoints roughly work:

  1. GDB asks QEMU to insert a HW BP (I haven't checked this part).
  2. QEMU prepares the data to be written to guest DR{i, 7} registers based on BP information, where i is one of {0, 1, 2, 3}, and DR7 is the Debug Control Register.
  3. QEMU passes this data to hypervisor by invoking an ioctl on each vCPU.
  4. At the next VM entry of each vCPU, hypervisor loads DR{i, 7} of the current host CPU with the data specified by QEMU.
  5. One of the host CPUs hits the HW BP and takes a VM exit (instead of invoking the guest #DB exception handler).
  6. Hypervisor returns to QEMU with information about the HW BP being triggered, including the contents of guest DR{6, 7}, where DR6 is the Debug Status Register.
  7. Using the information provided by hypervisor, especially guest DR{6, 7}, QEMU identifies the GDB BP that corresponds to the HW BP.
  8. QEMU returns control to GDB.
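Steps 2 and 7 above come down to encoding DR7 and decoding DR6. The sketch below uses the architectural bit layout (enable bits in DR7[7:0], a 4-bit RW/LEN group per breakpoint starting at bit 16, and hit flags B0-B3 in DR6[3:0]); the function names are hypothetical, and using the global rather than the local enable bit is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* RW field encodings in DR7 (Intel SDM, "Debug Registers"). */
enum { DR7_RW_EXEC = 0, DR7_RW_WRITE = 1, DR7_RW_IO = 2, DR7_RW_RDWR = 3 };
/* LEN field encodings: 1, 2, 8 and 4 bytes respectively. */
enum { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_8 = 2, DR7_LEN_4 = 3 };

/* Return dr7 with breakpoint slot i (0..3) configured and globally enabled. */
uint64_t dr7_enable_bp(uint64_t dr7, unsigned i, unsigned rw, unsigned len)
{
    assert(i < 4 && rw < 4 && len < 4);
    unsigned shift = 16 + 4 * i;                  /* RWi/LENi group */
    dr7 &= ~((uint64_t)0xF << shift);             /* clear the old RW/LEN */
    dr7 |= (uint64_t)(rw | (len << 2)) << shift;  /* RW low, LEN high */
    dr7 |= (uint64_t)1 << (2 * i + 1);            /* Gi: global enable */
    return dr7;
}

/* Return the lowest breakpoint slot flagged in DR6 (B0..B3), or -1 if none. */
int dr6_first_hit(uint64_t dr6)
{
    for (int i = 0; i < 4; i++) {
        if (dr6 & ((uint64_t)1 << i))
            return i;
    }
    return -1;
}
```

For example, `dr7_enable_bp(0, 0, DR7_RW_EXEC, DR7_LEN_1)` describes an execution breakpoint in slot 0; the address itself goes into DR0.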

Based on this:

AlexAltea commented 6 years ago

Thanks for your detailed overview of hardware breakpoints. I've added the missing features in https://github.com/AlexAltea/haxm/commit/f2808d8fa83754d2305e2df199e564069b659f39. Hardware breakpoints are now successfully triggered:

Hardware assisted breakpoint 1 at 0xffffffff825805b0

Thread 1 hit Breakpoint 1, 0xffffffff825805b0 in ?? ()
(gdb)

The only thing missing now is single-stepping, which for some reason triggers a triple fault. I'm trying to figure out why this happens, without much success so far. As soon as I'm finished, I'll submit a PR for this feature.


So is it possible for the host DR to be "active" while the user tries to debug the guest? E.g., a QEMU developer wants to debug QEMU GDB itself? If so, we do need to save/restore host DR.

I think that issue cannot be solved by HAXM. As soon as such a QEMU developer uses hardware breakpoints, the corresponding dr7 bits will be enabled, but before entering the guest VM, the guest dr0-dr3 registers are restored. What if the host instructions right afterwards are mapped to the same addresses pointed to by any of the guest hardware breakpoints?

This could only be fixed at the hardware level by adding GUEST_DR0 to GUEST_DR3 fields to the VMCS. Meanwhile the only approach to debugging QEMU/HAXM, while debugging a virtual machine, is using hardware breakpoints only in either host or guest, and using software breakpoints everywhere else.

[...] guest DR6 from Exit Qualification for Debug Exceptions.

Thank you, that was helpful, and it does indeed cover the dr6 register.

raphaelning commented 6 years ago

the only approach to debugging QEMU/HAXM, while debugging a virtual machine, is using hardware breakpoints only in either host or guest, and using software breakpoints everywhere else.

This sounds reasonable. How does GDB choose between inserting a hardware BP and inserting a software BP? If we can't rely on it to make the right decision in the hypothetical scenario, should we check host DR7 before restoring guest DR{0, 1, 2, 3}?
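A sketch of what "check host DR7" could look like, assuming only the architectural fact that the local/global enable bits L0-G3 occupy DR7[7:0]. The function name is hypothetical, and what to do when it returns true (skip the guest load, or save and restore the host registers) is a policy choice not settled here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* L0, G0, L1, G1, L2, G2, L3, G3 live in the low byte of DR7. */
#define DR7_ENABLE_MASK UINT64_C(0xFF)

/* True if the host currently has at least one hardware breakpoint armed. */
bool host_dr7_has_active_bp(uint64_t host_dr7)
{
    return (host_dr7 & DR7_ENABLE_MASK) != 0;
}
```

Bits above the low byte, such as the reserved bit 10 that always reads as 1, do not count as an armed breakpoint.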

The only thing missing now is single-stepping, which for some reason triggers a triple fault. [...] As soon as I'm finished, I'll submit a PR for this feature.

Cool. I just skimmed through your code and spotted a couple of typos--not sure if they are actually related to the triple fault:

https://github.com/AlexAltea/haxm/blob/f2808d8fa83754d2305e2df199e564069b659f39/core/vcpu.c#L1377

https://github.com/AlexAltea/haxm/blob/f2808d8fa83754d2305e2df199e564069b659f39/core/vcpu.c#L1403

In both cases set_dr3() should be changed to set_dr6().

AlexAltea commented 6 years ago

How does GDB choose between inserting a hardware BP and inserting a software BP?

That's decided by the command entered by the user: break for software breakpoints, hbreak for hardware breakpoints.

Cool. I just skimmed through your code and spotted a couple of typos--not sure if they are actually related to the triple fault:

Thanks! Seems unrelated to the triple-fault, but worth fixing anyway. :-)

AlexAltea commented 5 years ago

This issue should be closed since #81 was merged.