Paging issue on Haswell and pre-Haswell CPUs

VelocityRa commented 5 years ago

Describe the Bug

Summary: HAXM will not map a host virtual address to a guest physical address at or above a specific address, on Haswell or pre-Haswell CPUs (Haswell-E works).

Instead of a successful address translation, a page fault occurs, with CR2 containing the failed address. In the log below this escalates to a triple fault because the guest has no interrupt handling.

Please see @StrikerX3's comment below for details.

Host Environment

HAXM version: v7.5.1
Host OS version: Windows 10 Pro version 1803
Host OS architecture: x86_64
Host CPU model: i5-4690K (also confirmed not working on i7-2630QM; it works on, e.g., i7-5930K)
Host RAM size: 16GB

Guest environment

Tiny piece of code that mostly just boots to long mode, see below.

To Reproduce

Run the test here: https://github.com/StrikerX3/virt86-demos/tree/master/apps/x64-guest

Expected Behavior

Expectation: Test completes without "VCPU shutting down"/"VCPU execution failed" messages appearing.

Reproducibility

100%. The test comes with full source, which should help narrow down the problem.

Diagnostic Information

HAXM log (debug level): https://cdn.discordapp.com/attachments/532915071697944576/592552716124028965/haxm.log

StrikerX3 commented 5 years ago

To be more exact, we can map GPAs successfully up to 0x7F'FFFF'F000 on all CPUs we tested (all three CPUs mentioned above and an i5-4460S). From 0x80'0000'0000 and up, only the Haswell-E CPU works.
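As a quick sanity check on those two addresses, the sketch below counts how many address bits each one needs. The helper name `addr_bits_needed` is made up for illustration. The result is consistent with a 39-bit limit on the affected CPUs, although the arithmetic alone does not prove that is the cause.

```c
#include <stdint.h>

/* Number of address bits needed to represent addr (0 for addr == 0).
 * Hypothetical helper, used only to illustrate the boundary. */
static inline unsigned addr_bits_needed(uint64_t addr)
{
    unsigned bits = 0;
    while (addr != 0) {
        bits++;
        addr >>= 1;
    }
    return bits;
}

/* Start address of the last 4 KiB page that mapped on every tested CPU. */
static const uint64_t LAST_WORKING_GPA = UINT64_C(0x7FFFFFF000);

/* First address that only worked on the Haswell-E CPU; equal to 2^39. */
static const uint64_t FIRST_FAILING_GPA = UINT64_C(0x8000000000);
```

`addr_bits_needed(LAST_WORKING_GPA)` is 39 and `addr_bits_needed(FIRST_FAILING_GPA)` is 40, so the failure starts exactly where a 40th address bit becomes necessary.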

This also happens with WHPX, which leads me to believe this is a limitation of the CPUs and not a HAXM bug. However, it would be nice if the IOCTLs that map GPA ranges returned an error in those cases.

Is there a way to programmatically determine the maximum usable GPA on the current host?

StrikerX3 commented 5 years ago

Seems like we can use CPUID 8000_0008h.EAX[23..16] (or [7..0] if those bits are zero) to find out how many bits are supported in a GPA on the host's CPU.
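A minimal sketch of that lookup in C, assuming GCC or Clang on an x86 host (it uses `__get_cpuid` from `<cpuid.h>`). The helper names `gpa_bits_from_eax`, `gpa_limit`, and `host_gpa_bits` are made up for illustration and are not part of HAXM or any other hypervisor API.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the address width from the EAX value of CPUID leaf 8000_0008h:
 * prefer EAX[23..16], fall back to EAX[7..0] when those bits are zero. */
static inline unsigned gpa_bits_from_eax(uint32_t eax)
{
    unsigned guest_bits = (eax >> 16) & 0xFF; /* EAX[23..16] */
    unsigned phys_bits  = eax & 0xFF;         /* EAX[7..0]   */
    return guest_bits != 0 ? guest_bits : phys_bits;
}

/* First address beyond the usable range, i.e. 2^bits. */
static inline uint64_t gpa_limit(unsigned bits)
{
    assert(bits > 0 && bits < 64);
    return UINT64_C(1) << bits;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Query the host CPU. Returns 0 if leaf 8000_0008h is not available. */
static inline unsigned host_gpa_bits(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
        return 0;
    return gpa_bits_from_eax(eax);
}
#endif
```

For an example EAX value of `0x00003027`, bits [23..16] are zero and bits [7..0] are `0x27`, so the sketch reports 39 bits and `gpa_limit(39)` is `0x80'0000'0000`.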

wcwang commented 5 years ago

Thanks for your report. We will try to reproduce this issue using the project you mentioned. With the release of HAXM v7.5.1, we resolved an issue with vCPU shutdown, and the patch has been merged into QEMU master. Could you check whether your QEMU version contains that patch, and provide the QEMU launch command with its parameters for further analysis? If the arguments do not contain '-smp', the test case should pass even without that patch.

VelocityRa commented 5 years ago

Hello, we're not using QEMU; the test above uses HAXM (or other HVMs) directly.

wcwang commented 5 years ago

Thanks for your reply. We will investigate the test case using the test project. Meanwhile, you are welcome to submit a patch if you have any ideas; we would be glad to review it and discuss the issue further. Thanks.

hyuan3 commented 5 years ago

I can't access the URL of the HAXM log. As I understand it, though, GPA mapping in HAXM (including set_ram and EPT violation handling) does not report a triple fault when it fails.

VelocityRa commented 5 years ago

Reposting the log: haxm.log

The triple fault is because there is no IDT in the guest (it's empty).

Anyway, based on what @StrikerX3 said, the fix needed in HAXM is probably just a check, based on the aforementioned host CPUID value, that makes the HAXM IOCTL return an error when you try to map GPAs higher than the CPU supports.

StrikerX3 commented 5 years ago

Exactly. To clarify: there is an upper limit to the address of usable GPAs that the CPU supports, which can be determined from the CPUID value mentioned previously. The limit is higher in Haswell-E CPUs than in previous-generation processors. In our tests, we were trying to map a GPA range above the limit of pre-Haswell-E CPUs, but within Haswell-E's range. The issue is that the IOCTL went through without any kind of error on the older CPUs, leading us to believe that the memory was mapped, but when the guest attempted to execute code in that area, we got a page fault. We worked around this by detecting the maximum allowed GPA range based on CPUID and platform limitations and generating an error when a user attempts to map a GPA range beyond the limit.
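A workaround of this kind can be sketched as a small range check in C. This is only an illustration under stated assumptions, not the actual virt86 or HAXM code: the function name `gpa_range_is_mappable`, the 4 KiB page size, and the alignment rules are all hypothetical, and `gpa_bits` is assumed to come from the CPUID value discussed earlier.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_4K UINT64_C(0x1000) /* assumed page granularity */

/* Returns true if [start, start + size) is non-empty, page-aligned,
 * and lies entirely below 2^gpa_bits. Written so that start + size
 * is never computed, which avoids 64-bit wraparound. */
static inline bool gpa_range_is_mappable(uint64_t start, uint64_t size,
                                         unsigned gpa_bits)
{
    if (gpa_bits == 0 || gpa_bits >= 64)
        return false;

    uint64_t limit = UINT64_C(1) << gpa_bits; /* first unusable address */

    if (size == 0)
        return false;
    if (((start | size) & (PAGE_SIZE_4K - 1)) != 0)
        return false;
    if (start >= limit)
        return false;

    /* The range may end exactly at the limit, so the top page is usable. */
    return size <= limit - start;
}
```

With `gpa_bits = 39`, a one-page mapping at `0x7F'FFFF'F000` passes and one at `0x80'0000'0000` is rejected; with `gpa_bits = 46`, the latter passes. A caller would return an error instead of issuing the mapping IOCTL when the check fails.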

I also noticed that HAXM imposes an upper limit of 2^31 pages to the GPA range to limit the size of the protection bitmap (as seen here). This limit is not enforced by other platforms such as WHPX or KVM. Additionally, there is an off-by-one error in the check that impedes usage of the highest page in the allowed range.
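To make the off-by-one concrete, here is a hypothetical sketch in C. It is not HAXM's actual check, which is not reproduced in this thread; the function names and the 4 KiB page size are assumptions, and only the 2^31-page limit comes from the comment above. With 4 KiB pages, 2^31 pages cover 2^43 bytes (8 TiB).

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12                  /* assumed 4 KiB pages */
#define MAX_GPA_PAGES (UINT64_C(1) << 31) /* page-count limit from the thread */

/* Hypothetical buggy form: the strict comparison rejects a range whose
 * exclusive end equals the limit, so the highest page is never usable. */
static inline bool page_range_ok_off_by_one(uint64_t start_page,
                                            uint64_t npages)
{
    if (npages == 0 || start_page >= MAX_GPA_PAGES)
        return false;
    return npages < MAX_GPA_PAGES - start_page;
}

/* Corrected form: a range may end exactly at the limit. */
static inline bool page_range_ok(uint64_t start_page, uint64_t npages)
{
    if (npages == 0 || start_page >= MAX_GPA_PAGES)
        return false;
    return npages <= MAX_GPA_PAGES - start_page;
}
```

Mapping the single page at index `MAX_GPA_PAGES - 1` is rejected by the first function and accepted by the second, while a page at index `MAX_GPA_PAGES` is rejected by both.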

varinderpratap commented 2 years ago

Hello, we are facing a similar page fault issue when accessing RAM with the Tizen Emulator/QEMU on HAXM.

13:18:11.221|17380|T| yagl| 154|[10784/17380] {{{ yagl_transport_begin():154
13:18:11.221|21276|T| yagl| 974|[2878/2878] {{{ glTexSubImage2DData(target = 0xde1, level = 0, xoffset = 1, yoffset = 1, width = 720, height = 1280, format = 0x80e1, type = 0x1401, pixels = 00000000792af148):974
13:18:11.221|17380|E| yagl| 174|[0/0] yagl_transport_begin:174 - yagl_transport_begin - batch_size=6560, out_arrays_size=3683524, fence_seq=0, num_out_da=1
13:18:11.221|17380|T| yagl| 57|[2878/2878] {{{ yagl_mem_get(va = 0x00000000ab2f7000, len = 3683524):57
13:18:11.221|17380|W| yagl| 62|[2878/2878] yagl_mem_get:62 - page fault at 0x00000000ab2f7000, len= 3683524
13:18:11.221|17380|T| yagl| 65|[2878/2878] }}} yagl_mem_get:65

Qemu Source : https://review.tizen.org/gerrit/gitweb?p=sdk%2Femulator%2Fqemu.git;a=shortlog;h=refs%2Fheads%2Ftizen_qemu_5.0.1

Error line: yagl| 62|[2878/2878] yagl_mem_get:62 - page fault at 0x00000000ab2f7000, len= 3683524

This happens mainly when the memory length is greater than 2 MB.

YaGL: Yet another Graphics Library. YaGL is a vPCI device used to perform OpenGL operations. https://review.tizen.org/gerrit/gitweb?p=sdk/emulator/qemu.git;a=blob;f=hw/yagl/yagl_device.c;h=2d52864a4bb71f56f17caba3b257823330cad8d4;hb=069e0b790db3c2d2c80148f3baead4be613bf8f9

Please note that the same code works with WHPX and KVM, and that the same issue occurs with HAXM on macOS.

Could you offer any pointers or an approach to fix this? Thanks!

intel / haxm

Paging issue on Haswell and pre-Haswell CPUs #218