intel / haxm

Intel® Hardware Accelerated Execution Manager (Intel® HAXM)
BSD 3-Clause "New" or "Revised" License
3.23k stars 877 forks source link

PAT is disabled #59

Open AlexAltea opened 6 years ago

AlexAltea commented 6 years ago

My guest kernel requires PAT to be enabled, otherwise it will panic. The function handle_cpuid_virtual explicitly says that PAT is disabled [1], but I've seen PAT-related code in haxlib [2].

My question is: What's the current progress regarding PAT support? What needs to be done? I'd gladly work on it myself if necessary.

raphaelning commented 6 years ago

I checked all the PAT-related code snippets in [2]. It seems to me the only meaningful one is this:

https://github.com/intel/haxm/blob/a405d8ccaf3204cceb7639d710818cd0a84e50a2/core/vcpu.c#L564

This hardcoded value will be returned when the guest reads IA32_PAT MSR. The original author did not explain why they picked that initial value (the TODO comment doesn't help), but since HAXM does not claim support for PAT, the guest won't access the MSR anyway.

So I wouldn't worry about the existing code at all. If you could rewrite it, that would be great! I'm not familiar with PAT yet, but I'll read the relevant sections in Intel SDM when I review your pull request :-)

raphaelning commented 6 years ago

A few findings after I read Intel SDM:

  1. The initial value for vcpu->cr_pat is reasonable. In fact, the hardware uses the same value to reset the IA32_PAT MSR at power up (see SDM Vol. 3A 11.12.4, Table 11-12).
  2. All 64-bit Intel CPUs are supposed to support PAT (see NOTE in SDM Vol. 3A 4.9.1). Therefore, 64-bit guests should expect HAXM to advertise the PAT feature flag in CPUID. Although Linux x86_64 guests don't seem to care about the missing flag, apparently your kernel does--strictly speaking, it has exposed a HAXM bug ;-)
  3. From what I've read so far, the PAT feature is mostly transparent to the hypervisor. Since the existing code has already enabled the guest to read and write the IA32_PAT MSR, probably you could just try to enable feat_pat in handle_cpuid_virtual(), and see if things just work?
raphaelning commented 6 years ago

It turns out that simply adding advertising the PAT feature flag isn't enough to convince Linux guests that PAT is supported. Recent Linux kernels won't try to use PAT when MTRR support is broken:

[    0.000000] MTRR: Disabled
[    0.000000] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[    0.000000] CPU MTRRs all blank - virtualized system.
[    0.000000] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC

Probably we have to address the following TODOs in the existing code and implement MTRR virtualization properly:

https://github.com/intel/haxm/blob/557d5f55334b9d8f92b91eb4383f0341fe98ee34/core/vcpu.c#L563

https://github.com/intel/haxm/blob/557d5f55334b9d8f92b91eb4383f0341fe98ee34/core/vcpu.c#L3085

AlexAltea commented 6 years ago

Thank you so much for your analysis, and sorry for the late reply: couldn't login to GitHub while travelling.

After enabling feat_pat the panic was fixed, as the guest was just checking for the corresponding CPUID flag, but probably we will encounter MTRR-related issues later on (I don't know for sure because right now we have yet another issue with the kernel relying on the unimplemented MSR IA32_TSC_AUX).

Depending on how my guest deals PAT/MTRR later on, I might submit a PR to address this issue.

raphaelning commented 6 years ago

Sounds good, thanks!

raphaelning commented 6 years ago

When enabling PAT, we should also clearly separate the guest and host copies of the IA32_PAT MSR. It seems we can take advantage of a few VMCS features: