CP-37034: move TSX handling logic out of xenopsd

edwintorok commented 3 years ago

There have been 2 microcode updates on Intel that changed the behaviour of the TSX instruction. Currently there is logic in xenopsd to query MSR_ARCH_CAPS, and look at TSX_CTRL and enable HLE/RTM in the migration-safety CPUID compatibility featureset. However the 2nd microcode update moved the location of that information, and it is not always present in that MSR anymore. Xen currently "fakes up" the TSX_CTRL bit for backwards compatibility reasons, but this just adds technical debt.

Reuse the already existing logic in Xen instead of doing this calculation in the toolstack which becomes outdated on microcode and Xen updates.

Query the HVM/PV Max policy from Xen and logically OR those feature bits into features_hvm and features_pv respectively.

On latest upstream Xen the Max policy tells us what features we can accept on an incoming migration: i.e. the features are implemented natively in hardware, emulated by Xen or a compatible workaround by hardware. As an example: the TSX instruction may be hidden in the default policy, i.e. we do not want guests to see it and use it on a fresh boot, but present in the Max policy. This is safe because when the host has a TSX instruction the microcode update would just make the TSX instruction always abort, which is compliant with the instruction's specification (but then it is not a very useful instruction anymore, and better hidden on boot).

On Xen 4.13.x there is no Max/default policy split yet, but xenctrl in the patchqueue can make the necessary adjustments for TSX in the short term while we work on updating to a newer version of Xen on Stockholm.

This requires a version of Xen that has the new xenctrl stubs.

Drops a lot of technical debt and prints various information about policies for debugging purposes (including diffing policies to make it obvious what the difference is between them).

The featureset can be decoded into human readable form by pasting it as argument to the xen-cpuid executable. The policy can be decoded or created by looking up the values in the Intel/AMD CPU manuals (see xenctrl.mli for details).

The policy is not currently stored, but when we get CPUID/MSR leveling we will store in in XAPI. Thus the string serialization/deserialization code is here in xenctrlext.ml instead of xenctrl.ml so that the format is under the toolstack's control.

The xenctrl side of these changes is available at: https://lore.kernel.org/xen-devel/5fdb7b4cdee69af8e2b9d77b56b1027a8799cf04.1624012999.git.edvin.torok@citrix.com/T/#u And: https://github.com/edwintorok/xen/compare/xenctrlstubs

Before merging this we need the Xen change present and we'll need to add a BuildRequires/Requires with specific version of xen into xenopsd.spec (hence the draftness of the PR)

/cc @andyhhp /cc @lindig

edwintorok commented 3 years ago

That commit message became too long, I'll try to split it up into 2-3 commits with shorter commit messages each.

edwintorok commented 3 years ago

Discussed with @andyhhp that we might want to hide the actual details of the CPUID leaves and MSRs and don't serialize it for now (xenctrl will have some functions for those but are not ready yet), I'll simplify the PR next week.

andyhhp commented 3 years ago

We've got two things we're trying to do here, and I think it would be helpful to properly split them. 1) Remove the bonfire bodge from xenopsd, and move it down into libxc. 2) Progress the bigger CPU Policy plans.

Most of the complexity here is trying to address 2), despite not everything being quite ready in libxc. I suggest holding off on this for now.

To address 1), all we need is the logic to get both the default and max featuresets from Xen, and offer the max (featureshost*) for the incoming migration stream.

This ought to be simpler for now, and we can sort out full CPU Policies all together when the surrounding infrastructure is in place.

edwintorok commented 3 years ago

Simplified, we don't use the policy code at all because it is not ready/reliable on 4.13.x yet. Instead we use the already existing featureset query code with just a new variant for max featuresets.

Here are the necessary xenctrl changes to make the CI work here, based on @andyhhp's patch: https://github.com/xapi-project/xenctrl/pull/2

edwintorok commented 3 years ago

I'll drop the 6.x compat in another PR: that requires modifying xapi+xapi-idl+xenopsd

andyhhp commented 3 years ago

I'll drop the 6.x compat in another PR: that requires modifying xapi+xapi-idl+xenopsd

Ok then.

edwintorok commented 3 years ago

I got the Max and boot policies reversed (get_max_featureset was correct but assigned to wrong field, we need to assign it to features_hvm), XenRT found this when VMs failed to migrate. I've added some comments now to make it more obvious: features_hvm_host is what we use for newly booted VMs, and features_hvm is what we use for migration checks, i.e. the Max policy.

edwintorok commented 3 years ago

This has now passed some internal BST testing + TSX specific testing on CascadeLake/Skylake hardware (Kabylake pending).

xapi-project / xenopsd

CP-37034: move TSX handling logic out of xenopsd #758