[CodeGen][X86] Improving robustness of PATCHABLE_OP wrapper instruction

llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

http://llvm.org

Other

28.04k stars 11.58k forks source link

[CodeGen][X86] Improving robustness of PATCHABLE_OP wrapper instruction #59039

Closed sylvain-audi closed 8 months ago

sylvain-audi commented 1 year ago

When activating -fms-hotpatch in clang, the "patchable-function" pass replaces the first machine instruction with a wrapper PATCHABLE_OP.

This wrapping loses some information about the wrapped instruction: when a PATCHABLE_OP instruction is handled by X86AsmPrinter::emitInstruction, the wrapped instruction simply gets lowered without going through X86AsmPrinter::emitInstruction itself.

Here is an example in C/C++, where a tail call doesn't get lowered properly: https://godbolt.org/z/1Pjcbx87n

The source of the issue seems to be the loss of information in PatchableFunction::runOnMachineFunction when replacing a MachineInstr with the PATCHABLE_OP one: It only keeps the OpCode and operands of the wrapped instruction, and X86AsmPrinter::emitInstruction can't be called from it.

We are looking for a clean way to achieve this. Any suggestions?

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-x86

tru commented 1 year ago

Turns out that this is not only windows related. This happens for all X86 platforms when -fms-hotpatch is being used.

tru commented 1 year ago

Ping @aganea

sylvain-audi commented 1 year ago

Ping!

tru commented 1 year ago

Maybe @phoebewang or @RKSimon can help out here since it's very much a complicated X86 backend issue.

phoebewang commented 1 year ago

I'm not familiar with hotpatch. Is the patch https://reviews.llvm.org/D137642 will solve the problem here too?

sylvain-audi commented 1 year ago

I'm not familiar with hotpatch. Is the patch https://reviews.llvm.org/D137642 will solve the problem here too?

That patch fixes the selection of the function's instruction to make patchable (guarantee it's at least 2 bytes long or is preceded by a 2-byte nop). This issue is about the lowering part, which I coudn't figure out how to do without losing information.

sylvain-audi commented 1 year ago

Ping!

I'm currently stuck, none of my attempts seem sustainable. Currently in our fork we have a hack that simply inserts a 2-byte nop at the beginning of the function, disabling the opcode modification. This results is a lot of useless nops.

Note that MSVC seems to do this differently, as in MSVC blog: "most of the time, the compiler can juggle things so that you don’t even notice that it arranged for the first instruction of a function to be a multi-byte instruction. ". But it seems that doing that in clang could be pretty involved.

phoebewang commented 1 year ago

cc @KanRobert

tru commented 1 year ago

Ping on this - anyone know if someone that can help us progress on this?

MolecularMatters commented 8 months ago

Pinging this once again since it was linked into an issue with tailcall optimizations today (https://github.com/llvm/llvm-project/issues/76958).

It would be great if somebody could look into this, since using -fms-hotpatch is a requirement for making clang-cl work with Live++. The list of companies relying on clang-cl & Live++ for stable C++ hot-reload is growing, so a fix for this underlying issue would be very welcome.

Since most developers/companies probably don't use the -fms-hotpatch switch in their shipping builds, but rather in their development, debug, and optimized builds, I suppose it would not be a problem if compiling with the -fms-hotpatch option does not generate the most optimal code.

I'd therefore like to propose the following compromise as a fix for this issue:

Run all code generation and optimization passes
For functions compiled with -fms-hotpatch, check whether the first instruction is at least 2 bytes long
If not, insert a simple 2-byte NOP (e.g. xchg ax, ax) as first instruction, leaving the rest of the compiled code as it is

This would only add an unnecessary 2-byte NOP for functions which are only one byte in size, which should be very, very few.

However, I'm not familiar with code generation and intermediate passes in LLVM, so the proposed solution might not make sense in the context of LLVM.

shafik commented 8 months ago

Also see: https://github.com/llvm/llvm-project/issues/76879

aganea commented 8 months ago

I've posted a fix in https://github.com/llvm/llvm-project/pull/77245