llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.97k stars 11.54k forks source link

[arm64ec] Exit thunks are not generated for libcall functions #101355

Open bylaws opened 1 month ago

bylaws commented 1 month ago

Compiling e.g.

#include <string.h>
extern void *b;
extern void *a;
int test() {
        memcpy(a, b, 0x10000);
}

will result in a call to the llvm.memcpy intrinsic that is then lowered to an actual memcpy call as part of ISel after AArch64Arm64ECCallLowering has ran, resulting in no exit thunk being emitted. This is similarly the case for all other libcalls and for many of them it's not possible to determine if they will be used at the time the EC call lowering runs. Emitting exit thunks for all possible libcalls would work but is obviously not ideal

CC: @cjacek @efriedma-quic

llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-aarch64

Author: Billy Laws (bylaws)

Compiling e.g. ``` #include <string.h> extern void *b; extern void *a; int test() { memcpy(a, b, 0x10000); } ``` will result in a call to the llvm.memcpy intrinsic that is then lowered to an actual memcpy call as part of ISel after AArch64Arm64ECCallLowering has ran, resulting in no exit thunk being emitted. This is similarly the case for all other libcalls and for many of them it's not possible to determine if they will be used at the time the EC call lowering runs. Emitting exit thunks for all possible libcalls would work but is obviously not ideal CC: @cjacek @efriedma-quic
bylaws commented 1 month ago

Thinking about it, perhaps always emitting exit thunks would be fine? libcalls must always be available to link to anyway I have something like:

diff --git a/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp b/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
index 310b152ef981..a377043ab8d1 100644
--- a/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
@@ -89,6 +89,7 @@ private:
   Module *M = nullptr;

   Type *PtrTy;
+  Type *I32Ty;
   Type *I64Ty;
   Type *VoidTy;

@@ -789,6 +790,7 @@ bool AArch64Arm64ECCallLowering::runOnModule(Module &Mod) {
     cfguard_module_flag = MD->getZExtValue();

   PtrTy = PointerType::getUnqual(M->getContext());
+  I32Ty = Type::getInt32Ty(M->getContext());
   I64Ty = Type::getInt64Ty(M->getContext());
   VoidTy = Type::getVoidTy(M->getContext());

@@ -840,6 +842,31 @@ bool AArch64Arm64ECCallLowering::runOnModule(Module &Mod) {
   }

   SetVector<GlobalValue *> DirectCalledFns;
+
+  if (!M->getFunction("memcpy")) {
+    Function *MemCpy =
+        Function::Create(FunctionType::get(PtrTy, {PtrTy, PtrTy, I64Ty}, false), GlobalValue::WeakODRLinkage, 0, "memcpy", M);
+    DirectCalledFns.insert(MemCpy);
+  }
+
+  if (!M->getFunction("memmove")) {
+    Function *MemCpy =
+        Function::Create(FunctionType::get(PtrTy, {PtrTy, PtrTy, I64Ty}, false), GlobalValue::WeakODRLinkage, 0, "memmove", M);
+    DirectCalledFns.insert(MemCpy);
+  }
+
+  if (!M->getFunction("memset")) {
+    Function *MemSet =
+        Function::Create(FunctionType::get(PtrTy, {PtrTy, I32Ty, I64Ty}, false), GlobalValue::WeakODRLinkage, 0, "memset", M);
+    DirectCalledFns.insert(MemSet);
+  }
+
   for (Function &F : Mod)
     if (!F.isDeclaration() &&
         F.getCallingConv() != CallingConv::ARM64EC_Thunk_Native &&
efriedma-quic commented 1 month ago

The list of possibly-relevant "libcalls" is, unfortunately, extremely long. memcpy/memmove/memset, yes, but also a bunch of compiler-rt utility methods, and a bunch of math routines. Listing them out would be a pain, and generating thunks for all of them would seriously bloat the code. To avoid that, we basically need to generate thunks after isel, which is tricky to implement. (Specifically, the interaction with the pass manager to run isel on the newly generated functions is hard.)

Long-term, probably LLVM is moving in the direction of reducing the number of libcalls generated by isel, in favor of doing stuff in pre-isel passes, but that doesn't really solve the issue at the moment.


I haven't found this to cause practical problems because normally you'd link arm64ec code against an arm64ec MSVCRT (which provides arm64-native implementations that can be directly called).

bylaws commented 1 month ago

I haven't found this to cause practical problems because normally you'd link arm64ec code against an arm64ec MSVCRT (which provides arm64-native implementations that can be directly called).

Yeah I don't this ends up affecting usecases outside of wine - but within wine many programs ship with their own ucrtbase and that is explicitly used over the system one due to differences in e.g. math functions. I found memcpy/memmove/memset to be 'enough' for these cases I have ran into though so perhaps that would be reasonable in the short term.