llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.9k stars 11.51k forks source link

[MLIR] WRONG code: mlir-opt does not add SExt/ZExt attributes on call parameters #51240

Open JonPsson opened 2 years ago

JonPsson commented 2 years ago
Bugzilla Link 51898
Version unspecified
OS Linux
CC @JonPsson,@River707,@uweigand,@ftynse

Extended Description

On SystemZ both mlir-cpu-runner/async-error.mlir and mlir-cpu-runner/async-value.mlir fail with "terminate called after throwing an instance of 'std::bad_alloc'". This seems to be because the int32_t outgoing argument is not given the SExt attribute when lowered to LLVM I/R as is expected by the SystemZ backend.

In LLVM IR, the parameters have attributes and for instance addParamAttr() method can be used in order for the backend to later perform the right extension (if needed per the target ABI). I however could not find a similar functionality in MLIR...

I tried to add the 'Signed' flag to the int32_t argument when the FunctionType was built for the called function:

--- a/mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp +++ b/mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp @@ -76,7 +76,7 @@ struct AsyncAPI {

static FunctionType addOrDropRefFunctionType(MLIRContext *ctx) { auto ref = opaquePointerType(ctx);

I did this after looking in mlir/include/mlir/ExecutionEngine/AsyncRuntime.h, where I see:

// Adds references to reference counted runtime object. extern "C" void mlirAsyncRuntimeAddRef(RefCountedObjPtr, int32_t);

// Drops references from reference counted runtime object. extern "C" void mlirAsyncRuntimeDropRef(RefCountedObjPtr, int32_t);

Then I however got an error from the verifier: "error: 'std.call' op operand type mismatch: expected operand type 'si32', but provided 'i32' for operand number 1". Unfortunately I could not figure out where the LLVM IR call instruction is built - this seemed to happen in some kind of pattern rewriter machinery which made this difficult for me to track down.

Therefore I would at this point like to ask for help and advice on how to proceed with this as this is a serious bug on any target that expects those attributes...

Any help much appreciated!

uweigand commented 2 years ago

I'd expect https://github.com/llvm/llvm-project/commit/ 92db09cde04933d2e703a590054fe709dcfa88c0 to work around this problem

Yes, this did fix the failure. Thanks!

Note that this was the last failure we were seeing in the mlir test suite on s390x, and now that this is gone, I've committed the patch to enable an mlir build bot on s390x: https://reviews.llvm.org/D109745 Once the buildbot server resets, we should hopefully see a green run show up.

llvmbot commented 2 years ago

I'd expect https://github.com/llvm/llvm-project/commit/92db09cde04933d2e703a590054fe709dcfa88c0 to work around this problem

uweigand commented 2 years ago

Will changing int32_t to int64_t help here and hide this problem for now?

Yes, that would help.

llvmbot commented 2 years ago

Will changing int32_t to int64_t help here and hide this problem for now?

uweigand commented 2 years ago

This was my understanding, thanks for confirming! Fixing this won't be trivial: we kind of knew that our FFI was fairly limited, we just thought we'd get around it with avoiding struct and restricting ourselves to fixed size integer: we didn't know about SystemZ though :)

Note that the changes to MLIR code don't (necessarily) have to be target-specific, so this may make it simpler.

The front end really only has to indicate whether the type is signed or unsigned, so basically you add a "signext" marker for all signed integer types and a "zeroext" marker for all unsigned integer types. Targets where the ABI does not require extension will simply ignore the marker.

joker-eph commented 2 years ago

This was my understanding, thanks for confirming! Fixing this won't be trivial: we kind of knew that our FFI was fairly limited, we just thought we'd get around it with avoiding struct and restricting ourselves to fixed size integer: we didn't know about SystemZ though :)

uweigand commented 2 years ago

Does it apply only to C/C++ FFI? For example: would an IR outliner at the LLVM IR level be allowed to extract code operating on i32 in a new function without annotations?

Well, it applies whenever you want/need to comply with the platform ABI, which is normally the case whenever one side of the call may be generated by another compiler.

If you emit a purely local function call, where the callee and all callers are completely under your control, you can of course do whatever you want ...

joker-eph commented 2 years ago

Does it apply only to C/C++ FFI? For example: would an IR outliner at the LLVM IR level be allowed to extract code operating on i32 in a new function without annotations?

uweigand commented 2 years ago

Have you tried the opposite? I mean changing the runtime function to take uin32_t instead?

To elaborate on what Jonas said, in the SystemZ ABI all integers must always be extended to full register width (64-bit), either signed or unsigned depending on the type.

So the correct LLVM IR mapping of a C "int32_t" parameter would be "signext i32", while the correct mapping of "uint32_t" would be "zeroext i32".

A plain "i32" is nearly never correct (it is only used when passing a four-byte struct in a register).

(Note that since the back-end does not know whether the type originally was signed or unsigned, it has to rely on the front-end to provide the appropriate signext/zeroext attribute.)

JonPsson commented 2 years ago

Have you tried the opposite? I mean changing the runtime function to take uin32_t instead?

No, I had not, but per your suggestion I tried it but with the same result. So it seems the MLIR -> LLVM stage does not emit the function declarations properly. It is missing the attribute on the i32 argument:

declare void @​mlirAsyncRuntimeDropRef(i8*, i32)

We have had this problem before -- see e.g. https://reviews.llvm.org/D90760. I suspect that this may not only be an async problem but perhaps missing in more places in mlir..?

lattner commented 2 years ago

I'm not familiar with the async runtime stuff, Eugene is the expert here I think.

joker-eph commented 2 years ago

Have you tried the opposite? I mean changing the runtime function to take uin32_t instead?

JonPsson commented 2 years ago

assigned to @joker-eph