llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.03k stars 11.58k forks source link

Undefined behavior in `CastInfo::castFailed` with `From=<MLIR interface>` #86647

Open ingomueller-net opened 5 months ago

ingomueller-net commented 5 months ago

I observed a TypeSwitch taking the "wrong" case in the snippet below:

llvm::TypeSwitch<ExpressionOpInterface, int>(op)
    .Case<FieldReferenceOp, LiteralOp>([&](auto op) {
      llvm::errs() << __PRETTY_FUNCTION__ << "\n";
      op.dump();
      return 42;
    })

which produced the following output during CI on Github:

int (anonymous namespace)::exportOperation(mlir::substrait::FieldReferenceOp)
%2 = substrait.literal -1 : si1

The line with __PRETTY_FUNCTION__ shows that we are in the FieldReferenceOp case but the op printed is actually a literal/LiteralOp, so the TypeSwitch entered the wrong case.

Curiously, this works fine on my local machine (using Clang 14, 15, and 17) but fails consistently on CI (using Clang 14, 15, and 17).

After a lot of printf debugging, I think that I found what the problem is: Eventually, CastInfo::castFailed is called when we try a case that doesn't correspond to the runtime type of the argument. This is what that function looks like:

static inline CastReturnType castFailed() {
  return CastReturnType(nullptr);
}

The problem is that CastReturnType at this point is const ExpressionOpInterface &. Notice the &! So we seem to create a reference to nullptr and then return that. Indeed, if I compile with ASan and UBsan, I get this in CI:

==21930==ERROR: AddressSanitizer: stack-use-after-return on address 0x7fef0de7fb20 at pc 0x7fef2d7c165c bp 0x7fffb4ebd190 sp 0x7fffb4ebd188
READ of size 8 at 0x7fef0de7fb20 thread T0
    #0 0x7fef2d7c165b in llvm::TypeSwitch<mlir::substrait::ExpressionOpInterface, mlir::FailureOr<std::unique_ptr<substrait::proto::Expression, std::default_delete<substrait::proto::Expression>>>>& llvm::TypeSwitch<mlir::substrait::ExpressionOpInterface, mlir::FailureOr<std::unique_ptr<substrait::proto::Expression, std::default_delete<substrait::proto::Expression>>>>::Case<mlir::substrait::FieldReferenceOp, (anonymous namespace)::exportOperation(mlir::substrait::ExpressionOpInterface)::$_0&>((anonymous namespace)::exportOperation(mlir::substrait::ExpressionOpInterface)::$_0&) /home/runner/work/iree-llvm-sandbox/iree-llvm-sandbox/sandbox/third_party/llvm-project/llvm/include/llvm/ADT/TypeSwitch.h:151:[26](https://github.com/iree-org/iree-llvm-sandbox/actions/runs/8391967477/job/22983497415#step:13:27)
...

Curiously, again, the sanitizers don't complain on my local machine.

I am not 100% sure whether my understanding of the code above returning a local reference is actually correct, but in the following, less templated code, I do get a warning:

using CastReturnType = const ExpressionOpInterface &;
auto castFailed = []() -> CastReturnType {
  return CastReturnType(nullptr);
//                      ^~~~~~~
// warning: returning reference to local temporary object [-Wreturn-stack-address]
};

As a simple work-around, I changed the original code to use TypeSwitch<Operation *, int> instead of TypeSwitch<ExpressionOpInterface, int>, for which CastReturnType is a pointer type that is value constructed from nullptr, which works fine.

However, I find the current behavior extremely dangerous. If using TypeSwitch<ExpressionOpInterface, int> isn't expected to work, I think there should be a mechanism that prevents compilation with those arguments.

joker-eph commented 5 months ago

It's interesting...

void foo(RegionBranchOpInterface branchOp) {
    auto mOp = dyn_cast<ModuleOp>(branchOp);
}

This triggers an error:

llvm/include/llvm/Support/Casting.h:490:54: error: functional-style cast from rvalue to reference type 'llvm::CastInfo<mlir::ModuleOp, mlir::RegionBranchOpInterface>::CastReturnType' (aka 'mlir::ModuleOp &')
  static inline CastReturnType castFailed() { return CastReturnType(nullptr); }
                                                     ^~~~~~~~~~~~~~~~~~~~~~
llvm/include/llvm/Support/Casting.h:494:14: note: in instantiation of member function 'llvm::CastInfo<mlir::ModuleOp, mlir::RegionBranchOpInterface>::castFailed' requested here
      return castFailed();
             ^
llvm/include/llvm/Support/Casting.h:657:30: note: in instantiation of member function 'llvm::CastInfo<mlir::ModuleOp, mlir::RegionBranchOpInterface>::doCastIfPossible' requested here
  return CastInfo<To, From>::doCastIfPossible(Val);
                             ^

But:

void bar(RegionBranchOpInterface branchOp) {
    int res = TypeSwitch<RegionBranchOpInterface, int>(branchOp)
      .Case([] (ModuleOp op) { return 42; });
}

Triggers only a warning:

llvm/include/llvm/Support/Casting.h:490:69: warning: returning reference to local temporary object [-Wreturn-stack-address]
  static inline CastReturnType castFailed() { return CastReturnType(nullptr); }
                                                                    ^~~~~~~
llvm/include/llvm/Support/Casting.h:494:14: note: in instantiation of member function 'llvm::CastInfo<mlir::ModuleOp, const mlir::RegionBranchOpInterface>::castFailed' requested here
      return castFailed();
             ^
llvm/include/llvm/Support/Casting.h:651:36: note: in instantiation of member function 'llvm::CastInfo<mlir::ModuleOp, const mlir::RegionBranchOpInterface>::doCastIfPossible' requested here
  return CastInfo<To, const From>::doCastIfPossible(Val);
                                   ^
llvm/include/llvm/ADT/TypeSwitch.h:60:29: note: in instantiation of function template specialization 'llvm::TypeSwitch<mlir::RegionBranchOpInterface, int>::Case<mlir::ModuleOp, (lambda at mlir/unittests/Interfaces/ControlFlowInterfacesTest.cpp:119:13)>' requested here
    return derived.template Case<CaseT>(std::forward<CallableT>(caseFn));
joker-eph commented 5 months ago

I think the constness makes the difference here:

// Calculate what type the 'cast' function should return, based on a requested
// type of To and a source type of From.
template <class To, class From> struct cast_retty_impl {
  using ret_type = To &; // Normal case, return Ty&
};
template <class To, class From> struct cast_retty_impl<To, const From> {
  using ret_type = const To &; // Normal case, return Ty&
};
lipracer commented 5 months ago

I found a very clear annotation here. Can we specialize the CastInfo type based on the OpInterface type, as OpInterface can be constructed from nullptr.I think I can try fixing it.

lipracer commented 5 months ago

After I specialized the CastInfo class, it can support the conversion from OpInterface to OpInterface. However, whether it is gcc-9 or clang-13, it will also give me a warning message returning reference to local temporary object. However, when I actually run tests, It will not execute it at this location.I think one of them may be that the compiler has not yet found a specialized implementation.

joker-eph commented 5 months ago

However, when I actually run tests, It will not execute it at this location

How do you figure this out?