llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.13k stars 12.02k forks source link

Memory leak in llvm::orc when llvm::orc::DuplicateDefinition happens #60312

Closed Focshole closed 1 year ago

Focshole commented 1 year ago

Hello everyone, I had found a memory leak in llvm's error handling in ORCv2, llvm 15.0.7. The original code exits only if compiled as debug due to a llvm::orc::DuplicateDefinition error.

I had triggered the issue while doing (simplified the code):

llvm::orc::IRCompilerLayer lm;
llvm::orc::ResourceTrackerSP rt;
ThreadSafeModule TSM; 
...
...
  llvm::cantFail(lm.add(rt,std::move(TSM))); // this triggers a llvm::orc::DuplicateDefinition

Assume orc to be properly initialized when running that piece of code.

The issue won't happen when "fixing" the code in this way

llvm::orc::IRCompilerLayer lm;
llvm::orc::ResourceTrackerSP rt;
ThreadSafeModule TSM; 
...
...
  llvm::Error e = lm.add(rt,std::move(TSM));
  if (!e.isA<
          llvm::orc::
              DuplicateDefinition>()) // https://discourse.llvm.org/t/duplicate-symbols-in-separate-modules-no-longer-allowed/1805
    llvm::cantFail(std::move(e));
  else
    llvm::consumeError(std::move(e));

The relevant valgrind output is the following:

==41010==    at 0x4843003: operator new(unsigned long) (vg_replace_malloc.c:422)
==41010==    by 0xA9AC716: make_unique<llvm::orc::DuplicateDefinition, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (unique_ptr.h:1065)
==41010==    by 0xA9AC716: make_error<llvm::orc::DuplicateDefinition, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (Error.h:334)
==41010==    by 0xA9AC716: llvm::orc::JITDylib::defineImpl(llvm::orc::MaterializationUnit&) (Core.cpp:1694)
==41010==    by 0xAA181AC: operator() (Core.h:1825)
==41010==    by 0xAA181AC: runSessionLocked<llvm::orc::JITDylib::define<llvm::orc::BasicIRLayerMaterializationUnit>(std::unique_ptr<llvm::orc::BasicIRLayerMaterializationUnit, std::default_delete<llvm::orc::BasicIRLayerMaterializationUnit> >&&, llvm::orc::ResourceTrackerSP)::<lambda()> > (Core.h:1421)
==41010==    by 0xAA181AC: define<llvm::orc::BasicIRLayerMaterializationUnit> (Core.h:1838)
==41010==    by 0xAA181AC: llvm::orc::IRLayer::add(llvm::IntrusiveRefCntPtr<llvm::orc::ResourceTracker>, llvm::orc::ThreadSafeModule) (Layer.cpp:29)

Probably the issue is around here:

https://github.com/llvm/llvm-project/blob/8dfdcc7b7bf66834a761bd8de445840ef68e4d1a/llvm/include/llvm/ExecutionEngine/Orc/Core.h#L1822

Or here: https://github.com/llvm/llvm-project/blob/8dfdcc7b7bf66834a761bd8de445840ef68e4d1a/llvm/lib/ExecutionEngine/Orc/Core.cpp#L1691

But I have no clue so far.

llvmbot commented 1 year ago

@llvm/issue-subscribers-orcjit

lhames commented 1 year ago

HI @Focshole,

I don't think this is a bug. cantFail is effectively an assertion: "this value will always be success". Passing a failure value to cantFail will result in a call to llvm_unreachable, and there's no guarantee that the error (or anything else) will be freed.

You should only use cantFail if you are certain that the object that you are adding to the JIT will be added successfully.

In general you want to either propagate the Error or handle it:

if (auto Err = lm.add(rt,std::move(TSM)))
  return Err;

or

if (auto Err =
      handleErrors(
        lm.add(rt, std::move(TSM)),
        [](const DuplicateDefinition &DD) {
          // Handle duplicates.
        }))
  return Err; // Return any other errors to the caller.
Focshole commented 1 year ago

I don't think this is a bug. cantFail is effectively an assertion: "this value will always be success". Passing a failure value to cantFail will result in a call to llvm_unreachable, and there's no guarantee that the error (or anything else) will be freed.

Ouch,

/// Report a fatal error if Err is a failure value.
///
/// This function can be used to wrap calls to fallible functions ONLY when it
/// is known that the Error will always be a success value

You are right, I had used it improperly. I do not find a meaning in wrapping errors which are known to not happen, it looks like a fancy way to do a llvm::consumeError and ignore it except when running on debug, but it makes sense I guess. Thank you!