llvm / llvm-project

Duplicate definition of symbol error to do with mergeable comdat constant symbols on Windows/COFF #39421

Closed mvhooren closed 2 years ago

mvhooren commented 5 years ago
Bugzilla Link 40074
Resolution FIXED
Resolved on Jan 22, 2020 15:03
Version trunk
OS Windows NT
CC @AlexDenisov,@lhames,@vchuravy

Extended Description

This error occurs when the same literal is defined in two different modules, once the second module has been added to the RTDyldObjectLinkingLayer.

So for example in some module:

define float @someFunction() {
entry:
  ret float 2.000000e+00
}

In some other module:

define float @someOtherFunction() {
entry:
  ret float 2.000000e+00
}

When materializing the second function, you will get this error:

JIT session error: Duplicate definition of symbol '__real@40000000'

The symbol names come from comdat symbols that are created by TargetLoweringObjectFileCOFF::getSectionForConstant in TargetLoweringObjectFileImpl.cpp.
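To illustrate why both modules end up with the same symbol: the name in the error message appears to encode the IEEE-754 bit pattern of the constant, and 2.0 as a 32-bit float is 0x40000000. The following is a minimal sketch of that naming scheme, not LLVM's actual code; the helper name realSymbolName is hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper (not LLVM's implementation): builds a constant-pool
// symbol name of the form "__real@<hex bits>" from a 32-bit float.
std::string realSymbolName(float value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "this sketch assumes a 32-bit float");
  std::uint32_t bits = 0;
  // memcpy is the well-defined way to read the raw bit pattern.
  std::memcpy(&bits, &value, sizeof(bits));
  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "__real@%08x",
                static_cast<unsigned>(bits));
  return std::string(buffer);
}
```

Because the name depends only on the value of the constant, any two modules that use the literal 2.0 as a float produce an identically named definition.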

For COFF, I have OverrideObjectFlagsWithResponsibilityFlags and AutoClaimResponsibilityForObjectSymbols set to true on the RTDyldObjectLinkingLayer; otherwise my function symbols are not found at all.

The same code compiles and runs just fine on Linux/ELF, even with the COFF workarounds enabled there as well.

A workaround I found is to set HasCOFFComdatConstants to false in MCAsmInfoCOFF.cpp.

lhames commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#44700

lhames commented 4 years ago

Great! Thank you for checking this out Machiel.

Closing as fixed by 84217ad6611.

mvhooren commented 4 years ago

I must add that I do not use LLJIT but my own JIT implementation using ORC.

mvhooren commented 4 years ago

Hi Lang,

I've tested my repro case on Windows with both debug and release builds without issue, using the unmodified current master branch.

I have also tested the IR posted in http://bugs.#44337 by loading the IR from a file, adding the module to a dylib, then getting the address of the 'calculate' function and executing it. This also works without problems, although there is an access violation when executing the compiled IR due to the use of @cachedValue (store double %4, double* @cachedValue, align 8). Removing that line produces valid output.

Machiel

lhames commented 4 years ago

Hi Machiel,

Can you confirm that 84217ad66115cc31b184374a03c8333e4578996f fixes this bug?

It looks like this might be related to http://bugs.#44337 -- I want to make sure I figure out what has been fixed and what's still broken.

-- Lang.

mvhooren commented 4 years ago

> Setting priority back down. This isn’t ideal, but it’s not a release blocker: ORCv1 and MCJIT are still both in LLVM10.
>
> ....
>
> Machiel -- Could you see if this patch fixes your issue? Unfortunately I don’t have a windows machine to test execution of COFF objects.
>
> -- Lang.

Roger on it not being a release blocker. I'm not really familiar with the criteria for what should warrant a release blocker, sorry about that.

I have tested your patch and everything seems to work fine so great success! I've also verified that the issue still exists in unpatched ORC.

lhames commented 4 years ago

Setting priority back down. This isn’t ideal, but it’s not a release blocker: ORCv1 and MCJIT are still both in LLVM10.

Confirming the earlier analysis: __real@40000000 is a COMDAT “any” symbol for a constant pool entry that is created by MC. We’re failing when two materialization units create the same symbol late in the pipeline and both try to register it.

An ideal solution to this would involve teaching the JIT, libObject, and the JIT-linker about COMDAT symbols. That’s not going to happen for a while though.

In the meantime I have attached a patch for a possible workaround. This patch modifies defineMaterializing to support defining new weak symbols (we’ll want to take this part either way), then adds a hack to RTDyldObjectLinkingLayer to detect COFF objects, look for newly defined symbols, and then mark any newly defined symbols in COMDAT sections as weak.
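The marking step described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the attached patch: ToySymbol and markNewComdatSymbolsWeak are hypothetical names, and the real code works on libObject and ORC types rather than plain structs.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a symbol found while scanning a COFF object (hypothetical).
struct ToySymbol {
  std::string name;
  bool inComdatSection;
  bool weak;
};

// Marks as weak every symbol that is newly defined (not already in the
// responsibility set) and lives in a COMDAT section.
// Returns how many symbols were marked.
std::size_t markNewComdatSymbolsWeak(
    std::vector<ToySymbol> &symbols,
    const std::set<std::string> &alreadyResponsibleFor) {
  std::size_t marked = 0;
  for (ToySymbol &symbol : symbols) {
    const bool newlyDefined = alreadyResponsibleFor.count(symbol.name) == 0;
    if (newlyDefined && symbol.inComdatSection && !symbol.weak) {
      symbol.weak = true;
      ++marked;
    }
  }
  return marked;
}
```

In this model, a constant pool entry such as __real@40000000 is marked weak because it was not declared up front, while the function symbol the module already promised to define is left untouched.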

Machiel — Could you see if this patch fixes your issue? Unfortunately I don’t have a windows machine to test execution of COFF objects.

— Lang.

lhames commented 4 years ago

Work around duplicate defs from COFF constant pool entries.

mvhooren commented 4 years ago

I have marked this bug as a release blocker because it causes a crash when using ORC on Windows in a very trivial case.

mvhooren commented 5 years ago

Hi Lang,

Yes, on ORCv2 it prints the same error message to the console and then crashes shortly afterwards. My target triple is x86_64-pc-windows-msvc.

The crash happens in RTDyldObjectLinkingLayer::onObjEmit when it tries to destroy a MemoryBuffer (the ObjBuffer).

Let me know if you need anything else. I could create a minimal repro case using the Kaleidoscope examples if you want.

lhames commented 5 years ago

Hi Machiel,

Oh -- this is awful. :)

Does it manifest with the same error on ORCv2? Or with a different error?

The relevant parts of RTDyldObjectLinkingLayer/RuntimeDyld's algorithm look something like this:

(1) scan the object file, noting weak definitions in a side table
(2) find out which weak definitions we are responsible for
(3) perform the link
(4) if auto-claim is on, add all provided definitions to the definition set
(5) register definitions

The issue likely has something to do with the fact that we determine responsibility for materializing weak defs in (2), but don't claim it until (4). Though I would have expected this to lead to a missing definition error, rather than a duplicate.

We probably need to add a new API to MaterializationResponsibility: "defineMaterializingWeak". Whereas "defineMaterializing" generates an error on duplicates, "defineMaterializingWeak" would silently continue without adding duplicates to the symbol table.
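The difference between the two behaviors can be sketched with a toy symbol table. This is an assumption-laden illustration, not ORC's real API: ToySymbolTable and its method signatures are hypothetical and only mirror the names used in the proposal.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a JIT symbol table; none of this is ORC's actual interface.
class ToySymbolTable {
public:
  // Strict variant: rejects the whole batch if any name is already defined.
  // Returns the first duplicate name, or an empty string on success.
  std::string defineMaterializing(const std::vector<std::string> &names) {
    for (const std::string &name : names)
      if (defined_.count(name) != 0)
        return name;
    defined_.insert(names.begin(), names.end());
    return std::string();
  }

  // Weak variant: silently skips names that already exist and returns only
  // the names that were newly added by this call.
  std::vector<std::string>
  defineMaterializingWeak(const std::vector<std::string> &names) {
    std::vector<std::string> added;
    for (const std::string &name : names)
      if (defined_.insert(name).second)
        added.push_back(name);
    return added;
  }

  bool isDefined(const std::string &name) const {
    return defined_.count(name) != 0;
  }

private:
  std::set<std::string> defined_;
};
```

With this model, a second module that defines someOtherFunction and __real@40000000 fails under the strict variant because the constant is already present, whereas the weak variant adds only someOtherFunction and reuses the existing constant.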

Could you include the target triple that you're seeing in your test cases so that I can make sure I match your setup?

mvhooren commented 5 years ago

This bug is still reproducible with ORCv2