mstorsjo / llvm-mingw

An LLVM/Clang/LLD based mingw-w64 toolchain
Other
1.75k stars 176 forks source link

[C++20] [Modules] ld.lld: error: undefined symbol: initializer for module #421

Open huangqinjin opened 3 months ago

huangqinjin commented 3 months ago

Steps to Reproduce

hello.cppm

module;

#if X == 1
#define EXPORT __declspec(dllexport)
#elif X == 2
#define EXPORT __attribute__((visibility("default")))
#elif X == 3
#define EXPORT
#endif

export module hello;

export EXPORT void f() {}

main.cpp

import hello;

int main()
{
    f();
}

commands

clang++ -std=c++20 -fmodule-output -shared hello.cppm -o hello.dll -DX=1
clang++ -std=c++20 -fprebuilt-module-path=. main.cpp -L. -lhello

-DX=1 results in ld.lld error, -DX=2 and -DX=3 have no error. -DX=1 with windows version of Clang also has no error.

mstorsjo commented 2 months ago

With "Windows version of Clang", I presume you mean Clang running in MSVC mode (not a windows build of e.g. llvm-mingw).

If we split up the steps so we get intermediate object files we can inspect, like this:

clang++ -std=c++20 -fmodule-output hello.cppm -c -o hello.o -DX=1
clang++ -shared hello.o -o hello.dll
clang++ -std=c++20 -fprebuilt-module-path=. main.cpp -c -o main.o
clang++ main.o -L. -lhello -o main.exe

We see this:

$ llvm-nm hello.o 
00000000 a @feat.00
00000010 T _ZGIW5hello
00000000 T _ZW5hello1fv
$ llvm-readobj --coff-directives hello.o

File: hello.o
Format: COFF-x86-64
Arch: x86_64
AddressSize: 64bit
Directive(s):  -export:_ZW5hello1fv
$ llvm-nm main.o 
00000000 a @feat.00
00000020 t _GLOBAL__sub_I_main.cpp
         U _ZGIW5hello
         U _ZW5hello1fv
         U __main
00000000 T main

While if we do the same with Clang in MSVC mode, we see the following:

$ llvm-nm hello.o 
00000000 T ?f@@YAXXZ
00000000 a @feat.00
$ llvm-readobj --coff-directives hello.o

File: hello.o
Format: COFF-x86-64
Arch: x86_64
AddressSize: 64bit
Directive(s):  /EXPORT:"?f@@YAXXZ"

$ llvm-nm main.o 
         U ?f@@YAXXZ
00000000 a @feat.00
00000000 T main

So in MSVC mode, the symbol reference to f doesn't seem to include the module name hello, and there's no reference to something equal to _ZGIW5hello (initializer for module hello). I'm not sure if this is a deficiency in the module support for MSVC mode, or just what that is...

Anyway, to properly solve this issue for mingw mode... When we have a dllexport directive in any embedded file, we only export those symbols, while normally we'd export all symbols. We can add -Wl,--export-all-symbols to the command for linking hello.dll, but that's not a very pretty fix. Other than that, I'm not quite sure where to begin to fix this (I'm not familiar enough with C++ modules yet). When we compile hello.cppm into hello.o, and we see a dllexport attribute on one symbol within the hello module, does that mean that we should implicitly mark the initializer for module hello, _ZGIW5hello, as dllexport as well?

huangqinjin commented 2 months ago

in MSVC mode, the symbol reference to f doesn't seem to include the module name hello

This should be a deficiency when clang targeting MSVC ABI. The big three are all choose strong module ownership model. So module names are included in final mangled symbol names, i.e. _ZW5hello1fv for GNU, ?f@@YAXXZ::<!hello> for MSVC.

in MSVC mode, ... there's no reference to something equal to _ZGIW5hello (initializer for module hello)

Looks like it is specific to GNU ABI, MSVC ABI doesn't have such thing.

$ /opt/msvc/bin/x64/cl /std:c++20 /TP /interface /c hello.cppm /Fo:hello.cl.obj -DX=1
$ llvm-nm hello.cl.obj 
00000000 T ?f@@YAXXZ::<!hello>
010582ef a @comp.id
80010190 a @feat.00
00000002 a @vol.md

GCC also has the same thing, but it is not referenced in main.o, so gcc does not suffer from same issue.

$ x86_64-w64-mingw32-g++ -std=c++20 -fmodules-ts -c -x c++ hello.cppm -o hello.gcc.o -DX=1
$ llvm-nm hello.gcc.o
0000000000000007 T _ZGIW5hello
0000000000000000 T _ZW5hello1fv

$ x86_64-w64-mingw32-g++ -std=c++20 -fmodules-ts -c main.cpp -o main.gcc.o
$ llvm-nm main.gcc.o 
         U _ZW5hello1fv
         U __main
00000000 T main

does that mean that we should implicitly mark the initializer for module hello, _ZGIW5hello, as dllexport as well?

Or _ZGIW5hello is not necessary like GCC?

huangqinjin commented 2 months ago

in MSVC mode, the symbol reference to f doesn't seem to include the module name hello

This should be a deficiency when clang targeting MSVC ABI. The big three are all choose strong module ownership model. So module names are included in final mangled symbol names, i.e. _ZW5hello1fv for GNU, ?f@@YAXXZ::<!hello> for MSVC.

I opened a upstream issue https://github.com/llvm/llvm-project/issues/89781 regarding the symbol mangling.

mstorsjo commented 2 months ago

in MSVC mode, the symbol reference to f doesn't seem to include the module name hello

This should be a deficiency when clang targeting MSVC ABI. The big three are all choose strong module ownership model. So module names are included in final mangled symbol names, i.e. _ZW5hello1fv for GNU, ?f@@YAXXZ::<!hello> for MSVC.

I opened a upstream issue llvm/llvm-project#89781 regarding the symbol mangling.

Thanks!

GCC also has the same thing, but it is not referenced in main.o, so gcc does not suffer from same issue.

$ x86_64-w64-mingw32-g++ -std=c++20 -fmodules-ts -c -x c++ hello.cppm -o hello.gcc.o -DX=1
$ llvm-nm hello.gcc.o
0000000000000007 T _ZGIW5hello
0000000000000000 T _ZW5hello1fv

$ x86_64-w64-mingw32-g++ -std=c++20 -fmodules-ts -c main.cpp -o main.gcc.o
$ llvm-nm main.gcc.o 
         U _ZW5hello1fv
         U __main
00000000 T main

Hmm, won't this mean that there's an Itanium ABI mismatch between these two compilers? If I build one object file with Clang and one with GCC, we might either have an initializer for the module that isn't called, or we might have an undefined reference to the initializer.

I get that the actual binary compiled module (PCM) is compiler specific, but aren't object files supposed to be interoperable? Especially, as in the case of the original bug report here, the issue is across a DLL interface, which definitely should be compileable with two different compilers?

huangqinjin commented 2 months ago

won't this mean that there's an Itanium ABI mismatch between these two compilers?

Yes, it is a ABI issue. But I have no idea whether Clang or GCC is correct. Or maybe Itanium ABI is missing specification regarding initializer for module?

mstorsjo commented 2 months ago

won't this mean that there's an Itanium ABI mismatch between these two compilers?

Yes, it is a ABI issue. But I have no idea whether Clang or GCC is correct. Or maybe Itanium ABI is missing specification regarding initializer for module?

I guess we can start out by filing a bug report on either LLVM's or GCC's bugzilla about it, or both, and whoever responds to it can maybe give guidance towards which side might have the bug, or they can suggest escalating it to https://github.com/itanium-cxx-abi/cxx-abi.

huangqinjin commented 2 months ago

Ok, there is a PR https://github.com/itanium-cxx-abi/cxx-abi/pull/144 for C++20 Modules. The related part is https://html-preview.github.io/?url=https://github.com/urnathan/cxx-abi/blob/main/abi.html#mangling-module-initializer.

So, GCC chose to omit call to module-initializer for this trivial example since there is no objects with dynamic-initializers.

BTW, GCC currently hides module-initializers if -fvisibility=hidden is specified, which is actually a bug reported here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105397. And Clang properly made them with DEFAULT visibility.

When we compile hello.cppm into hello.o, and we see a dllexport attribute on one symbol within the hello module, does that mean that we should implicitly mark the initializer for module hello, _ZGIW5hello, as dllexport as well?

After understanding the role of initializer for module, I think it should be dllexported from DLL anyway, even it is a trivial call, even there are no other symbols are dllexported.

mstorsjo commented 2 months ago

Ok, there is a PR itanium-cxx-abi/cxx-abi#144 for C++20 Modules. The related part is https://html-preview.github.io/?url=https://github.com/urnathan/cxx-abi/blob/main/abi.html#mangling-module-initializer.

Ok, so this bit I guess:

It is also permitted to omit calls to module-initializer functions known to have no dynamic initializations.

So, GCC chose to omit call to module-initializer for this trivial example since there is no objects with dynamic-initializers.

Indeed... But how does this spec mean to handle the case when one object decides to omit the module-initializer definition, while the other one decides to not omit making a call to it?

BTW, GCC currently hides module-initializers if -fvisibility=hidden is specified, which is actually a bug reported here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105397. And Clang properly made them with DEFAULT visibility.

Oh, interesting!

When we compile hello.cppm into hello.o, and we see a dllexport attribute on one symbol within the hello module, does that mean that we should implicitly mark the initializer for module hello, _ZGIW5hello, as dllexport as well?

After understanding the role of initializer for module, I think it should be dllexported from DLL anyway, even it is a trivial call, even there are no other symbols are dllexported.

Hmm. In the case of dllexporting, I wouldn't want to add any such custom logic in the linker (because it would need matching implementation in GNU ld etc, and the whole situation is quite nontrivial anyway), so I would prefer to stick to generating the relevant attributes where necessary.

For potential references of doing this, can we make a slightly less trivial example, where the module does have dynamic initialization, which should trigger MSVC to produce a similar module-initializer? Then we could check how MSVC handles dllexport in this part of object file interfaces?

huangqinjin commented 1 month ago

But how does this spec mean to handle the case when one object decides to omit the module-initializer definition, while the other one decides to not omit making a call to it?

The spec only allows to omit the call but not to omit the definition.

make a slightly less trivial example, where the module does have dynamic initialization, which should trigger MSVC to produce a similar module-initializer

Yes, here is the example https://github.com/huangqinjin/cxxmodules/tree/master/shared-lib. The initialization of object gploc is not trivial.

For MSVC, nothing new for modules, just use the CRT initialization machenism, and insert the initializer of object gploc into section .CRT$XCU.

> nmake -f Makefile.msvc clean hello.exe
> dumpbin /section:.CRT$XCU /RELOCATIONS hello.obj

SECTION HEADER #26
.CRT$XCU name
       0 physical address
       0 virtual address
       8 size of raw data
    1C17 file pointer to raw data (00001C17 to 00001C1E)
    1C1F file pointer to relocation table
       0 file pointer to line numbers
       1 number of relocations
       0 number of line numbers
40400040 flags
         Initialized Data
         8 byte align
         Read Only

RELOCATIONS #26
                                                Symbol    Symbol
 Offset    Type              Applied To         Index     Name
 --------  ----------------  -----------------  --------  ------
 00000000  ADDR64            00000000 00000000        3B  ??__Egploc@@YAXXZ (void __cdecl `dynamic initializer for 'gploc''(void))

For MinGW, I launch the example with WinDBG and put a breakpoint at the module-initializer hello!_ZGIW5hello.

First time hit in impl.cpp, it is a module implementation unit which implicitly imports the module.

 # Child-SP          RetAddr               Call Site
00 00000007`4ff1ed78 00007fff`c5fb16d9     hello!_ZGIW5hello
01 00000007`4ff1ed80 00007fff`c5fb273b     hello!_GLOBAL__sub_I_impl.cpp+0x9
02 00000007`4ff1edb0 00007fff`c5fb1254     hello!_main+0x6b
03 00000007`4ff1edf0 00007ff8`01f69a1d     hello!DllMainCRTStartup+0x94
04 00000007`4ff1ee60 00007ff8`01fbc2c7     ntdll!LdrpCallInitRoutine+0x61
05 00000007`4ff1eed0 00007ff8`01fbc05a     ntdll!LdrpInitializeNode+0x1d3
06 00000007`4ff1f020 00007ff8`01fbc0e0     ntdll!LdrpInitializeGraphRecurse+0x42
07 00000007`4ff1f060 00007ff8`02023c42     ntdll!LdrpInitializeGraphRecurse+0xc8
08 00000007`4ff1f0a0 00007ff8`01fc4dbb     ntdll!LdrpInitializeProcess+0x1f62
09 00000007`4ff1f4c0 00007ff8`01fc4c43     ntdll!LdrpInitialize+0x15f
0a 00000007`4ff1f560 00007ff8`01fc4bee     ntdll!LdrpInitialize+0x3b
0b 00000007`4ff1f590 00000000`00000000     ntdll!LdrInitializeThunk+0xe

Second time hit in hello.ixx, the module interface unit itself, where the module-initializer is defined.

 # Child-SP          RetAddr               Call Site
00 00000007`4ff1eda8 00007fff`c5fb273b     hello!_ZGIW5hello
01 00000007`4ff1edb0 00007fff`c5fb1254     hello!_main+0x6b
02 00000007`4ff1edf0 00007ff8`01f69a1d     hello!DllMainCRTStartup+0x94
03 00000007`4ff1ee60 00007ff8`01fbc2c7     ntdll!LdrpCallInitRoutine+0x61
04 00000007`4ff1eed0 00007ff8`01fbc05a     ntdll!LdrpInitializeNode+0x1d3
05 00000007`4ff1f020 00007ff8`01fbc0e0     ntdll!LdrpInitializeGraphRecurse+0x42
06 00000007`4ff1f060 00007ff8`02023c42     ntdll!LdrpInitializeGraphRecurse+0xc8
07 00000007`4ff1f0a0 00007ff8`01fc4dbb     ntdll!LdrpInitializeProcess+0x1f62
08 00000007`4ff1f4c0 00007ff8`01fc4c43     ntdll!LdrpInitialize+0x15f
09 00000007`4ff1f560 00007ff8`01fc4bee     ntdll!LdrpInitialize+0x3b
0a 00000007`4ff1f590 00000000`00000000     ntdll!LdrInitializeThunk+0xe

Third time hit in main.cpp.

 # Child-SP          RetAddr               Call Site
00 00000007`4ff1f928 00007ff7`bc8b1409     hello!_ZGIW5hello
01 00000007`4ff1f930 00007ff7`bc8b154b     hello_exe!_GLOBAL__sub_I_main.cpp+0x9
02 00000007`4ff1f960 00007ff7`bc8b12eb     hello_exe!_main+0x6b
03 00000007`4ff1f9a0 00007ff7`bc8b1366     hello_exe!WinMainCRTStartup+0x1ab
04 00000007`4ff1fa00 00007ff8`010c7344     hello_exe!mainCRTStartup+0x16
05 00000007`4ff1fa30 00007ff8`01fa26b1     KERNEL32!BaseThreadInitThunk+0x14
06 00000007`4ff1fa60 00000000`00000000     ntdll!RtlUserThreadStart+0x21

The first time call to the module-initialzer is always during DLL loading, otherwise if we dynamically load the DLL, those static objects would become unintialized. So that means, calls to the module-initialzer are unnecessary at all outside the DLL, these calls will never be the first.

The GCC visibility bug no longer matters if the call to the module-initialzer can be eliminated. Looks like this should be done in the linker? At compile stage the compiler doesn't know whether the module-initialzer comes from a DLL or not.

mstorsjo commented 1 month ago

But how does this spec mean to handle the case when one object decides to omit the module-initializer definition, while the other one decides to not omit making a call to it?

The spec only allows to omit the call but not to omit the definition.

Oh, I had missed the fact that GCC did emit the definition, but just skipped the call. Then this area is indeed completely clear, sorry for the confusion.

For MSVC, nothing new for modules, just use the CRT initialization machenism, and insert the initializer of object gploc into section .CRT$XCU.

Ok, so this is similar to the call in hello!_GLOBAL__sub_I_impl.cpp+0x9 as you demonstrated below (just hooked up differently) - it is called the same way as other static constructors are.

The first time call to the module-initialzer is always during DLL loading, otherwise if we dynamically load the DLL, those static objects would become unintialized. So that means, calls to the module-initialzer are unnecessary at all outside the DLL, these calls will never be the first.

What you say here does make sense. So what's the reason that the Itanium ABI added the extra module-initializer call? Or is the case that the module-initializer is called via regular static initializer an extra implementation bonus in Clang/GCC that Itanium ABI doesn't mandate?

The GCC visibility bug no longer matters if the call to the module-initialzer can be eliminated. Looks like this should be done in the linker? At compile stage the compiler doesn't know whether the module-initialzer comes from a DLL or not.

Yes, maybe... But ideally I would like to see some prior art in ELF land, that such module-initializer calls can be eliminated, before I'd venture out to implement that in a linker. (This would need to be implemented in lld/COFF, lld/ELF, ld.bfd, ld.gold, etc.) Because, as you say, that GCC issue can be fixed by linkers eliminating those calls (or more practically, redirecting those calls to a no-op function that just returns).

Other than that, I do agree with specifically one comment in the GCC bug report, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105397#c1:

Perhaps the best option is to default the visibility of the implicit functions to the widest visibility of any function or object in module purview exposed by the TU.

For Windows, this would be the same as my idea above - if anything within the module interface is marked dllexport, then the initializer should also be marked dllexport.

huangqinjin commented 1 month ago

what's the reason that the Itanium ABI added the extra module-initializer call?

I guess with module initializers, we can guarantee that imported objects have been initialized, so we can access them safely even in namespace scope.