microsoft / STL

MSVC's implementation of the C++ Standard Library.
Other
10.24k stars 1.51k forks source link

Use forwarders to avoid double indirection and have better performance #763

Open AlexGuteniev opened 4 years ago

AlexGuteniev commented 4 years ago

In module definitions file there's such syntax:

EXPORTS
   func2=other_module.func1

They add one-time complication for loader at DLL load time, but then they optimize away unnecessary indirect call: import address table entry of caller directly points to final implementation, rather than pointing to proxy module and having another indirect all.

Some trivial fast on fast path synchronization functions that are potentially on critical path could benefit https://github.com/microsoft/STL/blob/852a3085906108045a3951064baaf4952f596e9e/stl/src/cthread.cpp#L97-L99

https://github.com/microsoft/STL/blob/852a3085906108045a3951064baaf4952f596e9e/stl/src/sharedmutex.cpp#L20-L22

Even more with XP compatibility break, when InitOnce... stuff will be called statically.

Still open questions:

  1. Safety of this optimization. I'm a bit concerned about unloading module that is forwarder target. It will be not a problem for Kernel32.dll, which does not ever unload, likely to be no problem if module is imported normally as well, but generic case is still a concern, so need to figure out limitations
  2. Using .def files, does is cost start using def files everywhere (as a possible #613 resolution), or maybe mix def files and __declspec(dllexport)
  3. Need to be careful with signature and calling conventions. This way likely to be ABI breaking to tune types and calling conventions.
AlexGuteniev commented 4 years ago
  • Safety of this optimization. I'm a bit concerned about unloading module that is forwarder target. It will be not a problem for Kernel32.dll, which does not ever unload, likely to be no problem if module is imported normally as well, but generic case is still a concern, so need to figure out limitations

I think I recalled the details. It is an interesting case, when forwarder is used by GetProcAddress, and forwarder target is not yet loaded. Loader pins the target module until process exits, at least in some Windows versions, though it could unload it at proxy module unload time. This can be sorted out by STL not using GetProcAddress for its internal exports.

AlexGuteniev commented 4 years ago

Alternatively, forwarders can be implemented without .def files:

#include <iostream>
#include <Windows.h>

#pragma comment(linker, "/export:AlternativeName=Kernel32.GetCurrentThreadId")

int main()
{
    auto thd_id = reinterpret_cast<decltype(&::GetCurrentThreadId)>(
        ::GetProcAddress(::GetModuleHandle(nullptr), "AlternativeName"));

    std::cout << thd_id() << std::endl;
    std::cout << ::GetCurrentThreadId() << std::endl;
}
AlexGuteniev commented 4 years ago

Here is documentation what is a forwarder exactly, might be helpful: https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#export-address-table

AlexGuteniev commented 4 years ago

#pragma comment option looks better than .def, as it can be controlled by preprocessor, see https://github.com/microsoft/STL/issues/613#issuecomment-622028507

AlexGuteniev commented 4 years ago

@BillyONeal mentioned in https://github.com/microsoft/STL/pull/688#issuecomment-622259267 aliasobj tool; this may be a superior alternative to using forwarders.

StephanTLavavej commented 2 years ago

We talked about this at the weekly maintainers meeting. For new code, using forwarders seems desirable - both the /alternatename and ASM approaches are viable (with various interactions with ARM64 etc. - more investigation would be needed to find exactly which approach works on the most architectures with the least effort). For existing code, extreme caution would be needed to prevent a recurrence of the /alternatename fiasco that I caused (and which @barcharcraz found out how to fix with the ASM approach). The gains from converting existing code are probably small enough to not be worth the massive risk here (even if we used the ASM approach, emitting real function definitions, there may be other scenarios or gotchas that we haven't imagined).

As @CaseyCarter noted, for vNext we could definitely convert all existing wrappers to use forwarders.