Inefficient 64-bit absolute addresses in Cygwin

llvmbot commented 5 years ago


Bugzilla Link	42983
Version	5.0
OS	Windows NT
Reporter	LLVM Bugzilla Contributor
CC	@DougGregor,@efriedma-quic,@mstorsjo,@noloader,@zygoloid

Extended Description

Cygwin Clang v. 5.0.1 is using 64-bit absolute addresses when accessing static data in 64-bit mode. This is inefficient because it requires an extra 10-bytes long instruction for loading an address into a register every time it needs to access static data.

Linux Clang v. 6.0.0 with --target=x86_64-win64-windows gives the more efficient relative addresses, as do all other compilers.

Is this a Cygwin-only issue? I posted it to the cygwin mailing list, but they advised me to go here.

Test case:

#include <immintrin.h>

__m128d test (__m128d a) {
    __m128d b = _mm_add_pd(a, _mm_set1_pd(1.5));
    __m128d c = _mm_mul_pd(b, _mm_set1_pd(2.5));
    return c;
}

Cygwin Clang assembly output:

_Z4testDv2_d:
    vmovapd    (%rcx), %xmm0
    movabsq    $.LCPI0_0, %rax
    vaddpd    (%rax), %xmm0, %xmm0
    movabsq    $.LCPI0_1, %rax
    vmulpd    (%rax), %xmm0, %xmm0
    retq

Linux Clang assembly output with windows target:

"?test@@YAU__m128d@@U1@@Z":             # @"\01?test@@YAU__m128d@@U1@@Z"
# %bb.0:
    vmovapd (%rcx), %xmm0
    vaddpd  __xmm@3ff80000000000003ff8000000000000(%rip), %xmm0, %xmm0
    vmulpd  __xmm@40040000000000004004000000000000(%rip), %xmm0, %xmm0
    retq

2e1c2d01-a631-41c2-93fe-8d40b95d8607 commented 3 years ago

Don't assume DSO-local also for Cygwin I made a patch according to Eli's suggestion in Comment 8. It seems to work for me.

efriedma-quic commented 5 years ago

echo "extern int a; int* f() { return &a; }" | clang -x c - -o - -S --target=x86_64-pc-mingw64

[...]
movq    .refptr.a(%rip), %rax
[...]
.refptr.a:
        .quad   a

So it's using rip-relative addressing, but there's an extra level of indirection to allow it to point to an arbitrary address.

Make sure you're trying this with 8.0 or newer.

llvmbot commented 5 years ago

Thanks for the tip Eli, but I can't make it work.

When I try compiling with

clang --target=x86_64-pc-mingw64

on Linux clang or Cygwin clang, I get a 32-bit rip-relative address for external variables.

efriedma-quic commented 5 years ago

On Unix, clang currently doesn't really implement -mcmodel=medium correctly: it essentially is just pretending you specified large. That could be fixed, but it's not a high priority since almost nobody uses the medium code model, as far as I know.

Not exactly sure how that would apply to Windows, since the Unix medium doesn't really do anything useful in a direct sense, but we could do something similar.

Your comment reminded me of what we currently do for MinGW. Sorry, I should have remembered this earlier. For those targets, we actually use a small code model, but we have special handling for variables which we can't prove are DSO-local. You can try generating code with clang --target=x86_64-pc-mingw64 to see this. If we want cygwin to do the same thing, it's probably just a matter of going through the LLVM source code and replacing uses of isWindowsGNUEnvironment() with isOSCygMing().

If you want to avoid unnecessary performance overhead in this setup, you can use ThinLTO (https://clang.llvm.org/docs/ThinLTO.html).

llvmbot commented 5 years ago

After a long debate on the cygwin mailing list and some reverse engineering I have come to the following conclusions:

Cygwin is using the medium memory model on Gcc and Clang (in 64 bit mode).
Cygwin is adding an extra link process when an EXE or DLL is loaded, in order to emulate the behavior of Linux shared objects. They call this pseudo-relocation, and it is undocumented. This emulation of shared objects includes the possibility to link directly to a variable in a different DLL. This requires a 64-bit address table (an extra GOT?). The medium model is necessary for using 64-bit addresses on external variables.
Cygwin works fine - and more efficiently - with a small memory model on any program that does not make direct access to a variable in a different DLL. The cygwin people are unwilling to reap this performance benefit.
Direct access to a variable in another DLL is arguably bad programming practice by modern standards, but it may occur in old Linux code. Cygwin is able to compile such old code.
The memory models work differently in Gcc and Clang. Gcc is using 64 bit address tables for external variables when compiling with a medium or large memory model. Clang is using 64 bit address tables on local static variables as well when compiling with a medium or large memory model. This includes floating point constants, string constants, array initializer tables, jump tables, and more. This is quite wasteful if not needed.
The small memory model works differently for different targets. It allows 32 bit absolute addresses on a Linux target, but not on a Windows or Mac target.

Can you please help clarify if the difference between Gcc and Clang for local static variables on the medium/large memory model is intended? If it is unintended, then Clang can be improved. If it is intended - and a good reason is given - then we can close this issue.

efriedma-quic commented 5 years ago

Whoever built the cygwin package must have modified the source code; there is no build option to change the default code model.

Currently, there isn't anyone in the LLVM community maintaining builds for cygwin. The most recent post I saw on llvm-dev is that LLVM doesn't build on a cygwin host. If someone wants to reach out to the LLVM community for help stabilizing the build for cygwin hosts/targets, I'd be happy to provide guidance, though.

llvmbot commented 5 years ago

Thanks for your comments.

The exact meaning of the various -mcmodel flags varies by target. Yes, apparently so, but this is undocumented. The SysV ABI says nothing about Windows, of course. And the Windows ABI says nothing about PIC.

There is no PIC concept in Windows. The COFF file format allows all addressing modes: 32-bit absolute, 32-bit relative, 32-bit image-relative (does not exist in ELF), 64-bit absolute, and more.

Compiling for PIC in Linux involves that all addresses go through a GOT or PLT. These don't exist in Windows.

I think it is a bad idea to turn off -mcmodel=medium and large for Windows targets, because they may be useful for solving certain problems, and they definitely work.

The fact that -mcmodel=small means something else in Windows needs to be documented.

Alternatively, I will propose to define a new value for mcmodel that fits Windows. This model should limit the distance between code and static data to 2 GB so that relative addresses can be used, but allow locating a program at any address. This new memory model should not be called Windows, because it can be useful in both Linux, Mac OS, and Windows.

And what about Mac? I have not tried what -mcmodel=small does with a Mac target.

Whether you decide to let -mcmodel=small work differently for Windows targets - and document it - or define a new model, you need to tell the Cygwin people what to do. You cannot expect them to use the small model when the doc says that it allows 32 bit absolute addresses. Right now they are producing sub-optimal code with 64-bit addresses of all static data.

efriedma-quic commented 5 years ago

Also, looking briefly, there is no such thing as a "large" code model on Windows, because it's impossible to generate a binary that large anyway. https://docs.microsoft.com/en-us/cpp/build/x64-software-conventions?view=vs-2019 . We should probably reject -mcmodel=large on Windows targets, instead of using movabsq.

The default code model is not something that can be configured with a build setting. Changing that would require directly patching the x86 backend.

efriedma-quic commented 5 years ago

In other words, -mcmodel=small works differently when a Windows target is specified.

Sort of.

The key difference is whether you're generating position-independent code. On Linux, PIC is disabled by default; you have to request it with -fPIC or -fPIE. On x86_64 Windows, -fPIC is the default. IIRC, it's impossible to generate non-position-independent code on x86_64 Windows because the PE format doesn't support it.

The exact meaning of the various -mcmodel flags varies by target. See https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf for a description of the "Small code model" and the "Small position independent code model" on x86-64. We could probably improve the clang documentation here.

llvmbot commented 5 years ago

Thanks for the comment Eli.

I cannot find a documentation of -mcmodel=small for Clang. The Gcc documentation says: -mcmodel=small Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.

This does not fit Win64, where program code is not guaranteed to be loaded below 2 GB (though it often is). This difference is seen when a static array is accessed. Linux code uses a 32-bit absolute address with a scaled index:

movl myarray(,%rax,4), %eax

While Windows needs to make a pointer:

leaq myarray(%rip), %rcx
movl (%rcx,%rax,4), %eax

I tried this.

clang --target=x86_64-pc-cygwin -mcmodel=small

This makes the correct code for Windows, where the array is addressed with a pointer. Clang with a Linux target and -mcmodel=small uses the 32-bit absolute address method.

In other words, -mcmodel=small works differently when a Windows target is specified. Is this difference documented? If not, who can blame the Cygwin people for setting -mcmodel=large or medium?

Cygwin clang does the right thing when I set -mcmodel=small

efriedma-quic commented 5 years ago

The correct target for Cygwin should be --target=x86_64-pc-cygwin, I think.

Address generation is controlled by the mcmodel flag; -mcmodel=small forces the small code model (rip-relative), -mcmodel=large forces the large code model (movabsq). The default code model should be small for all Windows targets, including cygwin, as far as I know.

Windows binaries of llvm and clang are available from llvm.org.

llvm.org doesn't distribute cygwin binaries. If the cygwin clang is behaving differently from the llvm.org clang binaries, you'll have to ask the maintainer of the cygwin package.

llvm / llvm-project

Inefficient 64-bit absolute addresses in Cygwin #42328

Extended Description