llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.15k stars 12.03k forks source link

[GlobalISel][AArch64] Calling convention for HFAs incorrect on big-endian targets #34707

Closed ostannard closed 2 years ago

ostannard commented 7 years ago
Bugzilla Link 35359
Resolution FIXED
Resolved on Aug 18, 2020 13:59
Version trunk
OS Linux
Depends On llvm/llvm-project#26535
CC @qcolombet

Extended Description

Global-isel generates incorrect code when targeting big-endian AArch64 for this code: struct foo { float first; float second; }; float get_first(foo p) { return p.first; }

This is the code that global-isel currently generates: // /work/llvm/build/bin/clang --target=aarch64--none-eabi -march=armv8-a -c callees.cpp -O0 -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mbig-endian -o - -S _Z9get_first3foo: // @​_Z9get_first3foo // BB#0: // %entry sub sp, sp, #​16 // =16 // implicit-def: %X8 fmov w9, s0 mov w10, w9 bfxil x8, x10, #​0, #​32 fmov w9, s1 mov w10, w9 bfi x8, x10, #​32, #​32 add x10, sp, #​8 // =8 str x8, [sp, #​8] ldr w9, [x10] fmov s0, w9 add sp, sp, #​16 // =16 ret

When run on a big-endian target, this incorrectly returns the second member of the struct, instead of the first.

aemerson commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#35360

aemerson commented 4 years ago

The root cause of this has been fixed a long time ago, but at the moment we fall back to SDAG anyway for big endian for other reasons. Closing.

aemerson commented 6 years ago

This will require a more complex fix due to the way we deal with aggregates currently. llvm/llvm-project#26535 is tracking some work in finalizing the design choice for handling aggregates (splitting into multiple LLTs or using one large scalar type).

The problem here seems to be that at -O0 the struct is stored to the stack and then the first element is loaded to return as a float. The struct store should be done as a sequence of element-wise stores but we end up merging the two floats in the aggregate into an s64 and then store it (since s64 stores are legal). On little endian this is fine as the first element is then in the right place for the 32 bit load. However on big-endian the s64 store causes the bytes to be reversed and the first logical struct element ends up in the wrong place.

If we used the SDAG approach in the IRTranslator to split the aggregate into element types then this problem should go away. Since that piece of work is going to be more complex I suggest that fall back to SDAG in global-isel if we're targeting BE.

qcolombet commented 6 years ago

Hi Amara, could you take a look? Thanks.

ostannard commented 7 years ago

assigned to @aemerson