llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Feature request: inline `__aeabi_read_tp` for ARMv7a ELF TLS #37742

Open rprichard opened 6 years ago

rprichard commented 6 years ago
Bugzilla Link 38394
Version trunk
OS Linux

Extended Description

Clang generates a call to an __aeabi_read_tp function to access an arm32 ELF TLS variable using the Initial-Exec or Local-Exec access models. GCC also generates a call to this ABI function for ARMv5, but on ARMv7 it inlines the call as a single mrc read of the CP15 thread ID register, which avoids the call overhead.

Could Clang also inline the call?

test.c:

    __thread int tlsvar;

    int bump() {
      return ++tlsvar;
    }

Clang output (clang test.c -target arm-linuxeabi -march=armv7a -Os -S):

    bump:
      push  {r11, lr}
      mov r11, sp
      ldr r2, .LCPI0_0
      bl  __aeabi_read_tp
      ldr r1, [r0, r2]
      add r1, r1, #1
      str r1, [r0, r2]
      mov r0, r1
      pop {r11, pc}

GCC output (arm-linux-gnueabi-gcc-7 test.c -march=armv5 -Os -S):

    bump:
      str lr, [sp, #-4]!
      bl  __aeabi_read_tp @ load_tp_soft
      ldr r2, .L3
      ldr r3, [r0, r2]
      add r3, r3, #1
      str r3, [r0, r2]
      mov r0, r3
      ldr pc, [sp], #4

GCC output (arm-linux-gnueabi-gcc-7 test.c -march=armv7-a -Os -S):

    bump:
      ldr r3, .L2
      mrc p15, 0, r2, c13, c0, 3  @ load_tp_hard
      ldr r0, [r2, r3]
      add r0, r0, #1
      str r0, [r2, r3]
      bx  lr

rprichard commented 6 years ago

With -mtp=cp15, GCC inlines the access for ARM and Thumb modes, but Clang only inlines it for ARM, not Thumb. https://godbolt.org/g/QJrbYy

rprichard commented 6 years ago

GCC defaults to -mtp=auto, whereas Clang doesn't have -mtp=auto, so it defaults to -mtp=soft. https://reviews.llvm.org/D34878?id=114582#863140

rprichard commented 6 years ago

Like GCC, Clang already has an -mtp={soft,cp15} option to select between calling __aeabi_read_tp and inlining the mrc instruction (added in https://reviews.llvm.org/D34408).

ilovepi commented 7 months ago

@rprichard can this be closed? The godbolt example seems to imply that this is fixed at ToT (and likely has been for some time).

rprichard commented 7 months ago

It looks like LLVM inlines __aeabi_read_tp for both ARM+Thumb starting with LLVM 13.0.1.

There's still no -mtp=auto, though. That might matter because people are unlikely to pass -mtp=cp15 explicitly, but they would specify armv7a Linux (for example), a target where AFAIK the function could be inlined.

Quoting https://reviews.llvm.org/D34878?id=114582#863140:

"The 'auto' value should automatically pick 'cp15' if that's going to work on what you're targeting. If I understood correctly, that depends both on the architecture version you're targeting and the operating system/kernel you're targeting. So, there could be a lot of details to go through to get 'auto' right in all cases. Which is why I think it's fine to leave an implementation of 'auto' for later."

I don't know if we need to keep this feature request open for it or not, though.

ilovepi commented 7 months ago

I think that's fine. For context, I was looking for clues about ARM32 performance regressions and came across this issue. When I saw that the godbolt example seemed to behave the same across GCC and Clang, I wondered whether this was stale, but I think you're right about the -mtp=auto case, so let's leave this open until that is handled.