Open rprichard opened 6 years ago
With -mtp=cp15, GCC inlines the thread-pointer read in both ARM and Thumb modes, but Clang inlines it only in ARM mode, not in Thumb mode. https://godbolt.org/g/QJrbYy
GCC defaults to -mtp=auto, whereas Clang doesn't have -mtp=auto, so it defaults to -mtp=soft. https://reviews.llvm.org/D34878?id=114582#863140
Like GCC, Clang already has an -mtp={soft,cp15} option to select between calling __aeabi_read_tp and inlining the mrc instruction (added in https://reviews.llvm.org/D34408).
@rprichard can this be closed? The godbolt example seems to imply that this is fixed at ToT (and likely has been for some time).
It looks like LLVM inlines __aeabi_read_tp for both ARM+Thumb starting with LLVM 13.0.1. There's still no -mtp=auto, though. That might matter because people would be unlikely to pass -mtp=cp15 but would specify armv7a Linux (for example), which AFAIK could inline the function.
https://reviews.llvm.org/D34878?id=114582#863140
The 'auto' value should automatically pick 'cp15' if that's going to work on what you're targeting. If I understood correctly, that depends both on the architecture version you're targeting and the operating system/kernel you're targeting. So, there could be a lot of details to go through to get 'auto' right in all cases. Which is why I think it's fine to leave an implementation of 'auto' for later.
I don't know if we need to keep this feature request open for it or not, though.
I think that's fine. For context, I was looking for clues about ARM32 performance regressions and came across this issue. When I saw that the godbolt example showed the same behavior for GCC and Clang, I wondered whether this was stale, but I think you're right about the -mtp=auto case, so let's leave this open until that is handled.
Extended Description
Clang generates a call to the __aeabi_read_tp function to access an arm32 ELF TLS variable using the Initial-Exec or Local-Exec access model. GCC also generates a call to this ABI function for ARMv5, but for better performance on ARMv7, it inlines the call. Could Clang also inline the call?
test.c:
Clang output (clang test.c -target arm-linuxeabi -march=armv7a -Os -S):
GCC output (arm-linux-gnueabi-gcc-7 test.c -march=armv5 -Os -S):
GCC output (arm-linux-gnueabi-gcc-7 test.c -march=armv7-a -Os -S):