Closed by ketsuban 11 months ago
The reason it pushes `lr` is because the 32-bit ARM ABI ("AAPCS") dictates the stack always be aligned to 8 bytes, and `lr` is the highest register accessible to the THUMB `push` instruction, and commonly needs to be pushed anyway for subroutine calls.
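To make the alignment argument concrete, here is a minimal sketch of the arithmetic. The function names and the example stack pointer value are made up for illustration; the only assumption is that each pushed register occupies 4 bytes and that `sp` was 8-byte aligned on entry.

```rust
/// Bytes by which `push` lowers sp for a given number of 32-bit registers.
const fn push_bytes(reg_count: u32) -> u32 {
    4 * reg_count
}

/// True if sp is still 8-byte aligned after pushing `reg_count` registers.
/// Assumes `sp_on_entry` itself is 8-byte aligned (hypothetical value).
const fn keeps_8_byte_alignment(sp_on_entry: u32, reg_count: u32) -> bool {
    (sp_on_entry - push_bytes(reg_count)) % 8 == 0
}
```

With an aligned entry value, `push {r4}` alone leaves `sp` off by 4, while `push {r4, lr}` keeps it aligned, which is why a second register gets pushed even when only one needs saving.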
I believe the oddball epilogue is because all 4 scratchable registers (r0-r3) are used for the return value. So it has to restore `lr` through `r4` first, then restore `r4`.
Normally, if you had a free register, it would do something like `pop {r4}; pop {rX}; bx rX`, where the saved value of `lr` gets popped into some free register. That detour is needed because THUMB `pop` can pop `pc` but not `lr`, and popping straight into `pc` can't be used as a return on ARMv4T due to lack of interworking on `pc` writes.
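The two epilogues can be compared with a toy model. This is only a sketch, not an emulator: the `Cpu` type and function names are invented for illustration, and the instruction sequence shown for the no-free-register case is an assumed shape consistent with "restore `lr` through `r4`, then restore `r4`", not a verified copy of LLVM's output. It also assumes the prologue was `push {r4, lr}`, which places `r4` at the lower address.

```rust
/// Toy model of the registers and stack involved (hypothetical, for illustration).
#[derive(Debug, Clone, PartialEq)]
struct Cpu {
    r: [u32; 5],     // r0-r4
    lr: u32,
    stack: Vec<u32>, // last element is the lowest address (top of stack)
}

impl Cpu {
    /// `push {r4, lr}`: the higher-numbered register lands at the higher
    /// address, so lr is stored first and r4 ends up on top.
    fn push_r4_lr(&mut self) {
        self.stack.push(self.lr);
        self.stack.push(self.r[4]);
    }

    /// `pop {rN}` for a low register.
    fn pop_low(&mut self, n: usize) {
        self.r[n] = self.stack.pop().expect("stack underflow");
    }
}

/// Epilogue when r3 is free: pop {r4}; pop {r3}; bx r3.
/// Returns the branch target. Note that r3 is clobbered.
fn epilogue_with_free_reg(cpu: &mut Cpu) -> u32 {
    cpu.pop_low(4);
    cpu.pop_low(3);
    cpu.r[3]
}

/// Epilogue when r0-r3 all hold the return value: route lr through r4.
/// Assumed sequence: ldr r4, [sp, #4]; mov lr, r4; pop {r4}; add sp, #4; bx lr
fn epilogue_without_free_reg(cpu: &mut Cpu) -> u32 {
    let n = cpu.stack.len();
    cpu.r[4] = cpu.stack[n - 2]; // ldr r4, [sp, #4]  (saved lr)
    cpu.lr = cpu.r[4];           // mov lr, r4
    cpu.pop_low(4);              // pop {r4}          (saved r4)
    cpu.stack.pop();             // add sp, #4        (discard saved lr slot)
    cpu.lr                       // bx lr
}
```

Both variants restore `r4` and return to the saved address; only the second leaves r0-r3 untouched, at the cost of the extra load and move.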
So, while LLVM's codegen particularly for v4 THUMB is often suboptimal, I don't think LLVM is really doing anything wrong here, even if it looks very strange.
EDIT: I was going to exemplify this using a godbolt link for C, but it turns out neither GCC nor Clang supports `__int128` on ARM targets, and the `thumbv4t-none-eabi` target can't be used on godbolt due to `-Z build-std=core` being needed. Either way, I'm confident this is simply an architectural limitation and not something LLVM can do anything about.
Also, I think the use of the word "demented" is unnecessary, comes off as hostile, and is arguably ableist. I understand being confused or frustrated about suboptimal (or seemingly suboptimal) codegen but I don't think it necessitates such language.
> The reason it pushes `lr` is because the 32-bit ARM ABI ("AAPCS") dictates the stack always be aligned to 8 bytes
I hate when ABIs force nonsensical codegen. It's a 32-bit platform, the stack shouldn't need greater than 4-byte alignment. Oh well. Guess I'll close this one as "reality is a disappointment".
> Also, I think the use of the word "demented" is unnecessary, comes off as hostile, and is arguably ableist.
It didn't even occur to me that the term had any link with mental health; I'll try to choose my words more carefully. ("Hostile", though?)
I read a Wikipedia page that mentions a "contrived 32-bit shift" of a 128-bit integer and I thought of Rust's native 128-bit integer support, so I decided to see how efficiently it uses registers on the platform I'm most used to at this point: ARM.
For didactic purposes I'll start with `arm-unknown-linux-gnueabi`.

The ARM ABI has four registers which functions are allowed to clobber, so it cleanly makes use of all of them. What about when there isn't a fourth scratch register, though? I already have a setup which uses `thumbv4t-none-eabi`; I expect it'll push one callee-saved register to the stack and use it as a fourth scratch register instead?

That's less good. It does push `r4` like I expected, but there's no reason for it to ever touch `lr`, and then it has a small fit in the function epilogue.

This is a synthetic problem because I have no reason to ever use a 128-bit integer like this (even for pseudorandom number generation there are algorithms that operate on four 32-bit values individually rather than needing 128-bit operations) but I care about Rust being the best it can be and LLVM is clearly having a time here.
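The source code from the report did not survive here, so the following is only a hypothetical stand-in for the kind of function being discussed: a 32-bit shift of a `u128`, plus an equivalent written over four 32-bit words to show why the value naturally occupies four registers on a 32-bit target. All names are invented for illustration.

```rust
/// Hypothetical stand-in for the function in the report: shift a 128-bit
/// integer left by 32 bits.
pub fn shl32(x: u128) -> u128 {
    x << 32
}

/// Split a u128 into four 32-bit words, least significant first.
pub fn to_limbs(x: u128) -> [u32; 4] {
    [x as u32, (x >> 32) as u32, (x >> 64) as u32, (x >> 96) as u32]
}

/// The same shift expressed on 32-bit words: every word moves up one slot,
/// the lowest slot becomes zero, and the old top word is discarded.
pub fn shl32_limbs(w: [u32; 4]) -> [u32; 4] {
    [0, w[0], w[1], w[2]]
}
```

Seen this way, the operation is just a shuffle of four words, which is why it makes a handy probe of how the backend allocates registers.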