Open zesterer opened 1 month ago
I've not been able to test Rust's new GCC backend since I've not been able to work out how to tell it to generate code for ARM32 targets.
The GCC backend is not a cross-compiler, so you'll have to build it separately for each target you want to compile for.
rustc generates suboptimal code on 32-bit ARM targets when performing a load from a base + offset pointer. This seems to be a general issue, rearing its head in a number of programs I've written, including trivial examples. Since this pattern - loading from a non-constant address that's been offset by an index - is very common in real code, and in particular in inner loops, I'd be surprised if this doesn't have a non-trivial impact on the performance of real code.
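For readers who want something concrete to paste into a compiler explorer, here is a minimal sketch of the kind of base + index load described above. It is not the code from the linked examples: the function names `load_at` and `sum_indexed` are hypothetical and chosen purely for illustration. Compiling it with optimizations for a 32-bit ARM target should make it possible to inspect how the address computation and the load are emitted.

```rust
/// Hypothetical example: load the `index`-th `u32` starting at `base`.
/// The address is non-constant (`base`) and offset by a runtime index,
/// which is the pattern under discussion.
///
/// # Safety
/// `base` must point to at least `index + 1` valid, aligned `u32` values.
pub unsafe fn load_at(base: *const u32, index: usize) -> u32 {
    // base + index * size_of::<u32>() followed by a load
    unsafe { *base.add(index) }
}

/// Hypothetical example: the same pattern inside a loop, where any extra
/// instruction per load is paid on every iteration.
/// Panics if an index is out of bounds for `table`.
pub fn sum_indexed(table: &[u32], indices: &[usize]) -> u32 {
    let mut total = 0u32;
    for &i in indices {
        // Bounds-checked indexed load; wrapping add avoids overflow panics.
        total = total.wrapping_add(table[i]);
    }
    total
}
```

The raw-pointer version keeps bounds checks out of the picture so the generated code contains little beyond the load itself; the slice version is closer to what typical safe Rust looks like.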
Note that LLVM doesn't seem to exhibit this poor behaviour on aarch64 (ARM 64) targets.
produces
I'd expect it to produce
as GCC does. I believe that on many targets (at the very least, armv4) the latter is always faster than the former.
Note that this is an issue with LLVM: Clang also exhibits this poor code generation.
Rust (rustc, bad): https://godbolt.org/z/7oPe8crM7
C (Clang, bad): https://godbolt.org/z/4M9E7Kh91
C (GCC, good): https://godbolt.org/z/639cxxKc8
I've not been able to test Rust's new GCC backend since I've not been able to work out how to tell it to generate code for ARM32 targets.
rustc --version --verbose: