Open oscbyspro opened 4 months ago
Why do you believe the first version is slower? What timing tests have you done? On what processors?
If I'm reading the assembly correctly, the -O
version is using a 32-bit division operation when both arguments are less than 32 bits, while the -Osize
version always uses the 64-bit division. It makes sense to me that the first version could be faster on processors where 32-bit division is much faster than 64-bit, but I've not done any timing measurements myself.
Feel free to disregard this issue if the -O
version performs better.
It's LLVM that makes this codegen decision (clang
will do the same thing), so you could also raise the issue on the LLVM bug tracker if you find the code generation to be less-than-ideal for your use case. From my armchair optimizer's seat, it seems like it would at least be marginally better to do a jne .LBB1_2
and fall into the probably-more-likely 32-bit divisor case rather than jump away to it.
Description
No response
Reproduction
Given this function:
Swift 5.10, Godbolt,
-O
:Swift 5.10, Godbolt,
-Osize
:Expected behavior
I thought
-O
would be more serious about it.Environment
x86-64 swiftc 5.10 on https://swift.godbolt.org
Additional information
No response