Open colin-arnet opened 7 months ago
You can disable loop unrolling and vectorization to get similar output with gcc: https://godbolt.org/z/33f5d7eMn
TBH clang's output looks better, since loop unrolling leaves fewer loop branches to execute and predict.
"Very complex code" doesn't mean that it is slow. If you care about code size, please compile with -Os/-Oz.
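For anyone who wants to try this without opening the link, here is a minimal sketch of how unrolling and vectorization can be switched off for a single loop in clang. The function `sum_array` is a hypothetical stand-in, not the code from the issue, and the pragma is only one way to do it. The flags used in the Godbolt link may differ.

```c
#include <stddef.h>

/* Hypothetical reduction loop, used only for illustration.
   The program from the issue is available only via its Godbolt link. */
long sum_array(const int *data, size_t n)
{
    long total = 0;
#if defined(__clang__)
    /* Clang-specific loop hints: keep this loop rolled and scalar.
       Other compilers skip this line because of the #if guard. */
#pragma clang loop unroll(disable) vectorize(disable) interleave(disable)
#endif
    for (size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

To get roughly the same effect for a whole translation unit, clang also accepts -fno-unroll-loops, -fno-vectorize and -fno-slp-vectorize on the command line alongside -O3.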
At -O3, Clang is not able to simplify the loop and its body and generates very complex code. GCC optimizes the same program to much smaller and simpler assembly.
https://godbolt.org/z/qecj6o43o
x86 -O3 Assembly: