Open zoecarver opened 5 years ago
I only see three fmul instructions in the optimized code. Are you looking at unoptimized code?
Yes, I am looking at the optimized code. Take a look at the assembly or the IR gen in the link above.
Ah, I thought you meant LLVM IR. The nine instructions are from loop unrolling; if you use -Osize instead of -O you get the three you originally expected.
Ah, you're right! That's probably not the slow part. I tried removing the conditional fails, which made it a little faster, but not by much. Any other ideas?
Additional Detail from JIRA
| | |
|------------------|-----------------|
|Votes | 0 |
|Component/s | Compiler |
|Labels | Improvement, CodeGen, Optimizer, Performance |
|Assignee | None |
|Priority | Medium |

md5: 833dd33d195d879e7d8fc3c9618b65e2

Issue Description:
The following generates 9 different `fmul` instructions when it should generate only 3. It also generates 31 `getelementptr` instructions, far more than it needs; in theory, 4 would be enough.
An equivalent program in C++ takes about a quarter of the time to run. Here is a comparison of the codegen from Swift and Clang.
If others agree this is an issue, I will start working on a patch to try to resolve it.