Open ActuallyaDeviloper opened 6 years ago
file a bug report anyway?
After checking the assembly on my machine, I noticed the same problem.
A fix for this issue is in progress. A test suite that inspects generated instruction counts is under development. Once finished, it should help to identify all performance issues in the generated code across all the compilers that the library supports.
Any progress on this, or advice on how to work around the problem?
Today I was trying out whether libsimdpp would be a good fit for our project, which currently makes heavy use of performance-critical SSE SIMD instructions. Unfortunately, while doing so, I ran into exceptionally bad code generation for the following simple test function:
It generates this machine code in an MSVC 2015 x64 release build with default settings:
The value is repeatedly written to and read from the stack for no apparent reason. Note that perfect code would consist of just a series of addps and mulps instructions. Perfect code is generated if ordinary SSE intrinsics are used. Note that the end result is similar with MSVC 2017.
I believe that the problem is twofold:
First, the VC++ compiler spills the simdpp vectors to the stack because __vectorcall's ABI fails. I have run into this issue myself in the past. I believe that it is caused by the intensive use of inheritance in the simdpp library. I made a minimal code example illustrating the problem on Godbolt.
Second, the VC++ compiler further fails to do any appropriate register allocation and keeps writing to the stack as a result.
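For readers without access to the Godbolt link, the following is a rough sketch of the kind of comparison described above, not the original example. The names VCALL, VecBase, WrappedVec, PlainVec and the mul_add_* functions are hypothetical and not part of simdpp. It assumes, as suggested above, that a base class is what keeps the wrapper from being passed in registers under __vectorcall; that can only be confirmed by compiling with MSVC at /O2 and comparing the assembly of the three functions.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, _mm_mul_ps

// __vectorcall is MSVC-specific; expand to nothing elsewhere so the sketch still builds.
#if defined(_MSC_VER)
#define VCALL __vectorcall
#else
#define VCALL
#endif

// Hypothetical stand-in for a library vector type built with inheritance.
struct VecBase {};
struct WrappedVec : VecBase {
    __m128 v;
};

// Hypothetical wrapper with no base class, for comparison.
struct PlainVec {
    __m128 v;
};

// Baseline: raw intrinsics, expected to compile to mulps + addps.
__m128 VCALL mul_add_raw(__m128 a, __m128 b, __m128 c) {
    return _mm_add_ps(_mm_mul_ps(a, b), c);
}

// Same arithmetic through the inheritance-based wrapper.
WrappedVec VCALL mul_add_wrapped(WrappedVec a, WrappedVec b, WrappedVec c) {
    WrappedVec r;
    r.v = _mm_add_ps(_mm_mul_ps(a.v, b.v), c.v);
    return r;
}

// Same arithmetic through the base-free wrapper.
PlainVec VCALL mul_add_plain(PlainVec a, PlainVec b, PlainVec c) {
    PlainVec r;
    r.v = _mm_add_ps(_mm_mul_ps(a.v, b.v), c.v);
    return r;
}

// True if all four float lanes compare equal.
inline bool all_lanes_equal(__m128 x, __m128 y) {
    return _mm_movemask_ps(_mm_cmpeq_ps(x, y)) == 0xF;
}

// Checks that the three variants compute the same values;
// any difference between them should only show up in the generated code.
inline void check_equivalence() {
    __m128 a = _mm_set1_ps(2.0f), b = _mm_set1_ps(3.0f), c = _mm_set1_ps(1.0f);
    WrappedVec wa, wb, wc;
    wa.v = a; wb.v = b; wc.v = c;
    PlainVec pa, pb, pc;
    pa.v = a; pb.v = b; pc.v = c;
    __m128 expected = mul_add_raw(a, b, c);
    assert(all_lanes_equal(expected, mul_add_wrapped(wa, wb, wc).v));
    assert(all_lanes_equal(expected, mul_add_plain(pa, pb, pc).v));
}
```

The three functions return identical values, so the only thing worth comparing is the assembly each one produces, in particular whether the arguments arrive in XMM registers or are loaded through pointers to stack copies.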
I have also considered filing a bug report with the Microsoft compiler team, but because of my past experience with that team and because the fix would break their ABI (which I believe has been stable since 2015), I have decided against it. A fix seems unlikely.
It would be great if simdpp could make less use of inheritance or find another way to mitigate the problem, i.e. make __vectorcall work, in a future release.