Open matouskozak opened 2 years ago
@llvm/issue-subscribers-clang-codegen
Largely historical.
The SSE encoding for andps is 1 byte shorter than pand. I’m not sure if there is a difference for AVX; it depends on which of them can use the 2-byte VEX prefix.
It also used to be that there were more execution units available for andps than pand on Intel CPUs. I don’t think that’s true on modern CPUs.
These were the two main reasons to convert int to FP.
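To make the encoding-size point concrete, here is a small Python sketch that lists hand-assembled bytes for the register-register forms and compares their lengths. The operand choice (xmm0, xmm1, xmm2) and the helper name `encoded_length` are assumptions for illustration only. With these low registers both AVX forms appear to fit the 2-byte VEX prefix, so the size advantage seems to be specific to the legacy SSE encoding.

```python
# Hand-assembled register-register encodings (illustrative; operands are assumed).
# SSE:  andps is 0F 54 /r, pand needs the mandatory 66 prefix (66 0F DB /r).
# AVX:  both live in the 0F opcode map, so both can use the 2-byte VEX prefix (C5).
ENCODINGS = {
    "andps xmm0, xmm1": bytes([0x0F, 0x54, 0xC1]),
    "pand xmm0, xmm1": bytes([0x66, 0x0F, 0xDB, 0xC1]),
    "vandps xmm0, xmm1, xmm2": bytes([0xC5, 0xF0, 0x54, 0xC2]),
    "vpand xmm0, xmm1, xmm2": bytes([0xC5, 0xF1, 0xDB, 0xC2]),
}


def encoded_length(instruction: str) -> int:
    """Return the number of machine-code bytes for a listed instruction."""
    return len(ENCODINGS[instruction])


if __name__ == "__main__":
    for text, raw in ENCODINGS.items():
        print(f"{text:<26} {raw.hex(' ')}  ({len(raw)} bytes)")
```

Registers xmm8 and above in the r/m position would force the 3-byte VEX form, but that applies to vandps and vpand alike.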
@llvm/issue-subscribers-backend-x86
Addressing this may provide a minor performance advantage if there is a long dependency chain. ICL/TGL have a penalty of one clock when the result of a vector int operation is used in a vector float operation. Zen3 has that penalty when transitioning either way (FP to int and vice versa). See Agner Fog's software optimization resources.
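A rough way to see why this only matters for long dependency chains is a toy latency model, sketched below in Python. Everything in it is an assumption for illustration: each operation is given a latency of one cycle, the crossing penalty is one cycle, and the names `chain_latency`, `ICL_LIKE` and `ZEN3_LIKE` are made up here. It is not a measurement of any real CPU.

```python
# Toy model: a serial dependency chain where each op runs in the "int" or "fp" domain.
# Assumptions (hypothetical): 1 cycle per op, +1 cycle for each penalized domain crossing.
ICL_LIKE = frozenset({("int", "fp")})                  # penalty only for int -> fp
ZEN3_LIKE = frozenset({("int", "fp"), ("fp", "int")})  # penalty in both directions


def chain_latency(domains, penalized, op_latency=1, penalty=1):
    """Sum the latency of a serial chain, adding a penalty for penalized crossings."""
    total = 0
    previous = None
    for domain in domains:
        total += op_latency
        if previous is not None and (previous, domain) in penalized:
            total += penalty
        previous = domain
    return total


if __name__ == "__main__":
    mixed = ["int", "fp", "int", "fp"]   # e.g. vpaddd -> vandps -> vpaddd -> vandps
    pure = ["int", "int", "int", "int"]  # e.g. vpaddd -> vpand  -> vpaddd -> vpand
    print(chain_latency(mixed, ICL_LIKE), chain_latency(mixed, ZEN3_LIKE))
    print(chain_latency(pure, ICL_LIKE), chain_latency(pure, ZEN3_LIKE))
```

Under these assumptions the mixed chain costs 6 cycles on the ICL-like model and 7 on the Zen3-like model, against 4 for the all-integer chain, so the overhead grows with the number of crossings on the critical path.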
We should only be doing the conversion if we can convert the surrounding instructions too. That’s why we also used an FP store.
When executing bitwise operations on Vector128<int> in the Mono runtime on an x86-64 CPU, LLVM emits single-precision floating-point instructions. Example for logical bitwise AND:
This is the LLVM IR that is generated:
And the assembly from this:
Could someone please explain to me why there is a vandps and not vpand? Thank you.
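For context on the question, the choice between the two instructions does not change the result: a bitwise AND over 128 bits is the same regardless of how the lanes are typed. A minimal Python sketch of that idea follows. The helper names and the sample lane values are hypothetical, and `struct` stands in for the register reinterpretation.

```python
import struct


def and_as_int_lanes(a, b):
    """AND two vectors lane by lane, treating each lane as a 32-bit signed int."""
    return [x & y for x, y in zip(a, b)]


def and_as_raw_bits(a, b):
    """AND the same vectors as 16 untyped bytes, then view the result as int lanes."""
    raw_a = struct.pack("<4i", *a)
    raw_b = struct.pack("<4i", *b)
    raw_out = bytes(x & y for x, y in zip(raw_a, raw_b))
    return list(struct.unpack("<4i", raw_out))


if __name__ == "__main__":
    a = [-1, 0x7FFFFFFF, 0x12345678, 0]
    b = [0x0F0F0F0F, -2, 0x0000FFFF, -1]
    print(and_as_int_lanes(a, b) == and_as_raw_bits(a, b))
```

Because the bits come out identical either way, the compiler is free to pick the form based on encoding size or execution domain rather than on the element type.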