davidstone opened this issue 5 years ago
Thanks for the detailed description!
I agree that this boils down to `%4 = load...` in the snippet below blocking elimination, because `%1` might alias `%0`. If they do not alias, `store 0, ...` is dead. But if they alias, the last store effectively makes both other stores dead.
But I am not sure how we could best integrate that kind of reasoning into dead store elimination (DSE), and it seems that none of the other compilers in the godbolt link do that kind of reasoning either.
```llvm
define dso_local void @_Z8std_swapPiS_(i32* %0, i32* %1) {
  %3 = load i32, i32* %0, align 4
  store i32 0, i32* %0, align 4
  %4 = load i32, i32* %1, align 4
  store i32 %4, i32* %0, align 4
  store i32 %3, i32* %1, align 4
  ret void
}
```
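For readers less used to LLVM IR, here is a hand-written C++ transcription of the function above next to the version that DSE would ideally produce. The names `swap_as_in_ir`, `swap_without_dead_store`, `versions_agree` and `check_both_cases` are hypothetical. This is a sketch of the case analysis from the comment above, not output from any compiler.

```cpp
#include <cassert>

// Line-by-line C++ transcription of @_Z8std_swapPiS_, i.e. std_swap(int*, int*).
void swap_as_in_ir(int* lhs, int* rhs) {
    int tmp = *lhs;  // %3 = load i32, i32* %0
    *lhs = 0;        // store i32 0, i32* %0
    *lhs = *rhs;     // %4 = load i32, i32* %1 ; store i32 %4, i32* %0
    *rhs = tmp;      // store i32 %3, i32* %1
}

// The same function with `*lhs = 0` removed.
//  - lhs != rhs: nothing reads *lhs between the two stores to it, so the
//    zero is overwritten without ever being observed.
//  - lhs == rhs: the load of *rhs does read the zero, but the final store
//    `*rhs = tmp` overwrites that same location, so neither the zero nor the
//    value stored by `*lhs = *rhs` is visible after the function returns.
void swap_without_dead_store(int* lhs, int* rhs) {
    int tmp = *lhs;
    *lhs = *rhs;
    *rhs = tmp;
}

// Runs both versions from the same starting values and compares final memory.
bool versions_agree(int a, int b, bool alias) {
    int x1 = a, y1 = b;
    int x2 = a, y2 = b;
    if (alias) {
        swap_as_in_ir(&x1, &x1);
        swap_without_dead_store(&x2, &x2);
        return x1 == x2 && x1 == a;  // an aliased swap leaves the value unchanged
    }
    swap_as_in_ir(&x1, &y1);
    swap_without_dead_store(&x2, &y2);
    return x1 == x2 && y1 == y2 && x1 == b && y1 == a;
}

// Checks both aliasing cases for one pair of starting values.
void check_both_cases(int a, int b) {
    assert(versions_agree(a, b, /*alias=*/false));
    assert(versions_agree(a, b, /*alias=*/true));
}
```

Both branches agree, which is exactly the two-case argument: the zero store is dead whether or not the pointers alias, but proving it requires reasoning about each case separately rather than a single "may alias" answer.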
Still not optimized: https://llvm.godbolt.org/z/9scPhTo3z
Extended Description
The following code optimizes well for `custom_swap` and `restrict_std_swap`, but has an additional `mov` instruction for `std_swap`:
It compiles into this IR with `-O1`, `-O2`, `-Os`, or `-O3`:
As the example that annotates the parameters with `__restrict` shows, the problem appears to be that the possibility of `lhs` aliasing `rhs` prevents the optimizer from removing the dead store in the second line of `std_swap`. It is able to see that if they don't alias, the store in line 2 is dead. It is not able to see that if they do alias, the stores in lines 2 and 3 are both dead, because the final store to `rhs` overwrites the same location. The store in line 2 is therefore dead in either case.
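The C++ source for these three functions did not survive in this copy of the issue (see the godbolt link for the original). The following is a hypothetical reconstruction, inferred only from the mangled name `_Z8std_swapPiS_` (`std_swap(int*, int*)`) and the line numbers used in the paragraph above, so the real code may differ. The store of `0` on line 2 presumably models the moved-from state of a resource-managing type. `__restrict` is a compiler extension (supported by GCC, Clang and MSVC), not standard C++.

```cpp
#include <cassert>

// Hypothetical reconstruction: body lines are numbered as in the description.
void std_swap(int* lhs, int* rhs) {
    int tmp = *lhs;  // line 1: save *lhs
    *lhs = 0;        // line 2: dead in both aliasing cases, yet not removed per this report
    *lhs = *rhs;     // line 3: dead only if lhs == rhs
    *rhs = tmp;      // line 4: final store
}

// Same body; __restrict promises that lhs and rhs do not alias,
// so the optimizer can prove that line 2 is dead.
void restrict_std_swap(int* __restrict lhs, int* __restrict rhs) {
    int tmp = *lhs;
    *lhs = 0;
    *lhs = *rhs;
    *rhs = tmp;
}

// Hand-written swap with no intermediate store.
void custom_swap(int* lhs, int* rhs) {
    int tmp = *lhs;
    *lhs = *rhs;
    *rhs = tmp;
}

// Sanity check on distinct objects. restrict_std_swap must never be called
// with aliasing pointers: that would break the __restrict promise.
void check_swaps() {
    int a = 1, b = 2;
    std_swap(&a, &b);
    assert(a == 2 && b == 1);

    int c = 3, d = 4;
    restrict_std_swap(&c, &d);
    assert(c == 4 && d == 3);

    int e = 5, f = 6;
    custom_swap(&e, &f);
    assert(e == 6 && f == 5);
}
```

All three compute the same result on distinct objects; the only difference the optimizer sees is whether aliasing has been ruled out.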
See it live: https://godbolt.org/z/8nCTnL
The real-life problem here is that types that manage a resource but do not implement a custom `std::swap`, as well as all types that recursively contain such a type, suffer reduced performance when swapped with `std::swap`. Here is the larger, slightly more meaningful test case that shows how I arrived at this reduction and its relationship to `std::swap`:
See it live: https://godbolt.org/z/tWjmzo
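To illustrate where the extra stores come from, here is a minimal sketch assuming a hypothetical `Handle` type that owns a heap-allocated `int` and whose move operations null out the source, as resource-managing types typically do. `generic_swap` is a hypothetical stand-in written in the same three-move shape as the usual `std::swap` implementation; it is not the standard library's code, and `swap_pointers` is a hypothetical custom swap for comparison. Each nulling store plays roughly the role of the `store i32 0` in the reduced IR; this sketch only shows where those stores originate, not whether a given compiler removes them.

```cpp
#include <cassert>
#include <utility>

// Hypothetical resource-managing type: owns one heap-allocated int.
class Handle {
public:
    explicit Handle(int value) : ptr_(new int(value)) {}
    ~Handle() { delete ptr_; }

    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;

    // Moving leaves the source empty: `other.ptr_ = nullptr` is a store of 0.
    Handle(Handle&& other) noexcept : ptr_(other.ptr_) { other.ptr_ = nullptr; }

    Handle& operator=(Handle&& other) noexcept {
        if (this != &other) {
            delete ptr_;
            ptr_ = other.ptr_;
            other.ptr_ = nullptr;  // another store of 0 into the moved-from object
        }
        return *this;
    }

    int value() const {
        assert(ptr_ != nullptr && "value() called on a moved-from Handle");
        return *ptr_;
    }

    // A custom swap only exchanges the pointers: no nulling stores, no deletes.
    friend void swap_pointers(Handle& a, Handle& b) noexcept {
        int* tmp = a.ptr_;
        a.ptr_ = b.ptr_;
        b.ptr_ = tmp;
    }

private:
    int* ptr_;
};

// Same three-move shape as a typical std::swap implementation.
template <typename T>
void generic_swap(T& a, T& b) {
    T tmp = std::move(a);  // a is nulled out, like the `store i32 0` in the IR
    a = std::move(b);      // overwrites that null; b is nulled out
    b = std::move(tmp);    // overwrites b's null; tmp is nulled out
}                          // ~Handle() on tmp deletes a null pointer
```

With `swap_pointers`, the intermediate nulls never exist. With `generic_swap`, the compiler has to prove those nulls are dead (and that the `delete` calls see null) across possibly aliasing references, which is the same reasoning the reduced test case exercises.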