Closed: python3kgae closed this 2 years ago
AFAICT LLVM 10.x and @HEAD produce the same code for your example: https://godbolt.org/z/resb7zbnf
Neither seems to do the inttoptr -> bitcast conversion you're suggesting was done previously.
Perhaps there are some differences in your compilation pipeline that may have affected which passes were run?
If you can reproduce the issue on compiler explorer that would help a lot.
Forgot to mention -O3 :( Here's the compiler explorer link: https://godbolt.org/z/jKxafYr5E You can see that opt gets a different result. And this is the link for feeding the opt result to llc: https://godbolt.org/z/PhTGWMf1s
Thanks Xiang
Thank you. Whatever affects this example started with LLVM-12.
The IR transform does not appear to be target-specific (e.g. opt -mtriple=aarch64-- is affected the same way).
I believe that it's LLVM-10 that was wrong, and your example is not exactly an apples-to-apples comparison.
define void @foo(i64* nocapture readonly byval(i64) %0, i64* nocapture readonly byval(i64) %itop) local_unnamed_addr #0 {
entry:
  %1 = bitcast i64* %0 to float**
Here we do know that we're storing to a pointer that was given to the kernel as a parameter. By convention all kernel input pointers are assumed to be global, so LLVM infers the correct AS and lowers the store to st.global.
  %2 = load float*, float** %1, align 8
  store float 1.000000e+00, float* %2, align 4
%3 = load i64, i64* %itop, align 8
Here you're passing a pointer to an integer, and loading that integer is probably what breaks the address space inference.
  %4 = inttoptr i64 %3 to float*
  store float 2.000000e+00, float* %4, align 4
  ret void
}
I do not know what exactly prevents transforming load i64, inttoptr -> load i64*, bitcast i64*, float*, but it does the right thing here, IMO.
While in this case it would happen to be OK, I do not think that would be a correct assumption in general.
If you do want the compiler to generate AS-specific loads/stores, you do need to give it correct information to make it happen. You can do it explicitly via addrspacecast or implicitly by passing it as a pointer to a kernel.
In short, I think LLVM is working as intended now.
I see. Thanks for making it clear.
For llvm ir like this
Instruction combine in llvm10 will transform
into
And the final output PTX will use st.global.u32 for the store.
After updating to LLVM 15, instruction combine no longer does the transform. As a result, the final output PTX uses st.u32 for the store.
Is this change expected?
Thanks Xiang