Hello! I'm trying to understand how the Hexagon backend (CodeGen_Hexagon.cpp) lowers uint32 multiplies in its runtime file.

When I look at the LLVM bitcode file being generated, the corresponding HVX runtime method being invoked is @halide.hexagon.mul.vuw.vuw(<64 x i32> %a, <64 x i32> %b). Note that the operands are <64 x i32> rather than <32 x i32>. The relevant statements from hvx_128.ll are:
%71 = tail call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> undef, <32 x i32> %66)
%72 = tail call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> undef, <32 x i32> %70)
%a_lo.i.1 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %71) #11
%a_hi.i.1 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %71) #11
%b_lo.i.1 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %72) #11
%b_hi.i.1 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %72) #11
%a_e.i.1 = tail call <32 x i32> @llvm.hexagon.V6.vshufeh.128B(<32 x i32> %a_hi.i.1, <32 x i32> %a_lo.i.1) #11
%a_o.i.1 = tail call <32 x i32> @llvm.hexagon.V6.vshufoh.128B(<32 x i32> %a_hi.i.1, <32 x i32> %a_lo.i.1) #11
%b_e.i.1 = tail call <32 x i32> @llvm.hexagon.V6.vshufeh.128B(<32 x i32> %b_hi.i.1, <32 x i32> %b_lo.i.1) #11
%b_o.i.1 = tail call <32 x i32> @llvm.hexagon.V6.vshufoh.128B(<32 x i32> %b_hi.i.1, <32 x i32> %b_lo.i.1) #11
%ab_e.i.1 = tail call <64 x i32> @llvm.hexagon.V6.vmpyuhv.128B(<32 x i32> %a_e.i.1, <32 x i32> %b_e.i.1) #11
%ab_o1.i.1 = tail call <64 x i32> @llvm.hexagon.V6.vmpyuhv.128B(<32 x i32> %a_o.i.1, <32 x i32> %b_e.i.1) #11
%ab_o.i.1 = tail call <64 x i32> @llvm.hexagon.V6.vmpyuhv.acc.128B(<64 x i32> %ab_o1.i.1, <32 x i32> %a_e.i.1, <32 x i32> %b_o.i.1) #11
%a_lo.i.i.1 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %ab_e.i.1) #11
%l_lo.i.i.1 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %ab_o.i.1) #11
%s_lo.i.i.1 = tail call <32 x i32> @llvm.hexagon.V6.vaslw.acc.128B(<32 x i32> %a_lo.i.i.1, <32 x i32> %l_lo.i.i.1, i32 16) #11
%a_hi.i.i.1 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %ab_e.i.1) #11
%l_hi.i.i.1 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %ab_o.i.1) #11
%s_hi.i.i.1 = tail call <32 x i32> @llvm.hexagon.V6.vaslw.acc.128B(<32 x i32> %a_hi.i.i.1, <32 x i32> %l_hi.i.i.1, i32 16) #11
%s.i.i.1 = tail call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> %s_hi.i.i.1, <32 x i32> %s_lo.i.i.1) #11
%73 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %s.i.i.1)
%66 and %70 are the vector registers being loaded. Given this, wouldn't there be undefined behavior, since the vectors are combined with undef and the multiplications then operate on values derived from that undef? Is there any documentation of why this runtime method is correct?