This is a mirror and fork of the upstream LLVM repository with DPU hardware support. The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
While looking at #57 in O0, I found out that this is a more general issue.
Here are two examples with the problem (64-bit rotation & 64-bit count leading zeros): https://dpu.dev/z/zwJoIP
The important, and faulty, part in both cases is that the result from the arithmetic + jump instruction is not correctly spilled:
- after `lsl r0, r1, r0, sh32, .LBB0_2`, r0 is directly erased
- after `clz.u d4, r0, nmax, .LBB0_2`, d4 is not used and is finally erased
Here is my hypothesis: in O0, LLVM always spills registers just before the last instruction of an MBB. However, this is only correct if the last instruction of the MBB does not modify registers, and it seems that LLVM does not handle the other case correctly (it tries to spill the register before the last instruction anyway, so the value written by that instruction is never saved).
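To make the hypothesis concrete, here is a toy model of it. It is only a sketch: `Instr`, `Block`, `spillBeforeTerminator` and `run` are hypothetical names invented for illustration, they are not LLVM's data structures, and the model assumes the spill placement described above rather than reproducing what the register allocator really does.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model only: none of these types are LLVM's.
struct Instr {
  std::string Def;   // register written (or stored, for a spill); "" if none
  int Value;         // value written to Def (ignored for spills)
  bool IsTerminator; // true for a jump, including a fused arithmetic + jump
  bool IsSpill;      // true for a store of Def to its stack slot
};

using Block = std::vector<Instr>;

// Assumed O0 behaviour: store every live-out register just before the last
// instruction of the block, whatever that last instruction defines.
Block spillBeforeTerminator(Block MBB, const std::vector<std::string> &LiveOut) {
  if (MBB.empty())
    return MBB;
  for (const std::string &Reg : LiveOut)
    MBB.insert(MBB.end() - 1, Instr{Reg, 0, false, true});
  return MBB;
}

// Executes the block and returns the contents of the stack slots.
std::map<std::string, int> run(const Block &MBB,
                               std::map<std::string, int> Regs) {
  std::map<std::string, int> Slots;
  for (const Instr &I : MBB) {
    if (I.IsSpill)
      Slots[I.Def] = Regs[I.Def]; // save the current register value
    else if (!I.Def.empty())
      Regs[I.Def] = I.Value;      // the instruction writes its result
  }
  return Slots;
}

// Fused arithmetic + jump: r0 is written by the terminator itself, so the
// spill placed before it saves the old value of r0.
int slotAfterFusedJump() {
  Block MBB = {Instr{"r0", 42, true, false}};
  return run(spillBeforeTerminator(MBB, {"r0"}), {{"r0", 7}})["r0"];
}

// Separate arithmetic and jump: the spill lands between the two, so the
// new value of r0 is saved.
int slotAfterSeparateJump() {
  Block MBB = {Instr{"r0", 42, false, false}, Instr{"", 0, true, false}};
  return run(spillBeforeTerminator(MBB, {"r0"}), {{"r0", 7}})["r0"];
}
```

In this model the stack slot ends up holding the old value of r0 in the fused case and the new value in the split case, which matches the symptom in both examples: the result of the arithmetic + jump instruction never reaches its spill slot.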
It seems to me that the only safe solution for us is to never use arithmetic + jump instructions in O0.
Most uses can be found in the functions called by DPUTargetLowering::EmitInstrWithCustomInserter (in llvm/lib/Target/DPU/DPUTargetLowering.cpp). The 16-bit multiplication case already seems to be handled (cf. PerformMULCombine in the same file).
The other important case is the automatic merging of two instructions (cf. DPUMacroFusion.cpp and, mostly, DPUMergeComboInstrPass.cpp). I could not find an example that makes it fail: LLVM adds a lot of loads/stores between the operations before this pass, so it may not be possible to merge anything. Still, I think it would be safer to also disable this pass in O0.
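A minimal sketch of what disabling the merge pass in O0 could look like. Everything here is a hypothetical stand-in: `OptLevel`, `dpuPassesFor` and the pass-name strings only model the idea, and the real change would go wherever the DPU target schedules its passes, using LLVM's own optimization-level query.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the code generator's optimization level.
enum class OptLevel { O0, O1, O2, O3 };

// Returns the names of the DPU-specific passes that would be scheduled.
// "SomeOtherDPUPass" is a placeholder for passes that stay enabled.
std::vector<std::string> dpuPassesFor(OptLevel Level) {
  std::vector<std::string> Passes;
  Passes.push_back("SomeOtherDPUPass");
  // Only merge instructions into arithmetic + jump forms when optimizing,
  // so that O0 never sees a terminator that also defines a register.
  if (Level != OptLevel::O0)
    Passes.push_back("DPUMergeComboInstrPass");
  return Passes;
}
```

With this gating, `dpuPassesFor(OptLevel::O0)` contains no merge pass, while every other level keeps it.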