Arithmetic + Jump instructions are not correctly handled in O0

While looking at #57 in O0, I found that this is a more general issue. Here are two examples that show the problem (64-bit rotation and 64-bit count leading zeros): https://dpu.dev/z/zwJoIP

The important, and faulty, part in both cases is that the result from the arithmetic + jump instruction is not correctly spilled:

after lsl r0, r1, r0, sh32, .LBB0_2, r0 is directly erased
after clz.u d4, r0, nmax, .LBB0_2, d4 is not used and finally erased

Here is my hypothesis: in O0, LLVM always spills registers just before the last instruction of an MBB. This is only correct if the last instruction of the MBB does not itself modify a register. LLVM does not seem to handle the other case correctly: it still inserts the spill before the last instruction, so the value written by that instruction is never saved.
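To make the hypothesis concrete, here is a small toy model of it. It is not LLVM code: `Instr`, `Block`, `spillBeforeLast` and `staleSpills` are hypothetical names invented for this sketch, and the "spill before the last instruction" rule is only the behaviour assumed above, not something confirmed in the register allocator.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: none of these types or functions exist in LLVM.
struct Instr {
    std::string text; // assembly text, for display only
    int defReg;       // register written by this instruction, -1 if none
    int spilledReg;   // register stored to its stack slot, -1 if not a spill
};

using Block = std::vector<Instr>;

// Assumed O0 behaviour: store every live-out register just before the
// last instruction of the block, whatever that instruction does.
Block spillBeforeLast(Block block, const std::vector<int>& liveOut) {
    assert(!block.empty());
    Instr last = block.back();
    block.pop_back();
    for (int reg : liveOut)
        block.push_back({"spill r" + std::to_string(reg), -1, reg});
    block.push_back(last);
    return block;
}

// Returns the live-out registers whose stack slot is stale at block exit,
// i.e. registers written after their last spill (or never spilled).
std::vector<int> staleSpills(const Block& block, const std::vector<int>& liveOut) {
    std::vector<int> stale;
    for (int reg : liveOut) {
        bool upToDate = false;
        for (const Instr& instr : block) {
            if (instr.spilledReg == reg) upToDate = true;
            if (instr.defReg == reg) upToDate = false;
        }
        if (!upToDate) stale.push_back(reg);
    }
    return stale;
}
```

With a block that ends in a fused instruction such as `lsl r0, r1, r0, sh32, .LBB0_2` (which writes r0), the model reports r0 as stale, because the spill lands before the write. If the same operation is split into an arithmetic instruction followed by a separate jump, the spill lands between the two and nothing is stale.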

It seems to me that the only safe solution for us is to never use arithmetic + jump instructions in O0. Most uses are in the functions called by DPUTargetLowering::EmitInstrWithCustomInserter (in llvm/lib/Target/DPU/DPUTargetLowering.cpp). The 16-bit multiplication case seems to be handled already (see PerformMULCombine in llvm/lib/Target/DPU/DPUTargetLowering.cpp). The other important case is when we automatically try to merge two instructions (see DPUMacroFusion.cpp and, mostly, DPUMergeComboInstrPass.cpp). I could not find an example that makes it fail: LLVM adds a lot of loads/stores between the operations before this pass runs, so it may not be possible to merge anything. However, I think it would be safer to also disable this pass in O0.
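The proposal boils down to gating both the custom inserters and the merge pass on the optimization level. The sketch below only illustrates that shape: `OptLevel`, `mayEmitArithJump` and `dpuLatePasses` are hypothetical names, not the DPU backend's actual code. In the real backend the check would presumably use the optimization level exposed by the target machine or pass config.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the codegen optimization level.
enum class OptLevel { None, Less, Default, Aggressive };

// Custom inserters would consult this before emitting a fused
// arithmetic + jump instruction; in O0 they would emit the
// arithmetic instruction and the jump separately instead.
bool mayEmitArithJump(OptLevel level) {
    return level != OptLevel::None;
}

// Names of the late DPU passes to schedule. The merge pass is
// skipped in O0 so that it cannot create fused instructions.
std::vector<std::string> dpuLatePasses(OptLevel level) {
    std::vector<std::string> passes;
    if (level != OptLevel::None)
        passes.push_back("DPUMergeComboInstrPass");
    return passes;
}
```

Keeping the decision in one predicate would make it easy to check that no path can produce an arithmetic + jump instruction in O0.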

upmem / llvm-project

Arithmetic + Jump instructions are not correctly handled in O0 #58