For iree-amd-aie's matmul pipeline, this is the state we're in with coalescing enabled, just before the affine-to-standard pass.

And if there is no coalescing, the IR looks like:
Question: why do we want to coalesce here in the first place? Afaict it's never possible to have as little index arithmetic in the coalesced case as in the uncoalesced case. You might be able to convert the coalesced case to something without modular arithmetic, but you'll still need some division shenanigans, which you don't need if you don't coalesce. Without coalescing, %arg1, %arg2 and %arg3 are exactly what we want, and the 3 loops get translated to simple add-compare logic in scf-to-cf. The only reason I can think of for coalescing is that after convert-scf-to-cf you then have cf.cond_br logic which is "simple": just a single variable iterating to 256, rather than a waterfall of 3 counters (to 8, 8, and 4 respectively). But why is this desirable, and how does it help llvm/peano? (@MaheshRavishankar @Abhishek-Varma)
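
To make the arithmetic trade-off above concrete, here is a rough Python sketch (not the actual IR, and not what any pass emits). It only uses the trip counts mentioned in the question (8, 8 and 4, so 256 iterations in total). The nesting order of the three bounds and all function names are assumptions made for illustration.

```python
# Trip counts from the question: three counters to 8, 8 and 4 (8 * 8 * 4 = 256).
# Which bound is outermost is an assumption made for this illustration.
B1, B2, B3 = 8, 8, 4


def uncoalesced():
    """Three nested counters: each index is used directly, no extra arithmetic."""
    out = []
    for a1 in range(B1):
        for a2 in range(B2):
            for a3 in range(B3):
                out.append((a1, a2, a3))
    return out


def coalesced():
    """One counter to 256: the three indices are recovered with div and mod."""
    out = []
    for iv in range(B1 * B2 * B3):
        a3 = iv % B3
        a2 = (iv // B3) % B2
        a1 = iv // (B2 * B3)
        out.append((a1, a2, a3))
    return out


def coalesced_no_mod():
    """Same single counter, with each remainder replaced by multiply-subtract.

    The modular arithmetic is gone, but the divisions are still there.
    """
    out = []
    for iv in range(B1 * B2 * B3):
        a1 = iv // (B2 * B3)
        rest = iv - a1 * (B2 * B3)
        a2 = rest // B3
        a3 = rest - a2 * B3
        out.append((a1, a2, a3))
    return out
```

All three variants visit the same index triples in the same order. The difference is only where the work goes: the nested form pays for three add-compare counters, while both coalesced forms have one counter but pay for index recovery on every iteration.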