nchristensen closed this 1 year ago
In particular, duplicate_inames is much slower.
One option might be another decompose -> transform -> recompose pass to create the loop nests: rename the inames in each piece, then add each set of inames as a separate domain to the recomposed kernel.
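A rough sketch of that decompose -> transform -> recompose idea, in plain Python rather than loopy's actual API. Everything here (`ToyKernel`, `Insn`, `decompose`, `rename_inames`, `recompose`, the `_b0`/`_b1` suffixes) is a hypothetical stand-in made up for illustration; a real kernel carries far more state, and this says nothing about how fast the real transformations would be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Hypothetical instruction: an id plus the inames it is nested in."""
    id: str
    within_inames: frozenset


@dataclass
class ToyKernel:
    """Hypothetical kernel: each domain is modelled as a set of iname names."""
    domains: list
    insns: list


def decompose(kernel, batches):
    """Split the kernel into one sub-kernel per batch of instruction ids."""
    subkernels = []
    for batch in batches:
        insns = [i for i in kernel.insns if i.id in batch]
        used = frozenset().union(*(i.within_inames for i in insns))
        # Keep only the part of each domain that this batch actually uses.
        domains = [d & used for d in kernel.domains if d & used]
        subkernels.append(ToyKernel(domains, insns))
    return subkernels


def rename_inames(sub, suffix):
    """Transform step: give every iname in the sub-kernel a unique suffix."""
    def rename(names):
        return frozenset(n + suffix for n in names)

    return ToyKernel(
        [rename(d) for d in sub.domains],
        [Insn(i.id, rename(i.within_inames)) for i in sub.insns],
    )


def recompose(subkernels):
    """Merge sub-kernels, adding each set of inames as a separate domain."""
    domains, insns = [], []
    for sub in subkernels:
        domains.extend(sub.domains)
        insns.extend(sub.insns)
    return ToyKernel(domains, insns)


kernel = ToyKernel(
    domains=[frozenset({"i", "j"})],
    insns=[
        Insn("a", frozenset({"i", "j"})),
        Insn("b", frozenset({"i", "j"})),
        Insn("c", frozenset({"i"})),
    ],
)
batches = [{"a"}, {"b", "c"}]

subs = decompose(kernel, batches)
renamed = [rename_inames(s, f"_b{n}") for n, s in enumerate(subs)]
result = recompose(renamed)
```

In this toy model the renaming happens on each small piece, so no iname duplication is ever run on the full kernel; each batch ends up with its own loop nest over its own domain.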
Forming the batches during kernel recomposition seems to be a workaround for this.
The slowdown persists even after stripping the iname tags from the kernel.