nchristensen / feintune

Autotune batched einsum loopy programs
MIT License

Batching kernels with only "for" tags is much faster than with more complicated tags #6

Closed nchristensen closed 1 year ago

nchristensen commented 1 year ago

The slowdown persists even after stripping the iname tags from the kernel.

nchristensen commented 1 year ago

In particular, duplicate_inames is much slower.

nchristensen commented 1 year ago

Could perhaps do another decompose -> transform -> recompose pass to create the loop nests: rename the inames, then add each set of inames as a separate domain to the recomposed kernel.
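A rough sketch of the rename-then-separate-domains idea, in plain Python rather than against the loopy API. Everything here is a hypothetical stand-in for illustration: `SubKernel`, `rename_inames`, `domain_string`, and `recompose` are not loopy or feintune functions, and the domains are only written out as isl-style strings, assuming every iname runs from 0 to a fixed upper bound.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SubKernel:
    """Hypothetical stand-in for one decomposed einsum (not a loopy object)."""
    bounds: dict      # iname -> exclusive upper bound, e.g. {"i": 4}
    instruction: str  # instruction text written in terms of the inames


def rename_inames(sub, suffix):
    """Return a copy of `sub` whose inames all carry a unique suffix."""
    mapping = {iname: f"{iname}_{suffix}" for iname in sub.bounds}
    # Match whole identifiers only, so "u0" or "sum" are left untouched.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return SubKernel(
        bounds={mapping[i]: n for i, n in sub.bounds.items()},
        instruction=pattern.sub(lambda m: mapping[m.group(1)], sub.instruction),
    )


def domain_string(sub):
    """Write the iname bounds of one sub-kernel as an isl-style set string."""
    inames = ", ".join(sub.bounds)
    constraints = " and ".join(f"0 <= {i} < {n}" for i, n in sub.bounds.items())
    return f"{{ [{inames}] : {constraints} }}"


def recompose(subs):
    """Rename each sub-kernel's inames, then emit one domain per sub-kernel."""
    renamed = [rename_inames(s, k) for k, s in enumerate(subs)]
    domains = [domain_string(s) for s in renamed]
    instructions = [s.instruction for s in renamed]
    return domains, instructions


subs = [
    SubKernel({"e": 16, "i": 4, "j": 4}, "y0[e, i] = sum(j, D[i, j] * u0[e, j])"),
    SubKernel({"e": 16, "i": 4, "j": 4}, "y1[e, i] = sum(j, D[i, j] * u1[e, j])"),
]
domains, instructions = recompose(subs)
```

Each entry of `domains` covers only its own renamed inames, so the batched kernel would get one independent loop nest per einsum instead of duplicating inames after the fact.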

nchristensen commented 1 year ago

Forming the batches during kernel recomposition seems to be a workaround for this.