nchristensen / feintune

Autotune batched einsum loopy programs
MIT License

Code generation of large kernels is too slow for autotuning #4

Open nchristensen opened 1 year ago

nchristensen commented 1 year ago

With loops tagged only as 'for', code generation is probably fast enough.

nchristensen commented 1 year ago

For tuning purposes, it may be sufficient to measure the flop rates of single batches rather than the entire kernel.
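
A rough sketch of what measuring a single batch could look like, in plain Python rather than loopy. Everything here is an assumption made for illustration: the helper names (`best_time`, `matvec`, `matvec_flops`, `flop_rate`) are hypothetical and not part of feintune, and a matrix-vector product stands in for one batch of the real einsum. The idea is only that one batch is timed, its flop count is known in closed form, and the resulting rate is used to rank candidate transformations.

```python
import time


def best_time(fn, repeats=5):
    """Return the smallest wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def matvec(mat, vec):
    """Stand-in for one batch: the einsum 'ij,j->i' written as plain loops."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def matvec_flops(n_rows, n_cols):
    """One multiply and one add per matrix entry."""
    return 2 * n_rows * n_cols


def flop_rate(flops, seconds):
    """Flops per second; guards against a zero reading from a coarse timer."""
    return flops / seconds if seconds > 0 else float("inf")


# Time a single batch only, instead of generating and running the full kernel.
n = 64
mat = [[1.0] * n for _ in range(n)]
vec = [1.0] * n
elapsed = best_time(lambda: matvec(mat, vec))
rate = flop_rate(matvec_flops(n, n), elapsed)
print(f"single-batch rate: {rate:.3e} flop/s")
```

Whether the single-batch rate predicts the rate of the entire kernel would still need to be checked, since effects that span batches (shared operands, launch overhead) are not captured by this kind of measurement.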