Support in-place optimization for threadblock output saver

mirage-project / mirage

A multi-level tensor algebra superoptimizer

https://mirage-project.readthedocs.io/

Apache License 2.0

Support in-place optimization for threadblock output saver #25

Open jiazhihao opened 3 months ago

jiazhihao commented 3 months ago

The threadblock output saver currently allocates a separate stensor for the output tensor, which increases shared memory usage. We should enable in-place optimization for the output saver, so that it reuses an existing stensor instead of allocating a new one. Close this issue once the implementation is merged into the main branch.
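To make the overhead concrete, here is a rough sketch of the shared memory accounting with and without the in-place path. It is not Mirage code: `STensor` and `smem_bytes` are hypothetical names, and it assumes fp16 elements and that the saved output has the same shape and dtype as the stensor it is copied from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STensor:
    """Hypothetical stand-in for a shared-memory tensor (not Mirage's class)."""
    name: str
    num_elements: int
    bytes_per_element: int = 2  # assumption: fp16 elements

    @property
    def size_bytes(self) -> int:
        return self.num_elements * self.bytes_per_element


def smem_bytes(producer: STensor, in_place: bool) -> int:
    """Shared memory needed for one producer stensor plus the output saver.

    Without the in-place path, the saver is assumed to allocate its own
    stensor of the same size. With it, the saver reuses the producer's buffer.
    """
    saver_copy = 0 if in_place else producer.size_bytes
    return producer.size_bytes + saver_copy


# Example: a 64 x 64 fp16 tile produced inside the threadblock.
tile = STensor("matmul_result", num_elements=64 * 64)
separate = smem_bytes(tile, in_place=False)
reused = smem_bytes(tile, in_place=True)
print(separate, reused)
```

Under these assumptions, the separate allocation doubles the footprint of the saved tile, which is the overhead the in-place path would remove.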