mirage-project / mirage

A multi-level tensor algebra superoptimizer
https://mirage-project.readthedocs.io/
Apache License 2.0

Support in-place optimization for threadblock output saver #25

Open jiazhihao opened 3 months ago

jiazhihao commented 3 months ago

The threadblock output saver currently allocates a separate stensor (shared-memory tensor) for the output tensor, which increases shared memory usage. We should enable in-place optimization for the output saver, and close this issue once the implementation is merged into the main branch.
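
To make the expected saving concrete, here is a rough back-of-the-envelope model in Python. It is not Mirage code: `STensor`, `smem_footprint`, the tensor names, the 64x64 tile shape, and the fp16 element size are all hypothetical, and "in-place" is assumed to mean that the output reuses the buffer of an existing stensor instead of getting its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STensor:
    """Hypothetical stand-in for a shared-memory tensor (not the Mirage class)."""
    name: str
    shape: tuple
    elem_bytes: int = 2  # assumption: fp16 elements

    def nbytes(self) -> int:
        total = self.elem_bytes
        for dim in self.shape:
            total *= dim
        return total


def smem_footprint(tensors, aliases=None) -> int:
    """Sum the bytes of every tensor that owns a buffer.

    `aliases` maps a tensor name to the name of the tensor whose buffer it
    reuses; aliased tensors add nothing to the total.
    """
    aliases = aliases or {}
    by_name = {t.name: t for t in tensors}
    total = 0
    for t in tensors:
        target = aliases.get(t.name)
        if target is None:
            total += t.nbytes()
        else:
            # An in-place output must fit inside the buffer it reuses.
            assert t.nbytes() <= by_name[target].nbytes()
    return total


# Assumed example: a 64x64 fp16 accumulator tile and an output of equal size.
accum = STensor("accum", (64, 64))
out = STensor("out", (64, 64))

separate = smem_footprint([accum, out])
in_place = smem_footprint([accum, out], aliases={"out": "accum"})
print(separate, in_place)
```

Under these assumptions the separate output buffer doubles the footprint of this pair of tensors, which is the overhead the in-place optimization is meant to remove. The real saving depends on the actual tile shapes and on what else the kernel keeps in shared memory.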