Closed adam-smnk closed 3 months ago
Why is this restricted to max and not just any element-wise op?
I think this could easily be relaxed to all named ops. Processing generics is also doable but a hassle. All in all, I just didn't want to bother with it for now.
Fill folding, together with broadcast folding, finally improves the named ops benchmarks.
The remaining slowdown is caused by one more temporary buffer allocation. It is most likely due to the folded max generic, which is not in in-place form (its outs is a tensor.empty). The next step is to rewrite the max generic the same way we do with linalg-convert-add-in-place for generic adds; see the sketch below.
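For illustration, a rough before/after sketch of that rewrite, assuming the in-place form works like the add case (shapes, value names, and the exact maps are made up; arith.maximumf is assumed as the scalar max op):

```mlir
#map = affine_map<(d0, d1) -> (d0, d1)>

// Current form: outs is a fresh tensor.empty, which bufferizes
// to a new temporary allocation.
%empty = tensor.empty() : tensor<4x8xf32>
%res = linalg.generic {indexing_maps = [#map, #map],
                       iterator_types = ["parallel", "parallel"]}
    ins(%arg0 : tensor<4x8xf32>) outs(%empty : tensor<4x8xf32>) {
^bb0(%in: f32, %o: f32):
  %m = arith.maximumf %in, %cst : f32
  linalg.yield %m : f32
} -> tensor<4x8xf32>

// Hypothetical in-place form: the input becomes the destination,
// so bufferization can update the buffer in place with no new allocation.
%res2 = linalg.generic {indexing_maps = [#map],
                        iterator_types = ["parallel", "parallel"]}
    outs(%arg0 : tensor<4x8xf32>) {
^bb0(%o: f32):
  %m = arith.maximumf %o, %cst : f32
  linalg.yield %m : f32
} -> tensor<4x8xf32>
```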
Looks like the two folders could be merged; it just requires more testing to see whether I missed any edge cases. I'll follow up on that a bit later.
Adds a pattern that folds linalg.fill into linalg.max and emits a combined linalg.generic.
A constant-filled buffer is replaced by a single constant used directly in the max operation on the elements of the other operand. This eliminates a potential temporary buffer allocation and its value initialization.
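A rough sketch of the fold on illustrative IR (shapes, names, and the constant value are hypothetical, not taken from the actual tests):

```mlir
// Before: the constant is materialized into a filled buffer
// that only feeds the max.
%cst = arith.constant 0.0 : f32
%empty = tensor.empty() : tensor<4x8xf32>
%fill = linalg.fill ins(%cst : f32) outs(%empty : tensor<4x8xf32>) -> tensor<4x8xf32>
%res = linalg.max ins(%arg0, %fill : tensor<4x8xf32>, tensor<4x8xf32>)
                  outs(%out : tensor<4x8xf32>) -> tensor<4x8xf32>

// After: the fill is gone; the constant is used directly in the
// payload of a combined linalg.generic.
#map = affine_map<(d0, d1) -> (d0, d1)>
%cst2 = arith.constant 0.0 : f32
%res2 = linalg.generic {indexing_maps = [#map, #map],
                        iterator_types = ["parallel", "parallel"]}
    ins(%arg0 : tensor<4x8xf32>) outs(%out : tensor<4x8xf32>) {
^bb0(%in: f32, %o: f32):
  %m = arith.maximumf %in, %cst2 : f32
  linalg.yield %m : f32
} -> tensor<4x8xf32>
```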