[LinalgExt] Implement PadAttentionOp and Pass

iree-org / iree

[LinalgExt] Implement PadAttentionOp and Pass #17679

Open raikonenfnu opened 1 week ago

raikonenfnu commented 1 week ago

Some recent state-of-the-art models have dimension sizes that do not align well with our tile_sizes, or with powers of two for that matter. This pass/transformation is introduced to pad attention ops so that these shapes align.
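
For illustration, the alignment arithmetic can be sketched as below. This is only a rough sketch, not the code in this PR: `pad_to_multiple` and `padding_amounts` are hypothetical helper names, and the sizes and the alignment of 64 are made-up values.

```python
def pad_to_multiple(size: int, multiple: int) -> int:
    """Round `size` up to the nearest multiple of `multiple` (hypothetical helper)."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    # Ceiling division on integers, then scale back up.
    return -(-size // multiple) * multiple


def padding_amounts(shape, alignments):
    """Return how much high-padding each dimension needs to reach its alignment."""
    return [pad_to_multiple(s, a) - s for s, a in zip(shape, alignments)]


# Made-up example: a 1178 x 72 operand aligned to multiples of 64.
print(padding_amounts([1178, 72], [64, 64]))  # [38, 56]
```

A dimension that is already aligned gets a padding amount of 0, so such operands would be left untouched.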

Future work includes adding K2 padding to align the K and V sequence lengths. This would require adding support for masked_attention first.
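
To show why K2 padding depends on masking, here is a small pure-Python sketch (not IREE code). It assumes plain softmax attention without the scale factor, and `attention` is a hypothetical reference function. Zero rows appended to K still contribute `exp(0)` to the softmax denominator, so the result changes unless those positions are masked out.

```python
import math


def attention(q, k, v, mask=None):
    """Naive softmax(q @ k^T) @ v on lists of lists.

    `mask[j] == False` excludes key/value row j (hypothetical interface).
    """
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        if mask is not None:
            scores = [s if keep else -math.inf for s, keep in zip(scores, mask)]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v_row[c] for w, v_row in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]

# Pad the K and V sequence length (K2) from 2 to 3 with zero rows.
k_pad = k + [[0.0, 0.0]]
v_pad = v + [[0.0, 0.0]]

reference = attention(q, k, v)
unmasked = attention(q, k_pad, v_pad)  # differs from the reference
masked = attention(q, k_pad, v_pad, mask=[True, True, False])  # matches it
```

In this sketch the padded position is masked to negative infinity before the softmax, which gives it zero weight and restores the unpadded result.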