Some recent state-of-the-art models have tensor sizes that do not align well with our tile_sizes, or with powers of two for that matter. This pass/transformation is introduced to help align these models.
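As a rough illustration of the alignment this pass performs, the sketch below pads one tensor dimension up to the next multiple of a tile size. The helper names and the NumPy representation are illustrative assumptions only; the actual transformation operates on the model's tensors inside the compiler, not on arrays.

```python
# Minimal sketch, assuming a NumPy-style tensor and hypothetical helper names.
import numpy as np

def round_up(value: int, multiple: int) -> int:
    """Round `value` up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple

def pad_dim_to_tile(x: np.ndarray, dim: int, tile_size: int) -> np.ndarray:
    """Zero-pad dimension `dim` of `x` so its size is a multiple of `tile_size`."""
    padded_size = round_up(x.shape[dim], tile_size)
    pad_amount = padded_size - x.shape[dim]
    pad_widths = [(0, 0)] * x.ndim
    pad_widths[dim] = (0, pad_amount)
    return np.pad(x, pad_widths)

# Example: a sequence length of 77 is not a multiple of a 32-wide tile,
# so it gets padded up to 96.
x = np.random.rand(2, 77, 64).astype(np.float32)
y = pad_dim_to_tile(x, dim=1, tile_size=32)
print(y.shape)  # (2, 96, 64)
```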
Future work includes adding K2 padding to align the K and V sequence lengths. This would require adding support for masked_attention first.