iree-org / iree-turbine

IREE's PyTorch Frontend, based on Torch Dynamo.
Apache License 2.0

[TKW] Fix indexing of permute to enable attention #244

Closed by raikonenfnu 2 weeks ago

raikonenfnu commented 2 weeks ago

In GPU programming, "transposes"/"permutes" of data are usually materialized only during reads and writes. A transpose or permutation of data held in GPU registers is in most cases free (a no-op), since each thread still owns the same data; only the symbolic indexing of that data changes.
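The point above can be illustrated with a small pure-Python sketch. It is only a toy model and not the TKW implementation: the tile shape, the one-row-per-thread layout, and the helper names `read_tile`, `permute`, and `write_tile` are all assumptions made for illustration. Each "thread" keeps its values keyed by named indices, the permute leaves the registers untouched, and the transpose only shows up when the write turns indices into addresses.

```python
# Toy model (hypothetical, not TKW code): a 4x8 tile with dims [M, N],
# distributed so that each of 4 "threads" owns one row in its registers.
M, N = 4, 8


def read_tile(memory, thread_id):
    """Each thread loads row `thread_id`; every value carries its named index."""
    return [({"M": thread_id, "N": n}, memory[thread_id][n]) for n in range(N)]


def permute(registers, target_dims):
    """Register-level permute: no data moves, only the dimension order changes."""
    return registers, target_dims


def write_tile(out, registers, dims):
    """The permutation is materialized here, when indices become addresses."""
    for index, value in registers:
        i, j = (index[d] for d in dims)
        out[i][j] = value


memory = [[m * N + n for n in range(N)] for m in range(M)]
out = [[None] * M for _ in range(N)]  # output has dims [N, M]

for tid in range(M):
    regs = read_tile(memory, tid)
    permuted, dims = permute(regs, ("N", "M"))
    assert permuted is regs  # the thread still owns exactly the same data
    write_tile(out, permuted, dims)

# The result matches an explicit transpose of the input tile.
assert out == [list(row) for row in zip(*memory)]
```

In this model the cost of the permute is paid entirely in the index computation of the write, which is why getting the indexing right matters more than moving any data.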