Handle `scf.for` block arguments in `-tritonintelgpu-optimize-elementwise-parallelism`

intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs

Issue #2675 · Closed by victor-eds 3 days ago

victor-eds commented 1 week ago

`-tritonintelgpu-optimize-elementwise-parallelism` has a small limitation: `scf.for` block arguments are not optimized.

If a "broadcasted" tensor is used as a block argument, it has a very high impact on register pressure. `scf.for` block arguments should be optimized in the same way as the operands of elementwise operations.
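To give a rough sense of the register-pressure concern, here is a minimal back-of-the-envelope sketch. It assumes that "broadcasted" means every lane holds a redundant copy of the tensor, and that the optimized layout spreads the elements across lanes with no duplication. The function name, tensor size, and lane count are hypothetical and purely illustrative; none of this is taken from the pass itself.

```python
from math import ceil


def values_per_lane(num_elements: int, lanes: int, replicated: bool) -> int:
    """Elements of a tensor that each lane must keep live (toy model).

    replicated=True models the assumed "broadcasted" layout, where every
    lane holds a full copy. replicated=False models a layout that splits
    the elements across lanes without duplication.
    """
    if replicated:
        return num_elements
    return ceil(num_elements / lanes)


# Hypothetical numbers: a 256-element loop-carried tensor, 16 lanes.
num_elements = 256
lanes = 16

before = values_per_lane(num_elements, lanes, replicated=True)
after = values_per_lane(num_elements, lanes, replicated=False)

print(f"replicated block argument: {before} values per lane")
print(f"distributed block argument: {after} values per lane")
```

Because a loop-carried value stays live for the whole loop body, any such difference would persist across every iteration in this simplified model.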

victor-eds commented 1 week ago

To be done in this iteration. Haven't started.

victor-eds commented 4 days ago

Code is ready to push. Evaluating whether this is needed after all. This week I will either push a PR or close this issue as won't fix.

victor-eds commented 3 days ago

Not needed: earlier passes (such as the optimize reduction pass) can be modified so that optimal layouts are propagated.