
intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs

MIT License


Do not insert barriers in new sub-group shuffle layout conversions #2557

Open: victor-eds opened this issue 1 week ago

victor-eds commented 1 week ago

Our backend introduces a new way to perform layout conversions via sub-group shuffles. Modify the --intel-allocate-shared-memory pass so that it does not insert barriers before these shuffles: they do not use SLM (shared local memory) at all and therefore do not need barriers.

Note: this may be fixed by #2556.
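The requested behavior can be illustrated with a minimal Python sketch. It assumes a simplified model in which every operation records how many bytes of SLM scratch it needs, and a shuffle-based layout conversion reports zero. The names Op, needs_barrier and insert_barriers are hypothetical and are not the real pass implementation. The sketch only contrasts a conservative rule (barrier before every layout conversion) with an SLM-aware rule (barrier only when the conversion actually goes through SLM).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str
    slm_bytes: int = 0  # hypothetical: SLM scratch bytes; 0 for shuffle-based conversions


def needs_barrier(op: Op, slm_aware: bool) -> bool:
    """Decide whether a barrier must precede `op` (simplified model)."""
    if op.kind != "convert_layout":
        return False
    # A conversion done with sub-group shuffles uses no SLM, so there is
    # no shared-memory hazard to guard against.
    return op.slm_bytes > 0 if slm_aware else True


def insert_barriers(ops: list[Op], slm_aware: bool = True) -> list[Op]:
    """Return a copy of `ops` with barrier ops inserted where needed."""
    out: list[Op] = []
    for op in ops:
        if needs_barrier(op, slm_aware):
            out.append(Op("barrier"))
        out.append(op)
    return out


ops = [
    Op("load"),
    Op("convert_layout", slm_bytes=4096),  # goes through SLM
    Op("convert_layout", slm_bytes=0),     # sub-group shuffle, no SLM
    Op("store"),
]
conservative = [o.kind for o in insert_barriers(ops, slm_aware=False)]
slm_aware = [o.kind for o in insert_barriers(ops)]
print(conservative)
print(slm_aware)
```

With the conservative rule both conversions get a barrier; with the SLM-aware rule only the SLM-backed conversion does.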

victor-eds commented 1 day ago

Depends on #2611. This will enable us to optimize SLM allocations and avoid inserting unnecessary barriers.