intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

Use live range analysis to determine what regions have high register pressure #1902

Closed etiotto closed 6 days ago

etiotto commented 1 month ago

During code generation for FlashAttention we found that the loop in that kernel requires careful scheduling of operations. For example, load operations for the operands of tt.dot operations need to be sunk closer to their uses in order to shorten their live ranges and consequently reduce register pressure for the kernel.

Another example involves the generation of large 2D loads that "serve" more than one tt.dot operation. It is conceivable that, for kernels that exhibit high register pressure, such a large load might need to be split into a sequence of smaller 2D loads, so that the single long live range is replaced by several shorter ones.
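To make the load-splitting idea concrete, here is a minimal Python sketch. It is not the backend's implementation: the op tuples, value names, and register costs (16 for the large load, 8 for each half, 4 for each dot result) are all hypothetical, and the model is a straight-line loop body with no loop-carried values.

```python
def peak_pressure(ops):
    """ops: list of (result, size, operands) in program order.

    Returns the maximum total size of values that are live between two
    consecutive ops. A value is live from its definition to its last use.
    """
    # Record the index of the last op that reads each value.
    last_use = {}
    for i, (_, _, operands) in enumerate(ops):
        for v in operands:
            last_use[v] = i

    live, peak = {}, 0
    for i, (result, size, _) in enumerate(ops):
        live[result] = size
        # Keep only values that are still read by some later op.
        live = {v: s for v, s in live.items() if last_use.get(v, -1) > i}
        peak = max(peak, sum(live.values()))
    return peak


# One large load (hypothetical cost 16) feeding two dot operations.
whole = [
    ("ab", 16, []),
    ("d0", 4, ["ab"]),
    ("d1", 4, ["ab", "d0"]),
]

# The same work with the load split into two halves, each issued
# right before the dot that consumes it.
split = [
    ("a", 8, []),
    ("d0", 4, ["a"]),
    ("b", 8, []),
    ("d1", 4, ["b", "d0"]),
]

print(peak_pressure(whole), peak_pressure(split))
```

Under these assumed costs the single large load stays live across both dots, while each half-load dies as soon as its own dot has consumed it, so the peak drops.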

Another potential scenario involves selecting variables that could be placed in shared local memory, thereby reducing the demand for registers.

In this work item we will build an analysis that identifies the live ranges of values (with a focus on loops). This analysis can then be used to identify regions where many live ranges overlap. These regions are the ones that would potentially benefit from live range splitting transformations.
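As a rough illustration of what such an analysis could compute, the following Python sketch derives live ranges for a straight-line loop body and reports the register pressure after each operation. Everything in it is an assumption made for illustration: the `Op` record, the op and value names, and the `size` costs are hypothetical, loop-carried values are ignored, and the real analysis would operate on the Triton/MLIR IR rather than on a Python list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Op:
    name: str                       # hypothetical label, e.g. "load_a"
    result: Optional[str]           # value defined by this op, if any
    operands: List[str] = field(default_factory=list)
    size: int = 1                   # assumed register cost of the result


def live_ranges(ops):
    """Map each value to (def_index, last_use_index) within the block."""
    ranges = {}
    for i, op in enumerate(ops):
        for v in op.operands:
            if v in ranges:
                ranges[v] = (ranges[v][0], i)   # extend to this use
        if op.result is not None:
            ranges[op.result] = (i, i)
    return ranges


def pressure(ops, ranges):
    """Total size of the values live immediately after each op."""
    sizes = {op.result: op.size for op in ops if op.result is not None}
    return [
        sum(sizes[v] for v, (d, u) in ranges.items() if d <= i < u)
        for i in range(len(ops))
    ]


# All loads hoisted to the top of the loop body.
early = [
    Op("load_a", "a", [], 8),
    Op("load_b", "b", [], 8),
    Op("load_c", "c", [], 8),
    Op("dot0", "d0", ["a", "b"], 4),
    Op("dot1", "d1", ["d0", "c"], 4),
]

# Same ops, with load_c sunk next to its only use.
sunk = [early[0], early[1], early[3], early[2], early[4]]

for label, body in (("early", early), ("sunk", sunk)):
    p = pressure(body, live_ranges(body))
    hot = p.index(max(p))
    print(label, p, "peak after", body[hot].name)
```

The index of the maximum in the pressure list marks the region where the most live ranges overlap, which is where sinking or splitting would be considered first. In this toy body, moving `load_c` next to `dot1` lowers the peak because `c` no longer overlaps with `a` and `b`.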

etiotto commented 3 weeks ago

IGC is working on improving instruction scheduling and should release that change soon. Consequently, we can put this work item in the backlog and either cancel it or follow up later (if the IGC changes do not pan out as expected).