Open mfrancepillois opened 3 weeks ago
The information needed to create a block pointer is:
The new proposal for tensor descriptors allows users to create a tensor descriptor on the device, which will be lowered into a TMA descriptor. This PR extends the Triton dialect with a new operation, `MakeTensorDescOp`.
This operation contains the following information:
This information, combined with the `experimental_descriptor_{load/store}` operations, could allow us to have all the required information to translate a tensor descriptor into a block pointer.
A possible solution could be to add a pass to the front of the Intel pipeline that:
- finds the `MakeTensorDescOp` (at some point should be a specific type, see https://github.com/triton-lang/triton/blob/837308f780066a6606df4362a9648f9ce55c625b/include/triton/Dialect/Triton/IR/TritonOps.td#L961)
- creates a `MakeTensorPtrOp` just before the load/store with information collected from the `MakeTensorDescOp` + offsets from the load/store operation

Notice that, in that case, the `MakeTensorPtrOp` cannot be moved (or must be moved with care => cannot be moved before the offset computation).

Possible alternative: keep the `MakeTensorDescOp` as it is in the pipeline and implement only our XPU-specific lowering that should use 2D block operations.
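
To make the proposed pass more concrete, here is a rough, pure-Python model of the rewrite (not MLIR, and not the actual implementation). All class and function names are hypothetical stand-ins. The field sets (`base`, `shape`, `strides`, `block_shape` on the descriptor; the same plus `offsets` and `order` on the block pointer) are assumptions based on what `tl.make_block_ptr` takes, and the row-major `order` is also an assumption. The point it illustrates is that the block pointer is materialized right before the load, after the offsets have been computed.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class MakeTensorDesc:
    """Hypothetical stand-in for MakeTensorDescOp (carries no offsets)."""
    base: str
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    block_shape: Tuple[int, ...]


@dataclass(frozen=True)
class DescriptorLoad:
    """Hypothetical stand-in for experimental_descriptor_load."""
    desc: MakeTensorDesc
    offsets: Tuple[str, ...]  # names of values computed before the load


@dataclass(frozen=True)
class MakeTensorPtr:
    """Hypothetical stand-in for MakeTensorPtrOp (block pointer)."""
    base: str
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    offsets: Tuple[str, ...]
    block_shape: Tuple[int, ...]
    order: Tuple[int, ...]


@dataclass(frozen=True)
class BlockLoad:
    """Hypothetical stand-in for a load through a block pointer."""
    ptr: MakeTensorPtr


Op = Union[str, MakeTensorDesc, DescriptorLoad, MakeTensorPtr, BlockLoad]


def rewrite_descriptor_loads(ops: List[Op]) -> List[Op]:
    """Replace each descriptor load with a block pointer plus a block load."""
    out: List[Op] = []
    for op in ops:
        if isinstance(op, MakeTensorDesc):
            # Dropped here; its fields are re-read at each load that uses it.
            continue
        if isinstance(op, DescriptorLoad):
            d = op.desc
            # Assumption: innermost dimension is contiguous (row-major).
            order = tuple(reversed(range(len(d.shape))))
            ptr = MakeTensorPtr(d.base, d.shape, d.strides,
                                op.offsets, d.block_shape, order)
            # Emitted immediately before the load, so it always follows
            # the offset computation.
            out.append(ptr)
            out.append(BlockLoad(ptr))
        else:
            out.append(op)
    return out
```

In this model the `MakeTensorPtr` depends on the offset values, which is why it cannot be hoisted above their computation, as noted above.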
OpenAI has improved the way structured memory access is handled.
The PR https://github.com/triton-lang/triton/pull/4916 cleans up and extends the Triton dialect with new operations, and improves the way TMA descriptors are handled by Triton.
As significant changes result from this PR, we should investigate if and how these memory accesses using tensor descriptors could be transformed into block pointers.