Investigate the new tensor descriptor API

The information needed to create a block pointer is:

Base: pointer to parent tensor
Shapes: parent tensor shape
Strides: parent tensor strides
Block_shape: block shape
Offsets: block offsets
Order: block order
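
To make the role of each field concrete, here is a minimal pure-Python model of a 2D block pointer. It is only a sketch: `BlockPtr` and `block_element_offsets` are hypothetical names, not Triton APIs, and the model assumes that out-of-bounds elements are simply masked out and that `order` is a layout hint that does not change the addresses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BlockPtr:
    """Hypothetical model of the six fields listed above (2D case only)."""
    base: int                     # linear index of the parent tensor's first element
    shape: Tuple[int, int]        # parent tensor shape
    strides: Tuple[int, int]      # parent tensor strides, in elements
    offsets: Tuple[int, int]      # where the block starts inside the parent
    block_shape: Tuple[int, int]  # shape of the block
    order: Tuple[int, int]        # layout hint; unused by this toy model


def block_element_offsets(bp: BlockPtr) -> List[List[Optional[int]]]:
    """Return the linear element offset of every block element.

    Elements that fall outside the parent tensor are reported as None
    (assumption: the real operation would mask or pad them).
    """
    rows, cols = bp.block_shape
    grid: List[List[Optional[int]]] = []
    for i in range(rows):
        row: List[Optional[int]] = []
        for j in range(cols):
            r = bp.offsets[0] + i
            c = bp.offsets[1] + j
            in_bounds = 0 <= r < bp.shape[0] and 0 <= c < bp.shape[1]
            if in_bounds:
                row.append(bp.base + r * bp.strides[0] + c * bp.strides[1])
            else:
                row.append(None)
        grid.append(row)
    return grid


# A 2x4 block starting at (2, 4) in a row-major 4x6 parent tensor.
bp = BlockPtr(base=0, shape=(4, 6), strides=(6, 1),
              offsets=(2, 4), block_shape=(2, 4), order=(1, 0))
grid = block_element_offsets(bp)
```

In this example the last two columns of the block fall outside the parent tensor, which is why the parent shape has to be part of the block pointer and not only the base and strides.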

The new tensor descriptor proposal lets users create a tensor descriptor on the device, which is then lowered into a TMA descriptor. This PR extends the Triton dialect with a new operation, MakeTensorDescOp, which carries the following information:

Base: pointer to parent tensor
Shapes: parent tensor shape
Strides: parent tensor strides
Block_shape: block shape

Combining this information with the offsets carried by the experimental_descriptor_{load/store} operations could give us everything required to translate a tensor descriptor into a block pointer. A possible solution is to add a pass at the front of the Intel pipeline that:
1. Searches for load/store ops that use a ptr created by a MakeTensorDescOp (at some point this should be a specific type, see https://github.com/triton-lang/triton/blob/837308f780066a6606df4362a9648f9ce55c625b/include/triton/Dialect/Triton/IR/TritonOps.td#L961).
2. Creates a MakeTensorPtrOp just before the load/store, using the information collected from the MakeTensorDescOp plus the offsets from the load/store operation.

Note that, in this case, the MakeTensorPtrOp cannot be moved freely: it must not be hoisted above the offset computation.
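
The two steps can be sketched on a toy IR in plain Python. This is not MLIR code and not the actual pass: every class and the `rewrite` function are hypothetical stand-ins, only the load side is modeled, and the `order` value is an assumption (row-major, innermost dimension first) because the descriptor itself does not carry an order.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MakeTensorDesc:      # stand-in for MakeTensorDescOp
    name: str
    base: str
    shape: Tuple[str, ...]
    strides: Tuple[str, ...]
    block_shape: Tuple[int, ...]


@dataclass(frozen=True)
class Compute:             # stand-in for any op that computes an offset
    name: str


@dataclass(frozen=True)
class DescLoad:            # stand-in for experimental_descriptor_load
    name: str
    desc: str
    offsets: Tuple[str, ...]


@dataclass(frozen=True)
class MakeTensorPtr:       # stand-in for MakeTensorPtrOp
    name: str
    base: str
    shape: Tuple[str, ...]
    strides: Tuple[str, ...]
    offsets: Tuple[str, ...]
    block_shape: Tuple[int, ...]
    order: Tuple[int, ...]


@dataclass(frozen=True)
class Load:                # stand-in for a block-pointer load
    name: str
    ptr: str


def rewrite(ops: List[object]) -> List[object]:
    """Replace each descriptor load with MakeTensorPtr + Load."""
    descs: Dict[str, MakeTensorDesc] = {}
    out: List[object] = []
    for op in ops:
        if isinstance(op, MakeTensorDesc):
            descs[op.name] = op
            out.append(op)  # left in place; a later cleanup could drop it
        elif isinstance(op, DescLoad) and op.desc in descs:
            d = descs[op.desc]
            rank = len(d.block_shape)
            ptr = MakeTensorPtr(
                name=f"{op.name}_ptr",
                base=d.base, shape=d.shape, strides=d.strides,
                offsets=op.offsets, block_shape=d.block_shape,
                # Assumption: row-major order, since the descriptor has none.
                order=tuple(reversed(range(rank))),
            )
            # Emitted right before the load, so it stays after the offsets.
            out.append(ptr)
            out.append(Load(name=op.name, ptr=ptr.name))
        else:
            out.append(op)
    return out


ops = [
    MakeTensorDesc("%desc", "%a", ("%M", "%N"), ("%N", "1"), (32, 64)),
    Compute("%off_m"),
    Compute("%off_n"),
    DescLoad("%tile", "%desc", ("%off_m", "%off_n")),
]
new_ops = rewrite(ops)
```

Because the block pointer is materialized at the position of the original load, it is automatically placed after `%off_m` and `%off_n`, which is the ordering constraint mentioned above.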

Possible alternative: keep the MakeTensorDescOp as is in the pipeline and implement only an XPU-specific lowering, which should use 2D block operations.

intel / intel-xpu-backend-for-triton

Investigate the new tensor descriptor API #2586