pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Call for Action: Supporting Dim Order in Backend Delegates #4873

Open Gasoonjia opened 3 weeks ago

Gasoonjia commented 3 weeks ago

Description:

As part of our recent updates to ExecuTorch's core IR, we've introduced dim order as a representation of the memory format of each tensor. However, this feature is not yet enabled by default, and we need delegate authors' support to make it fully functional. In this context, dim order refers to the ordering of a tensor's dimensions in its memory layout. For example, a 4D tensor with dimensions (batch, channel, height, width) can be stored in different memory formats by permuting the order in which these dimensions are laid out in memory.

In our implementation, dim order is represented as an array of integers that lists the tensor's dimensions from outermost to innermost in memory. For instance, the contiguous memory format is represented as [0, 1, 2, 3], indicating that the dimensions are stored in the order batch, channel, height, width. The channels_last memory format, on the other hand, is represented as [0, 2, 3, 1], indicating that the channel dimension is stored last (i.e., innermost).
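To make the mapping concrete, here is a minimal sketch that derives a tensor's dim order from its strides. The helper name is ours and the stride-based derivation is only an illustration of the concept, not ExecuTorch's actual implementation:

```python
import torch

# Illustrative helper (not an ExecuTorch API): the dimension with the largest
# stride is outermost in memory, and the dimension with stride 1 is innermost.
def dim_order_from_strides(t: torch.Tensor) -> list:
    return sorted(range(t.dim()), key=lambda d: t.stride(d), reverse=True)

x = torch.empty(2, 3, 4, 5)                      # contiguous (batch, channel, height, width)
print(dim_order_from_strides(x))                 # [0, 1, 2, 3]

y = x.to(memory_format=torch.channels_last)      # same logical shape, channels_last layout
print(dim_order_from_strides(y))                 # [0, 2, 3, 1]
```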

Compared with torch.memory_format, dim_order is an explicit, per-dimension description of the memory layout, and it is directly accessible as part of ExecuTorch's IR for any tensor.

Issues:

The introduction of dim order as a representation of memory format presents two challenges for backend support:

- Removing torch.memory_format: with dim order, backends will need to handle tensor arguments with a non-default dim order either AoT or at runtime. This will cause issues if the dim order in the current graph is not acceptable to the backend.
- New operators: we've introduced new operators, such as to_dim_order_copy, which take a dim order as the target memory format instead of a MemoryFormat argument. These operators do not exist elsewhere, so backends may not be equipped to handle them (see the sketch after this list).
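For illustration, the difference between the two copy variants roughly looks like the following; the exact dim-order op namespace and keyword here are assumptions based on the description above, not a confirmed API:

```python
import torch

x = torch.randn(2, 3, 4, 5)

# Existing op: the target layout is a MemoryFormat argument.
y = torch.ops.aten._to_copy.default(x, memory_format=torch.channels_last)

# New dim-order variant (op name and keyword assumed for illustration):
# z = torch.ops.dim_order_ops._to_dim_order_copy.default(x, dim_order=[0, 2, 3, 1])
```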

Solution Options:

To address these challenges, we provide an example solution using the XNNPACK delegate as a reference. We encourage delegate authors to follow a similar pipeline to support dim order in their own backends:

1. Delegate guards: implement delegate guards so that the AoT and runtime checks are performed correctly. See PR 4725 for an example implementation (a minimal guard sketch follows this list).
2. Revert dim_order ops: add a new pass that reverts dim_order ops back to regular ops (e.g., to_dim_order_copy back to to_copy) inside the partitioned subgraph. Please note that this is a temporary solution to give delegate authors more time to adapt to this IR change. Although it is ultimately up to delegate authors to decide how to handle the partitioned graph, we recommend avoiding this pass in the long run and instead using the dim_order arg and ops directly. See PR 4520 and its stack for an example implementation.
3. Partition dim_order ops (if necessary): if your delegate currently lowers the to_copy() op, you may need an additional step to partition dim_order ops before reverting them, because to_dim_order_copy may not be compatible with your existing partitioning logic. XNNPACK does not require this step since it does not partition to_copy().
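As a rough illustration of the first step, here is a minimal AoT guard sketch that skips nodes whose tensor inputs are not in the default (contiguous) dim order. The helper names, the meta["val"] access, and the stride-based check are assumptions about how the exported graph carries tensor metadata; see PR 4725 for the actual XNNPACK implementation:

```python
import torch

def _is_default_dim_order(t: torch.Tensor) -> bool:
    # Sort dimensions by decreasing stride; for the default (contiguous)
    # layout this yields [0, 1, ..., n-1].
    order = sorted(range(t.dim()), key=lambda d: t.stride(d), reverse=True)
    return order == list(range(t.dim()))

def node_uses_default_dim_order(node: torch.fx.Node) -> bool:
    # Illustrative partitioner guard: reject nodes whose tensor inputs carry a
    # non-default dim order, so the delegate never sees layouts it cannot handle.
    for arg in node.all_input_nodes:
        val = arg.meta.get("val")
        if isinstance(val, torch.Tensor) and val.dim() > 0 and not _is_default_dim_order(val):
            return False
    return True
```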

Call to Action

We encourage delegate authors to follow the pipeline demonstrated by XNNPACK to support dim order in their respective backends. This approach will ensure compatibility with the new IR and extend the functionality of ExecuTorch with dim order representation and operations. We hope to enable dim_order ops in the Edge IR by ET Beta (end of September) to ensure Edge IR stability post-Beta. We therefore kindly ask all backend delegates to be ready to handle dim_order ops in the IR by then; the fixes/updates described above should be relatively straightforward to apply.

@larryliu0820 @digantdesai @mcr229 @cccclai

cccclai commented 3 weeks ago

cc: @YifanShenSZ , @chiwwang , @DenisVieriu97 , @neuropilot-captain