iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/

[CPU][DT] Revisit unset_encoding for data-tiling pipeline #16981

Open hanhanW opened 7 months ago

hanhanW commented 7 months ago

@MaheshRavishankar @bjacob and I had a discussion today about not having unset_encoding ops at Flow. This can make the fusion logic simpler; it also makes mmt4d fusion easier. The proposal is to only set encodings on LHS and RHS, but not on RESULT. In this context, we still create unpack ops during MaterializeEncoding: the unpack ops are created together with the mmt4d op when the encodings are materialized. So at the Flow level, we will see:

%lhs = iree_linalg_ext.set_encoding %orig_lhs : tensor<?x?xf32> -> tensor<?x?xf32, #iree_linalg_ext.encoding<role =  LHS, element_types = [f32, f32, f32], user_indexing_maps = [#map, #map1, #map2]>>
%rhs = iree_linalg_ext.set_encoding %orig_rhs : tensor<?x?xf32> -> tensor<?x?xf32, #iree_linalg_ext.encoding<role =  RHS, element_types = [f32, f32, f32], user_indexing_maps = [#map, #map1, #map2]>>
%matmul = linalg.matmul
  ins(%lhs, %rhs : ... )
  outs(%init : tensor<?x?xf32>)
%elem = linalg.generic ... ins(%matmul ...) outs(...)
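
For contrast, today's data-tiling pipeline also puts an encoding on the RESULT and inserts an explicit unset_encoding op at Flow. Roughly (a sketch; the full encoding attributes are abbreviated here as #enc_lhs, #enc_rhs, and #enc_result):

%lhs = iree_linalg_ext.set_encoding %orig_lhs : tensor<?x?xf32> -> tensor<?x?xf32, #enc_lhs>
%rhs = iree_linalg_ext.set_encoding %orig_rhs : tensor<?x?xf32> -> tensor<?x?xf32, #enc_rhs>
%init = iree_linalg_ext.set_encoding %orig_init : tensor<?x?xf32> -> tensor<?x?xf32, #enc_result>
%encoded = linalg.matmul ins(%lhs, %rhs : ...) outs(%init : tensor<?x?xf32, #enc_result>)
%result = iree_linalg_ext.unset_encoding %encoded : tensor<?x?xf32, #enc_result> -> tensor<?x?xf32>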

With the proposed form, we can put %matmul and %elem into the same dispatch. The codegen input becomes:

func.func @dispatch_1(...) {
  %lhs = flow.dispatch.tensor.load ...
  %rhs = flow.dispatch.tensor.load ...
  %matmul = linalg.matmul
    ins(%lhs, %rhs : ... )
    outs(%init : tensor<?x?xf32>)
  %elem = linalg.generic ... ins(%matmul, %in0 ...) outs(...)
}

Then it gets materialized into:

%mmt4d = linalg.mmt4d ins(%lhs, %rhs) ... outs(%init)
%unpack = tensor.unpack %mmt4d ... : tensor<?x?x16x16xf32> -> tensor<?x?xf32>
%elem = linalg.generic ... ins(%unpack, %in0 ...) outs(...)

[optional] At this point, there is an opportunity to push the %unpack op down across the %elem op, so it gets closer to the flow.dispatch.tensor.store:

%mmt4d = linalg.mmt4d ins(%lhs, %rhs) ... outs(%init)
%packed_in0 = tensor.pack %in0 ... : tensor<?x?xf32> -> tensor<?x?x16x16xf32>
%elem = linalg.generic ... ins(%mmt4d, %packed_in0 ...) outs(...)
%unpack = tensor.unpack %elem ... : tensor<?x?x16x16xf32> -> tensor<?x?xf32>
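
With the unpack pushed all the way down, the tail of the dispatch would roughly end like this (a sketch; offsets/sizes/strides are elided and %result_binding is a stand-in name for the dispatch's output binding):

%unpack = tensor.unpack %elem ... : tensor<?x?x16x16xf32> -> tensor<?x?xf32>
flow.dispatch.tensor.store %unpack, %result_binding, offsets = [...], sizes = [...], strides = [...]
    : tensor<?x?xf32> -> !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>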

Then we apply the same transformations that we have today. The mmt4d op can still be lowered to ukernels after distribution, and the rest of the ops are codegen-ed.
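
For illustration, after distribution and tiling the mmt4d lowering to ukernels produces something roughly like the following (a sketch; the extra size/flag operands and attributes are elided, and the tile types depend on the chosen tile sizes):

%call = iree_codegen.ukernel.generic "iree_uk_mmt4d"
    ins(%lhs_tile, %rhs_tile : ...)
    outs(%acc_tile : ...) (...) -> tensor<?x?x16x16xf32>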

If we don't push down the %unpack op, the same thing happens: the mmt4d op can be lowered to ukernels, and the rest of the ops are codegen-ed. It just depends on how codegen wants to handle unpack + elem.

(cc @Max191 @pashu123 )

hanhanW commented 7 months ago

also cc @qedawkins who has been looking at data-tiling for GPU cases.

Max191 commented 7 months ago

So at the Flow level, we will see:

%lhs = iree_linalg_ext.set_encoding %orig_lhs : tensor<?x?xf32> -> tensor<?x?xf32, #iree_linalg_ext.encoding<role =  LHS, element_types = [f32, f32, f32], user_indexing_maps = [#map, #map1, #map2]>>
%rhs = iree_linalg_ext.set_encoding %orig_rhs : tensor<?x?xf32> -> tensor<?x?xf32, #iree_linalg_ext.encoding<role =  RHS, element_types = [f32, f32, f32], user_indexing_maps = [#map, #map1, #map2]>>
%matmul = linalg.matmul
  ins(%lhs, %rhs : ... )
  outs(%init : tensor<?x?xf32>)
%elem = linalg.generic ... ins(%matmul ...) outs(...)

So will the result tensor of the linalg.matmul not have an encoding attribute? It seems like this is tying the unset encoding too closely to the matmul. Especially if we start thinking about propagation of encodings, this will restrict what we can do with unset encoding. I think unset encoding needs to be its own operation to avoid a semantic discontinuity between flow and codegen, since unset encoding will ultimately materialize into its own operation.
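
For example, a standalone unset_encoding op would let encoding propagation keep the elementwise op in the encoded domain and only drop the encoding at the end. A hypothetical sketch (abbreviating the result encoding attribute as #enc_result):

%matmul = linalg.matmul ins(%lhs, %rhs : ...) outs(%init : tensor<?x?xf32, #enc_result>)
%elem = linalg.generic ... ins(%matmul, %in0 ...) outs(... : tensor<?x?xf32, #enc_result>)
%out = iree_linalg_ext.unset_encoding %elem : tensor<?x?xf32, #enc_result> -> tensor<?x?xf32>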

hanhanW commented 7 months ago

We won't have unset_encoding ops because the output operands don't have encodings.