iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Fold padding_value away if a tensor.pack op does not have incomplete tile #15417

Closed hanhanW closed 1 year ago

hanhanW commented 1 year ago

Found this case in IR dumps and bug reports. We want a folder that folds padding_value away when possible. It applies when there is enough static information to prove that no tile is incomplete. E.g., 500000 is divisible by 16 and 1200 is divisible by 1, so in this case we don't need padding_value.

module {
  func.func @main_dispatch_1297() {
    %c0 = arith.constant 0 : index
    %c12480 = arith.constant 12480 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1200x500000xf32>>
    %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c12480) : !flow.dispatch.tensor<writeonly:tensor<31250x1200x16x1xf32>>
    %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1200, 500000], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1200x500000xf32>> -> tensor<1200x500000xf32>
    %3 = tensor.empty() : tensor<31250x1200x16x1xf32>
    %pack = tensor.pack %2 padding_value(%cst : f32) outer_dims_perm = [1, 0] inner_dims_pos = [1, 0] inner_tiles = [16, 1] into %3 : tensor<1200x500000xf32> -> tensor<31250x1200x16x1xf32>
    flow.dispatch.tensor.store %pack, %1, offsets = [0, 0, 0, 0], sizes = [31250, 1200, 16, 1], strides = [1, 1, 1, 1] : tensor<31250x1200x16x1xf32> -> !flow.dispatch.tensor<writeonly:tensor<31250x1200x16x1xf32>>
    return
  }
}
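
The divisibility condition above can be sketched as a small standalone check. This is an illustrative Python model only, not MLIR or IREE code: the function name `pack_needs_padding` and its arguments are hypothetical, and `None` is assumed to stand for a dynamic size.

```python
from typing import Optional, Sequence


def pack_needs_padding(
    src_shape: Sequence[Optional[int]],
    inner_dims_pos: Sequence[int],
    inner_tiles: Sequence[Optional[int]],
) -> bool:
    """Hypothetical model: return False only when every tiled source
    dimension is statically known to be a multiple of its inner tile."""
    for pos, tile in zip(inner_dims_pos, inner_tiles):
        dim = src_shape[pos]
        # Without static sizes we cannot prove the last tile is complete,
        # so conservatively keep the padding value.
        if dim is None or tile is None:
            return True
        # A non-zero remainder means the last tile is incomplete.
        if dim % tile != 0:
            return True
    return False


# Shapes from the pack op above: dim 1 (500000) is tiled by 16,
# dim 0 (1200) is tiled by 1.
print(pack_needs_padding([1200, 500000], [1, 0], [16, 1]))
```

Any dynamic dimension or dynamic tile size makes the check answer "padding needed", so a folder built on this condition would only fire when the static information is complete.
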
hanhanW commented 1 year ago

Here is a doc about MLIR folders: https://mlir.llvm.org/docs/Canonicalization/#canonicalizing-with-the-fold-method

I think the folder can call the getPaddingValueMutable() method and clear the range, which updates the op in place, so this can be canonicalized with a fold method. If that is not feasible, we can add a canonicalization pattern instead.

benvanik commented 1 year ago

would help readability for sure! we also have stream.alignment on dynamic dimensions we could use to support folding it away via dataflow analysis (but probably separate logic unless the integer range analysis upstream lets us seed it with known alignments)
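
A rough sketch of the alignment idea, under the assumption that a dynamic dimension is known to be a multiple of some alignment: if the alignment is itself a multiple of the tile size, every tile is complete. The helper below is a hypothetical Python model and does not use any stream.alignment or integer range analysis API.

```python
def divisible_given_alignment(alignment: int, tile: int) -> bool:
    """Hypothetical check: a dimension known to be a multiple of `alignment`
    is also a multiple of `tile` whenever `tile` divides `alignment`.
    This is a sufficient condition, not a necessary one."""
    if alignment <= 0 or tile <= 0:
        return False  # no usable information
    return alignment % tile == 0


# A dimension aligned to 64 always splits evenly into tiles of 16,
# but an alignment of 8 proves nothing about tiles of 16.
print(divisible_given_alignment(64, 16), divisible_given_alignment(8, 16))
```
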

hanhanW commented 1 year ago

@Shukla-Gaurav can you take a look at this when you are available? Thank you!