iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[LLVMGPU] Add a verifier for tile sizes in `lowering_config` #19041

Open kuhar opened 1 week ago

kuhar commented 1 week ago

The lowering_config attribute can be attached to linalg ops to configure them directly, bypassing the heuristics in KernelConfig.cpp.

For example:

        %17 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3, d4) -> (d0, d2, d4)>, affine_map<(d0, d1, d2, d3, d4) -> (d1, d3, d4)>, affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction"]} ins(%10, %11 : tensor<2x1024x1280xi8>, tensor<20x64x1280xi8>) outs(%16 : tensor<2x20x1024x64xi32>) attrs =  {lowering_config = #iree_gpu.lowering_config<{mma_kind = #iree_gpu.mma_layout<MFMA_I32_16x16x32_I8>, promote_operands = [0, 1], reduction = [0, 0, 0, 0, 128], subgroup_m_count = 2 : i64, subgroup_n_count = 2 : i64, workgroup = [1, 1, 64, 160, 0]}>} {
        ^bb0(%in: i8, %in_0: i8, %out: i32):
          %19 = arith.extsi %in : i8 to i32
          %20 = arith.extsi %in_0 : i8 to i32
          %21 = arith.muli %19, %20 : i32
          %22 = arith.addi %out, %21 : i32
          linalg.yield %22 : i32
        } -> tensor<2x20x1024x64xi32>

We should verify that the reduction and workgroup tile size arrays each have the same length as the number of loops in the linalg op (here, the same as the number of iterator types). If the lengths don't match, we should emit an error instead of implicitly padding with zeros (as we currently do).
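A minimal sketch of the proposed check, written in Python for brevity (the real verifier would live in the C++ compiler). The function name `verify_tile_sizes` and the dict-based stand-in for lowering_config are hypothetical, and treating an absent tiling level as valid is an assumption:

```python
def verify_tile_sizes(num_loops, lowering_config):
    """Return a list of error strings; an empty list means the config passes.

    `lowering_config` is a plain dict standing in for the real attribute
    (hypothetical representation, for illustration only).
    """
    errors = []
    for level in ("workgroup", "reduction"):
        sizes = lowering_config.get(level)
        if sizes is None:
            # Assumption: a missing tiling level is not an error by itself.
            continue
        if len(sizes) != num_loops:
            # Say how many sizes were expected instead of padding with zeros.
            errors.append(
                f"expected {num_loops} '{level}' tile sizes "
                f"(one per loop), but got {len(sizes)}"
            )
    return errors


# The linalg.generic above has 5 iterator types, hence 5 loops.
iterator_types = ["parallel"] * 4 + ["reduction"]
good = {"workgroup": [1, 1, 64, 160, 0], "reduction": [0, 0, 0, 0, 128]}
bad = {"workgroup": [1, 1, 64, 160], "reduction": [0, 0, 0, 0, 128]}

print(verify_tile_sizes(len(iterator_types), good))
print(verify_tile_sizes(len(iterator_types), bad))
```

The config from the example passes, while the variant with only four workgroup sizes is rejected with a message that names the expected count.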

The expected length is less obvious for named ops, which don't have indexing maps or iterator types spelled out. For example:

%23 = linalg.conv_2d_nhwc_hwcf {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%13, %14 : tensor<2x130x130x320xi8>, tensor<3x3x320x320xi8>) outs(%22 : tensor<2x128x128x320xi32>) -> tensor<2x128x128x320xi32>

This op has 7 loops, which is not immediately obvious from its textual form. The error message should therefore state how many tile sizes were expected.
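To see where the 7 comes from: the convolution iterates over the four output dimensions (batch, output height, output width, output channels) in parallel, and reduces over the filter height, filter width, and input channels. A small Python sketch that spells this out from the shapes in the op above (the helper name `conv_2d_nhwc_hwcf_loops` is hypothetical):

```python
def conv_2d_nhwc_hwcf_loops(filter_shape, output_shape):
    """List the (name, extent, kind) of each loop of a conv_2d_nhwc_hwcf op."""
    n, oh, ow, f = output_shape          # output layout: N x OH x OW x F
    kh, kw, c, f_filter = filter_shape   # filter layout: KH x KW x C x F
    if f != f_filter:
        raise ValueError("filter and output disagree on the channel count F")
    return [
        ("n", n, "parallel"),
        ("oh", oh, "parallel"),
        ("ow", ow, "parallel"),
        ("f", f, "parallel"),
        ("kh", kh, "reduction"),
        ("kw", kw, "reduction"),
        ("c", c, "reduction"),
    ]


# Shapes taken from the conv op above.
loops = conv_2d_nhwc_hwcf_loops(
    filter_shape=(3, 3, 320, 320), output_shape=(2, 128, 128, 320)
)
print(len(loops))
for name, extent, kind in loops:
    print(name, extent, kind)
```

A valid workgroup or reduction tile size list for this op would therefore need exactly 7 entries.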

As for a reproducer: write a sample dispatch by hand and compile it with --iree-hal-dump-executable-files-to=dir, modify the configured dispatch by changing the tile sizes, and then resume compilation with --compile-from=executable-sources.
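The two compiler invocations could be scripted roughly as follows. This sketch only assembles the command lines and does not run them. The file names are hypothetical placeholders, target backend flags are deliberately left out and would need to be added for a real run, and which dumped file to edit and pass to the second step is an assumption to check against the contents of the dump directory:

```python
def dump_command(input_mlir, dump_dir, output):
    """Step 1: compile the hand-written dispatch and dump executable files."""
    return [
        "iree-compile",
        input_mlir,
        f"--iree-hal-dump-executable-files-to={dump_dir}",
        "-o",
        output,
    ]


def resume_command(edited_source, output):
    """Step 2: resume compilation from the hand-edited configured dispatch."""
    return [
        "iree-compile",
        edited_source,
        "--compile-from=executable-sources",
        "-o",
        output,
    ]


# Hypothetical file names, for illustration only.
step1 = dump_command("dispatch.mlir", "dir", "out.vmfb")
step2 = resume_command("dir/edited_dispatch.mlir", "out_edited.vmfb")
print(" ".join(step1))
print(" ".join(step2))
```

Between the two steps, the tile sizes in the dumped dispatch are edited by hand, for example by dropping one entry from the workgroup list to trigger the proposed error.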

kuhar commented 10 hours ago

Reassigning to @bangtianliu as discussed with @MaheshRavishankar