intel / graph-compiler

MLIR-based toolkit targeting Intel heterogeneous hardware
Apache License 2.0

Support dynamic shape when lowering operation to vector #300

Open BRUCE11111 opened 1 month ago

BRUCE11111 commented 1 month ago

e.g.:

    %19 = affine.apply affine_map<(d0) -> ((d0 floordiv 64) * 64)>(%arg3)
    %20 = affine.min affine_map<(d0) -> (((d0 + 31) floordiv 64) * 64 - (d0 floordiv 64) * 64 + 64, (d0 floordiv 64) * -64 + 224)>(%arg3)
    %extracted_slice = tensor.extract_slice %arg0[%19, 0] [%20, 13] [1, 1] : tensor<224x13xbf16> to tensor<?x13xbf16>
    %extracted_slice_2 = tensor.extract_slice %1[%15, 0, 0, 0] [%17, 1, 64, 14] [1, 1, 1, 1] : tensor<4x1x64x14xbf16> to tensor<?x1x64x14xbf16>
    %pack = tensor.pack %extracted_slice padding_value(%cst : bf16) outer_dims_perm = [0, 1] inner_dims_pos = [0, 1] inner_tiles = [64, 14] into %extracted_slice_2 : tensor<?x13xbf16> -> tensor<?x1x64x14xbf16>
    %21 = tensor.empty(%18) : tensor<?x14xbf16>
    %unpack = tensor.unpack %pack outer_dims_perm = [0, 1] inner_dims_pos = [0, 1] inner_tiles = [64, 14] into %21 : tensor<?x1x64x14xbf16> -> tensor<?x14xbf16>
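
For reference, here is a minimal sketch (an assumption, not the compiler's actual lowering) of how such a dynamically sized slice could be read into a fixed-shape vector by masking the dynamic dimension; the 64x13 vector shape, the bound on the dynamic row count, and the function name are chosen only for illustration:

    // Hypothetical sketch: read a tensor<?x13xbf16> slice into a fixed
    // vector<64x13xbf16>, assuming the dynamic row count %d is at most 64.
    func.func @masked_read(%src: tensor<?x13xbf16>, %d: index) -> vector<64x13xbf16> {
      %c0 = arith.constant 0 : index
      %c13 = arith.constant 13 : index
      %pad = arith.constant 0.000000e+00 : bf16
      // The mask covers only the first %d rows; masked-off rows are not read
      // and take the padding value instead.
      %mask = vector.create_mask %d, %c13 : vector<64x13xi1>
      %v = vector.transfer_read %src[%c0, %c0], %pad, %mask {in_bounds = [true, true]}
          : tensor<?x13xbf16>, vector<64x13xbf16>
      return %v : vector<64x13xbf16>
    }
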
yifeizh2 commented 1 month ago

Another follow-up on this case.

Dynamic shapes also occur when the loop step does not evenly divide the loop bound. For the llama MLP matmul shape:

numactl -C 0-55 -m 0 python3 ./tools/main.py --driver=mlp --batch_size=32 --hidden_size_list=11008x4096 --has_bias=4096 --act_type=relu --dtype=bf16 --warm_up=100 --repeat=500

The IR generated after deepTileMatmul contains dynamic shapes (?) due to the indivisible loop step. The shapes involved in the vector operations are still static; however, the CPUPhysicalRegister pass currently skips this case.
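
As an illustration only (not the deepTileMatmul output), here is a minimal sketch of how an indivisible step introduces a dynamic extent; the step of 96 and the function name are assumptions, the step chosen because it does not divide 11008 and leaves a 64-element tail tile:

    // Hypothetical sketch: 11008 is not a multiple of the assumed step 96,
    // so the last iteration gets a smaller tile; the slice extent becomes an
    // affine.min and the slice type turns into tensor<?xbf16>.
    func.func @tail_tile(%t: tensor<11008xbf16>) {
      %c0 = arith.constant 0 : index
      %c96 = arith.constant 96 : index
      %c11008 = arith.constant 11008 : index
      scf.for %iv = %c0 to %c11008 step %c96 {
        %sz = affine.min affine_map<(d0) -> (96, -d0 + 11008)>(%iv)
        %slice = tensor.extract_slice %t[%iv] [%sz] [1]
            : tensor<11008xbf16> to tensor<?xbf16>
      }
      return
    }
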

The detailed IR log is attached: llama_mlp_32x11008x4096.txt