Open pzread opened 6 months ago
I thought multi lowering_config should work for convolution. The issue is probably from scalable vectorization. Can you try disabling it only when scalable vectors are involved?
Not sure if I follow this. My understanding is that, due to the TODO I mentioned in the issue, we don't set multi lowering_config when the root op is a convolution op. In the case where the convolution is followed by a pack op, tile-and-fuse starts from the pack op but directly uses the tiling config from the convolution op, which results in incorrect tile sizes (the tile sizes should be scaled with the inner tile sizes).
I meant, does it work on x86 CPUs if we comment out the lines below?
When compiling the example below with:
It will generate a lot of ops during `LLVMCPUVirtualVectorLowering`, which results in bad performance.
I suspect it is due to the wrong tile sizes used on the convolution op. In the dump below we can see that `[1, 1, 8, 16, 0, 0, 0]` is set as the parallel dim tile sizes for the convolution op, but after tile-and-fuse the final tile size of the convolution op is `tensor<1x1x96x16xf32>`.
I think it is due to the TODO below: the tile sizes of the convolution op are not propagated to the pack op and scaled. When we do tile-and-fuse, the last compute op in the dispatch, `tensor.pack`, is used as the starting point, and if it doesn't have a lowering config, the lowering config from the root op (the convolution) is borrowed directly. However, the tile sizes of the pack op on its outer dims need to be scaled by its inner tile sizes. Borrowing them directly from the convolution op results in tile sizes that are too large.
https://github.com/openxla/iree/blob/cdff01fcf74f8799a10dddcd5d279f6bbba9ebcc/compiler/src/iree/compiler/Codegen/LLVMCPU/KernelDispatch.cpp#L2302-L2309
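To make the scaling argument concrete, here is a minimal Python sketch of the arithmetic only. It is not IREE code: the helper names are hypothetical, `outer_dims_perm` is ignored, and the pack configuration (`inner_dims_pos = [2]`, `inner_tiles = [12]`) is an assumption chosen solely so that the numbers reproduce the `96` seen in the dump. The actual pack op in the dispatch may be configured differently.

```python
import math


def source_tile_sizes(pack_outer_tiles, inner_dims_pos, inner_tiles):
    """Tile sizes induced on the pack source (the conv result) when the
    pack op's outer dims are tiled with `pack_outer_tiles`.

    One step along a packed outer dim covers `inner_tile` source elements,
    so the source tile is outer_tile * inner_tile. A size of 0 (untiled)
    stays 0.
    """
    sizes = list(pack_outer_tiles)
    for pos, inner in zip(inner_dims_pos, inner_tiles):
        sizes[pos] *= inner
    return sizes


def scaled_pack_tiles(producer_tiles, inner_dims_pos, inner_tiles):
    """Pack outer-dim tile sizes derived from the producer's tile sizes by
    dividing packed dims by the inner tile size (rounding up)."""
    sizes = list(producer_tiles)
    for pos, inner in zip(inner_dims_pos, inner_tiles):
        if sizes[pos] != 0:
            sizes[pos] = math.ceil(sizes[pos] / inner)
    return sizes


# Parallel tile sizes from the conv lowering config (parallel dims only).
conv_tiles = [1, 1, 8, 16]
# Hypothetical pack configuration, NOT taken from the issue.
inner_dims_pos = [2]
inner_tiles = [12]

# Borrowing the conv tile sizes directly as the pack outer-dim tile sizes:
borrowed = source_tile_sizes(conv_tiles, inner_dims_pos, inner_tiles)

# Scaling them by the inner tile size first:
scaled = scaled_pack_tiles(conv_tiles, inner_dims_pos, inner_tiles)
fixed = source_tile_sizes(scaled, inner_dims_pos, inner_tiles)

print(borrowed)  # [1, 1, 96, 16]
print(scaled)    # [1, 1, 1, 16]
print(fixed)     # [1, 1, 12, 16]
```

Under these assumptions, borrowing the config inflates the convolution tile on the packed dim from 8 to 96, while scaling first keeps it at a single inner tile (12), the smallest tile the pack op can produce on that dim.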
This regresses the EfficientNetV2 latency when working on https://github.com/openxla/iree/issues/16682 to create more pack/unpack fusions