iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0
2.47k stars 548 forks source link

In `SetEncoding`, only set one of `matmul_narrow_{M,N}` #17641

Closed bjacob closed 1 week ago

bjacob commented 2 weeks ago

@lialan reports:

Question: I hit cases where matmul N dimension is too narrow:

[cpu-materialize-encoding]: tile (16, 16, 2) is skipped because it is not valid for upper_bound (1, 1, 16)
[cpu-materialize-encoding]: tile (8, 16, 2) is skipped because it is not valid for upper_bound (1, 1, 16)
[cpu-materialize-encoding]: tile (4, 16, 2) is skipped because it is not valid for upper_bound (1, 1, 16)
[cpu-materialize-encoding]: tile (2, 16, 2) is skipped because it is not valid for upper_bound (1, 1, 16)
[cpu-materialize-encoding]: tile (1, 16, 2) is skipped because it is not valid for upper_bound (1, 1, 16)

In such cases, do we assume the input is illegal? because we cannot find legal tiles

In SetEncoding for matmuls, when the M or N dimension is small, we may set the matmul_narrow_{M,N} attribute on the EncodingAttr, which has the effect of limiting padding in that dimension. Then, later, in MaterializeEncoding, we may transpose the matmul to reduce the matmul_narrow_ case to transpose_narrow_M, and we select an appropriate tile size for the small M value. Then that allows picking up a corresponding narrow-M code path in the mmt4d ukernel. That tile size is only narrow in the M dimension, it is generic large in the N dimension.

But what happens if both matmul_narrow_{M,N} are set? Then, we have limited padding in both dimensions. The situation is still the same after transposing, and so we now can't use a narrow-M tile, because that tile is for generic large-N and now we also have narrow-N.

By the time we reach MaterializeEncoding, it's too late to fix that up, since we have already limited padding (and buffer allocations) in both M and N dimensions. So this must be dealt with at SetEncoding time.

=> Solution: in SetEncoding, we should not set both matmul_narrow_{M,N} encoding attributes. If both M and N are small static values for which we would normally set the attribute, we should compare them and only set matmul_narrow_{M,N} for the smaller of the two. For example, if we would normally set matmul_narrow_M=4 and matmul_narrow_N=1 then in fact we should only set matmul_narrow_N=1 .

If there is a tie, that is if we would set both matmul_narrow_{M,N} to the same value, then we should only set matmul_narrow_M. Avoiding the transposition business in that case.