We are currently not doing a great job of picking vector sizes for generic ops with mixed-width data types. The dispatch below shows a fully parallel element-wise op with `i32` and `i8` operations. We vectorize it with [1, 8, 16] for 512-bit vectors. With 16 lanes in the innermost dimension, the `i32` operations fill a whole 512-bit register (16 × 32 bits), but the `i8` operations use only 128 of the 512 bits (16 × 8 bits).
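The utilization arithmetic for a 16-lane innermost dimension can be checked with a few lines of Python. This is only an illustrative sketch: `utilization` and `VECTOR_BITS` are hypothetical names (not IREE code), and the 512-bit register width is the assumption stated in the issue.

```python
import math

VECTOR_BITS = 512  # assumption: 512-bit vector registers, as in the issue


def utilization(lanes: int, elem_bits: int, vector_bits: int = VECTOR_BITS) -> float:
    """Fraction of the allocated vector-register bits that actually hold data."""
    used_bits = lanes * elem_bits
    # Number of whole registers needed to hold `lanes` elements.
    registers = math.ceil(used_bits / vector_bits)
    return used_bits / (registers * vector_bits)


# Innermost vector size chosen today: 16 lanes for every element type.
for name, bits in [("i32", 32), ("i8", 8)]:
    print(name, f"{16 * bits} bits", utilization(16, bits))
```

This prints `i32 512 bits 1.0` and `i8 128 bits 0.25`, i.e. the `i8` operations leave three quarters of each register unused.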
In our (hopefully new) tile-size selection infrastructure, we should improve the data-type analysis to better understand the element types of the operations and make sure we fully utilize the 512-bit vectors. Sizing the innermost dimension for the `i8` operations (64 lanes) means each `i32` operation will need 4 registers, so we should take that into account when deciding the unroll factor (and perhaps choose something smaller than 8 for the second dimension).
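One way to reason about this trade-off is sketched below. It is a crude, hypothetical model, not IREE's actual heuristic: it sizes the innermost dimension so the narrowest type fills one register, counts registers per unrolled row assuming one live value per listed element type, and assumes a 32-register vector file (as on AVX-512). The names `innermost_vector_size`, `registers_for`, and `max_unroll` are illustrative only.

```python
import math

VECTOR_BITS = 512     # assumption: 512-bit vector registers
NUM_VECTOR_REGS = 32  # assumption: 32 architectural vector registers (e.g. AVX-512)


def innermost_vector_size(elem_bits, vector_bits=VECTOR_BITS):
    """Pick the lane count so the narrowest element type fills a whole register."""
    return vector_bits // min(elem_bits)


def registers_for(lanes, bits, vector_bits=VECTOR_BITS):
    """Vector registers needed to hold `lanes` elements of `bits` width."""
    return math.ceil(lanes * bits / vector_bits)


def max_unroll(elem_bits, lanes, budget=NUM_VECTOR_REGS):
    """Largest power-of-two unroll of the second dimension such that one live
    value per listed element type, for every unrolled row, fits in `budget`
    registers (a deliberately crude register-pressure model)."""
    per_row = sum(registers_for(lanes, b) for b in elem_bits)
    unroll = 1
    while unroll * 2 * per_row <= budget:
        unroll *= 2
    return unroll


lanes = innermost_vector_size([32, 8])
print(lanes, registers_for(lanes, 32), max_unroll([32, 8], lanes))
```

This prints `64 4 4`: with 64 lanes, each `i32` value takes 4 registers, and under this model an unroll factor of 4 (rather than 8) keeps the tile within the register budget.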
Input:
After vectorization: