Improve tile size computation for vectorization/unrolling in quantized models using LinalgOpInfo

Tile size computation in LLVMCPU is crying out for a refresh. The current approach has become difficult to maintain and debug, even for those familiar with the code. The goal is to refactor the incremental tile size computation for vectorization/unrolling, which is currently spread across multiple functions in KernelDispatch.cpp, into a single place, and to extend and use the LinalgOpInfo analysis to make a more informed decision on the tile sizes needed.

Some requirements/steps/suggestions:

- Remove the min/max tile size range for unrolling/vectorization. Vectorization/unrolling factors should be computed in one shot, with all the information needed available at that point.
- Tile sizes for vectorization/unrolling must be constant once computed. In favor of a clean implementation and unsurprising outcomes, the design should not allow tweaking tile sizes here and there afterwards.
- LinalgOpInfo should be extended to gather information about the number of operations in a LinalgOp and their element type sizes.
- Tile sizes for quantized models should be computed heuristically using the new type size information provided by LinalgOpInfo. For example, if a dispatch combines i8 and i32 operations, the number of operations of each type should help us decide whether to pick a vectorization factor based on the i8 or the i32 type, to keep register pressure under control.
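
As a rough illustration of the last two points, the following C++ sketch shows how per-type op counts could drive a one-shot vectorization factor. `OpTypeStats` and `computeVectorFactor` are hypothetical names, not part of IREE or the existing LinalgOpInfo API. The heuristic is likewise only an assumption: vectorize for the element width that accounts for the most ops, breaking ties toward the wider type (smaller factor, lower register pressure). `nativeVectorBits` is an assumed input, e.g. 512 for AVX-512.

```cpp
#include <cassert>
#include <map>

// Hypothetical summary of what an extended LinalgOpInfo could record:
// the number of operations in the LinalgOp body, keyed by element bit width.
struct OpTypeStats {
  std::map<unsigned, unsigned> opCountByBitWidth;  // bit width -> #ops

  void addOp(unsigned bitWidth) { ++opCountByBitWidth[bitWidth]; }
};

// Hypothetical one-shot vectorization factor computation.
// Assumed heuristic: pick the element width with the most ops; on a tie,
// prefer the wider type, which yields a smaller factor and keeps the
// number of registers needed per wide value under control.
unsigned computeVectorFactor(unsigned nativeVectorBits,
                             const OpTypeStats &stats) {
  assert(nativeVectorBits > 0 && "native vector size must be non-zero");
  unsigned dominantWidth = 0;
  unsigned maxCount = 0;
  // std::map iterates in ascending width order, so '>=' lets a later
  // (wider) type win ties.
  for (const auto &[width, count] : stats.opCountByBitWidth) {
    if (count >= maxCount) {
      maxCount = count;
      dominantWidth = width;
    }
  }
  if (dominantWidth == 0) return 1;  // No typed ops recorded: do not vectorize.
  unsigned factor = nativeVectorBits / dominantWidth;
  return factor == 0 ? 1 : factor;  // Element wider than a vector register.
}
```

With 512-bit vectors, a dispatch with six i8 ops and one i32 op gets a factor of 64, while one with one i8 op and three i32 ops gets 16. Binding the result to a `const` local, e.g. `const unsigned vf = computeVectorFactor(512, stats);`, mirrors the requirement that the factor is not tweaked after it is computed.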

There are plenty of other things we can do, but I think this would be a good starting point. Other suggestions are welcome!

iree-org / iree #10363