Open dcaballe opened 2 years ago
https://github.com/iree-org/iree/pull/10287#issuecomment-1241256362 shows the magnitude of the problem. When we lower a `tosa.rescale` operation to Arith before the vectorizer and expose its mixed-length types (`i8`, `i32` and `i64`) to it, we get massive regressions on x86 (85%, 63%, etc.). I think that lowering `tosa.rescale` before the vectorizer without getting any regression could be a good metric for success here.
Tile size computation in LLVMCPU is crying out for a refresh. The current approach is getting difficult to maintain and debug, even for those familiar with the code. The goal is to consolidate the incremental tile size computation for vectorization/unrolling, which is currently spread across multiple functions in KernelDispatch.cpp, into a single place, and to extend and use the LinalgOpInfo analysis to make more informed decisions about the tile sizes needed.
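To make the "single place" idea concrete, here is a minimal, standalone sketch of what one centralized entry point could look like. It is not IREE code: `OpTypeInfo`, `vectorLanesForWidestType` and `computeVectorTileSizes` are hypothetical names invented for illustration, and `OpTypeInfo` only stands in for whatever an extended LinalgOpInfo might report. The policy shown (size the innermost tile so that the widest element type fills one native vector) is just one assumption about how mixed `i8`/`i32`/`i64` ops could be handled, not a claim about the right heuristic.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-op summary; NOT the real LinalgOpInfo API.
struct OpTypeInfo {
  // Bit widths of all element types the op touches, e.g. {8, 32, 64}.
  std::vector<unsigned> elementBitWidths;
};

// Number of lanes such that the widest element type fills one native vector.
// Assumed policy: narrower types are left to use a fraction of a vector.
inline int64_t vectorLanesForWidestType(const OpTypeInfo &info,
                                        unsigned nativeVectorBits) {
  if (info.elementBitWidths.empty())
    return 1;
  unsigned widest = *std::max_element(info.elementBitWidths.begin(),
                                      info.elementBitWidths.end());
  if (widest == 0)
    return 1;
  return std::max<int64_t>(1, nativeVectorBits / widest);
}

// Single entry point for vector tile sizes: every outer loop gets tile size 1
// and the innermost loop gets the lane count, clamped to the loop range.
inline std::vector<int64_t>
computeVectorTileSizes(const std::vector<int64_t> &loopRanges,
                       const OpTypeInfo &info, unsigned nativeVectorBits) {
  std::vector<int64_t> tiles(loopRanges.size(), 1);
  if (loopRanges.empty())
    return tiles;
  int64_t lanes = vectorLanesForWidestType(info, nativeVectorBits);
  tiles.back() = std::min<int64_t>(loopRanges.back(), lanes);
  return tiles;
}
```

With a 256-bit native vector and element types of 8, 32 and 64 bits, this policy picks 4 lanes for the innermost dimension, since `i64` is the widest type. Keeping the decision in one function like this would make it easier to swap in a different policy and compare results.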
Some requirements/steps/suggestions:
There are plenty of other things we can do but I think this would be a good starting point. Other suggestions are welcome!