Open MaheshRavishankar opened 6 months ago
The gist at https://gist.github.com/stellaraccident/83b357bbe2da31d5872cdbfcdd93aeea#file-example-mlir has extremely high compilation times. The culprit seems to be that the default lowering config logic picks a very large vector size for ops like this one:
%9 = linalg.generic {indexing_maps = [#map3, #map4, #map5], iterator_types = ["parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%expanded, %6 : tensor<?x?x20x8x32xf32>, tensor<5120x20x8x32xf32>) outs(%8 : tensor<?x?x5120xf32>) {
^bb0(%in: f32, %in_1: f32, %out: f32):
  %10 = arith.mulf %in, %in_1 : f32 loc(#loc49)
  %11 = arith.addf %10, %out : f32 loc(#loc50)
  linalg.yield %11 : f32 loc(#loc51)
} -> tensor<?x?x5120xf32> loc(#loc45)
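To make the vector-size concern concrete, here is a rough back-of-the-envelope sketch in Python. It is not IREE's actual heuristic: the tile sizes, the 16-lane native vector width, and the helper name `unrolled_vector_ops` are all assumptions chosen for illustration. The loop order (three parallel dims followed by reduction dims of extent 20, 8 and 32) is inferred from the operand types above, since the indexing maps are not shown.

```python
from math import ceil, prod


def unrolled_vector_ops(tile_sizes, native_vector_elems):
    """Estimate how many native-width vector ops one vectorized tile unrolls into.

    Hypothetical helper: it only divides the tile's element count by the
    native vector width, which is a crude proxy for generated code size.
    """
    tile_elems = prod(tile_sizes)
    return ceil(tile_elems / native_vector_elems)


# Assumed native width: 16 f32 lanes (e.g. one 512-bit register).
NATIVE_F32_LANES = 16

# Hypothetical tile sizes for the six loops
# (parallel, parallel, parallel, reduction, reduction, reduction).
# Case 1: the reduction dims 20 x 8 x 32 are left untiled.
untiled_reductions = [1, 1, 16, 20, 8, 32]
# Case 2: the reduction dims are tiled down to a single vector's worth.
tiled_reductions = [1, 1, 16, 1, 1, 16]

print(unrolled_vector_ops(untiled_reductions, NATIVE_F32_LANES))  # 5120
print(unrolled_vector_ops(tiled_reductions, NATIVE_F32_LANES))    # 16
```

Under these assumptions, leaving the reduction dims untiled makes the vectorized tile several hundred times larger, which is the kind of blow-up that would plausibly show up as very long compile times.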
@pashu123 I am nominating you to keep track of these and make sure they are burnt down :) . @bjacob and @hanhanW can help you.
I'm just getting back to this. I thought I saw a patch go by that was meant to address it, but the extremely high compilation time / crash still seems to be happening.
I think https://github.com/iree-org/iree/pull/17115 fixes the issue. According to the PR description, it improves the compilation time from 40 seconds to 0.6 seconds. I was quite busy with other things, so I reviewed the change late. We should be able to land it tomorrow.
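For anyone who wants to re-check the compile time locally before and after the patch, a minimal timing sketch follows. The helper names `time_command` and `speedup` are made up for illustration, and the `iree-compile` invocation in the comment is an assumed command line, not one taken from the PR; substitute whatever command reproduces the issue for you.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def speedup(before_s, after_s):
    """Ratio of the old time to the new time."""
    return before_s / after_s


# Stand-in command so the sketch runs anywhere. For the real measurement,
# replace it with something like (assumed invocation, adjust to your setup):
#   ["iree-compile", "--iree-hal-target-backends=llvm-cpu",
#    "example.mlir", "-o", "example.vmfb"]
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"compile took {elapsed:.2f} s")

# Numbers quoted from the PR description: 40 s before, 0.6 s after.
print(f"speedup: {speedup(40.0, 0.6):.1f}x")
```

With the figures from the PR description, that works out to a speedup of roughly 67x.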
https://github.com/iree-org/iree/pull/17115 is merged now.
As turbine LLM work (and the different quantization schemes implemented and lowered as part of it) ramps up, the CPU backend needs to be able to compile and execute sample kernels in a reasonable time frame. Several issues have been hit along the way. This bug is to track these issues as they come up.
Please adapt this list to log/itemize issues/comments/gists that need to be fixed.