Umbrella tracking bug for compilation/execution time issue on CPU with Turbine LLM

iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

http://iree.dev/

Apache License 2.0

Umbrella tracking bug for compilation/execution time issue on CPU with Turbine LLM #17078

Open MaheshRavishankar opened 6 months ago

MaheshRavishankar commented 6 months ago

As Turbine LLM (and the different quantization schemes implemented and lowered as part of it) ramps up, the CPU backend needs to be able to compile and execute sample kernels in a reasonable time frame. Several issues have been hit along the way. This bug tracks those issues as they come up.

Please adapt this list to log/itemize issues/comments/gists that need to be fixed.

[ ] https://github.com/openxla/iree/issues/17022
[ ] Extremely high compilation time, potentially due to a large vector size being selected for higher-dimensional operations (those with multiple reduction dimensions) (https://github.com/openxla/iree/issues/17078#issuecomment-2062331207)

MaheshRavishankar commented 6 months ago

https://gist.github.com/stellaraccident/83b357bbe2da31d5872cdbfcdd93aeea#file-example-mlir is a gist that shows extremely high compilation times. The culprit seems to be that the default lowering config logic selects a very large vector size for ops like this one:

 %9 = linalg.generic {indexing_maps = [#map3, #map4, #map5], iterator_types = ["parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%expanded, %6 : tensor<?x?x20x8x32xf32>, tensor<5120x20x8x32xf32>) outs(%8 : tensor<?x?x5120xf32>) {
    ^bb0(%in: f32, %in_1: f32, %out: f32):
      %10 = arith.mulf %in, %in_1 : f32 loc(#loc49)
      %11 = arith.addf %10, %out : f32 loc(#loc50)
      linalg.yield %11 : f32 loc(#loc51)
    } -> tensor<?x?x5120xf32> loc(#loc45)
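
To make the vector-size concern concrete, here is a rough back-of-the-envelope sketch in Python. It is not IREE's actual lowering config logic. It assumes the six loops have extents (?, ?, 5120, 20, 8, 32), which is a guess from the tensor types since #map3/#map4/#map5 are not shown here, and that a tile size of 0 means "leave this loop untiled". The helper `vector_elements` and the tile sizes used are hypothetical, picked only for illustration.

```python
from math import prod

# Assumed extents of the six loops of the linalg.generic above, in iterator
# order (parallel, parallel, parallel, reduction, reduction, reduction).
# None marks a dynamic ("?") extent. This loop-to-dimension mapping is an
# assumption, because the indexing maps are not shown in the thread.
LOOP_EXTENTS = [None, None, 5120, 20, 8, 32]


def vector_elements(extents, tile_sizes):
    """Number of iteration points covered by one tile (hypothetical helper).

    A tile size of 0 is taken to mean "do not tile this loop", so the full
    static extent is used. Dynamic loops must be given a nonzero tile size.
    """
    sizes = []
    for extent, tile in zip(extents, tile_sizes):
        if tile == 0:
            if extent is None:
                raise ValueError("dynamic loop needs an explicit tile size")
            sizes.append(extent)
        else:
            sizes.append(tile if extent is None else min(tile, extent))
    return prod(sizes)


# Reduction loops left untiled: each tile spans 20 * 8 * 32 reduction points
# for every point of the 1 x 1 x 16 parallel tile.
untiled = vector_elements(LOOP_EXTENTS, [1, 1, 16, 0, 0, 0])

# Outer reduction loops tiled to 1, innermost reduction loop tiled to 16.
tiled = vector_elements(LOOP_EXTENTS, [1, 1, 16, 1, 1, 16])

print(untiled, tiled)
```

Under these assumptions, leaving the three reduction loops untiled makes a single tile hundreds of times larger than tiling them, which is the kind of blow-up that would stress vectorization and unrolling at compile time.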

MaheshRavishankar commented 6 months ago

@pashu123 I am nominating you to keep track of these and make sure they are burnt down :) . @bjacob and @hanhanW can help you.

stellaraccident commented 6 months ago

I'm just getting back to this and thought I saw a patch go by that was meant to address - but still seems to be happening. (for the extremely high compilation time / crash)

hanhanW commented 6 months ago

> I'm just getting back to this and thought I saw a patch go by that was meant to address - but still seems to be happening. (for the extremely high compilation time / crash)

I think https://github.com/iree-org/iree/pull/17115 fixes the issue. According to the PR description, it improves the compilation time from 40 seconds to 0.6 seconds. I was quite busy with other things, so I reviewed the change late. We should be able to land it tomorrow.

bjacob commented 6 months ago

https://github.com/iree-org/iree/pull/17115 is merged now.