iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Compilation issue for First Llama int4 on CPU backend #15060

Open Abhishek-Varma opened 1 year ago

Abhishek-Varma commented 1 year ago

What happened?

So, for the CPU backend, the llama IR hits one of the following scenarios :-

  1. It doesn't get through compilation, i.e. it gets stuck in the compilation procedure with no apparent progress.
  2. The runtime gets stuck without producing any result.

Stage A. I first tried dumping dispatches using --iree-hal-dump-executable-sources-to=hal_sources, but the individual dispatches couldn't be compiled. I therefore resorted to --iree-flow-break-dispatch=forward_dispatch_NUMBER_ (where NUMBER is the dispatch to break at) to bisect which dispatch the compilation gets stuck on. I found that breaking on dispatch_11 reproduces the compilation hang.
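The bisection above can be sketched as a loop over candidate dispatch numbers. This is a hypothetical dry-run helper (not part of the original report): it only prints the commands, using flags taken from the repro below; swap `echo` for a real invocation wrapped in e.g. `timeout` to actually bisect.

```shell
# Hypothetical dry-run sketch of the dispatch bisection: print one
# iree-compile command per candidate dispatch number instead of running it.
gen_bisect_cmds() {
  mlir_file=$1
  for n in 1 5 11 20; do   # candidate dispatch numbers to try
    echo "iree-compile --iree-input-type=tm_tensor \
--iree-hal-target-backends=llvm-cpu \
--iree-flow-break-dispatch=forward_dispatch_${n}_ \
${mlir_file} -o /dev/null"
  done
}

gen_bisect_cmds first_llama_int4_1700.mlir
```

Each printed command can then be run (with a timeout) to see which dispatch first causes the hang.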

Stage B. I then used the stripped (elided) version of the IR. Putting it through compilation with --verify=false switched on, I found that we hit the following error :-

Assertion `false && "unexpected operand; expected either a IREE::Stream::ResourceType or " "the result of a mlir::UnrealizedConversionCastOp"' failed.


I dumped the exact operand on which the above assert was firing. This is the op it asserts on :-
%1029:2 = "flow.dispatch"(%93, %96, %1028, %43) <{entry_point = @forward_dispatch_702::@forward_dispatch_702_generic_1700x4096x32x128_f16, operandSegmentSizes = array<i32: 0, 4, 0, 0>, tied_operands = [3 : index, -1 : index]}> : (tensor<4096x32x128xi4>, tensor<4096x32xf16>, tensor<1700x32x128xf16>, tensor<1700x4096xf16>) -> (tensor<1700x4096xf16>, tensor<4096x32x128xf16>)



And without --verify=false we hit the following issue:

/home/abhishek/SHARK/elided_first_llama.mlir:920:11: error: operand #1 does not dominate this use
    %39 = linalg.generic {indexing_maps = [#map14, #map15, #map16], iterator_types = ["parallel", "parallel", "parallel", "reduction", "reduction"]} ins(%expanded_752, %38 : tensor<1x1700x32x128xf16>, tensor<4096x32x128xf16>) outs(%37 : tensor<1x1700x4096xf16>) {


/home/abhishek/SHARK/elided_first_llama.mlir:920:11: note: see current operation:
%79 = "linalg.generic"(%76, %3399#1, %78) <{indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1)>], iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>, #linalg.iterator_type<reduction>, #linalg.iterator_type<reduction>], operandSegmentSizes = array<i32: 2, 1>}> ({
^bb0(%arg3: f16, %arg4: f16, %arg5: f16):
  %3525 = "arith.mulf"(%arg3, %arg4) <{fastmath = #arith.fastmath<none>}> : (f16, f16) -> f16
  %3526 = "arith.addf"(%3525, %arg5) <{fastmath = #arith.fastmath<none>}> : (f16, f16) -> f16
  "linalg.yield"(%3526) : (f16) -> ()
}) : (tensor<1700x32x128xf16>, tensor<4096x32x128xf16>, tensor<1700x4096xf16>) -> tensor<1700x4096xf16>
    %39 = linalg.generic {indexing_maps = [#map14, #map15, #map16], iterator_types = ["parallel", "parallel", "parallel", "reduction", "reduction"]} ins(%expanded_752, %38 : tensor<1x1700x32x128xf16>, tensor<4096x32x128xf16>) outs(%37 : tensor<1x1700x4096xf16>) {
          ^
/home/abhishek/SHARK/elided_first_llama.mlir:12342:13: note: operand defined here (op in the same block)
    %2015 = linalg.generic {indexing_maps = [#map14, #map15, #map16], iterator_types = ["parallel", "parallel", "parallel", "reduction", "reduction"]} ins(%expanded_1676, %2014 : tensor<1x1700x32x128xf16>, tensor<4096x32x128xf16>) outs(%37 : tensor<1x1700x4096xf16>) {
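For context on the diagnostic: "operand #1 does not dominate this use" means a value is used before the op that defines it within the same block. A minimal hypothetical illustration (not taken from the llama IR):

```mlir
// Hypothetical minimal example of "operand does not dominate this use":
// %b is consumed by the addf before it is defined later in the same block,
// so the verifier rejects the use.
func.func @bad() -> f16 {
  %a = arith.constant 1.0 : f16
  %c = arith.addf %a, %b : f16   // error: operand #1 does not dominate this use
  %b = arith.constant 2.0 : f16  // note: operand defined here
  return %c : f16
}
```

In the log above, the use at line 920 consumes a value whose defining op ended up at line 12342 of the same block, suggesting a pass reordered or duplicated ops incorrectly.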

Steps to reproduce your issue

For the compile-time issue on the elided IR :-

  1. Download elided_first_llama.mlir.
  2. Run the following command :-
    iree-compile --iree-input-type=tm_tensor --iree-hal-target-backends=llvm-cpu \
    --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-target-cpu-features=host --iree-opt-data-tiling \
    --iree-llvmcpu-enable-microkernels --iree-llvmcpu-stack-allocation-limit=256000 \
    --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true \
    --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=true --iree-vm-target-truncate-unsupported-floats \
    --iree-codegen-check-ir-before-llvm-conversion=false --iree-vm-bytecode-module-output-format=flatbuffer-binary \
    elided_first_llama.mlir 2> debug_elided_compilation_issue

    (You may add --verify=false as well to observe the assertion failure mentioned above.)

And for the runtime issue on the complete IR, or the dispatch 11 issue mentioned above :-

  1. Download first_llama_int4_1700.mlir.
  2. Run the following command (with and without --iree-flow-break-dispatch) :-
    iree-compile --iree-input-type=tm_tensor --iree-hal-target-backends=llvm-cpu \
    --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-target-cpu-features=host --iree-opt-data-tiling \
    --iree-llvmcpu-enable-microkernels --iree-llvmcpu-stack-allocation-limit=256000 \
    --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true \
    --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=true  --verify=false \
    --iree-vm-target-truncate-unsupported-floats --iree-codegen-check-ir-before-llvm-conversion=false \
    --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-flow-break-dispatch=forward_dispatch_11_ \
    first_llama_int4_1700.mlir -o first_llama.vmfb

Runtime command for both cases :-

iree-run-module --device=local-task --function=forward --module=first_llama.vmfb --input=1x1700x4096xf32 --input=1x1700xi64 --input=1x1700xi32

What component(s) does this issue relate to?

No response

Version information

Pip package :-

iree-compiler             20230926.533
iree-runtime              20230926.533

Source build's last commit :-

commit ef280a4d65e68b554ee81e8bbafa176188032c5c (HEAD -> main, origin/main, origin/HEAD)
Author: Stella Laurenzo <stellaraccident@gmail.com>
Date:   Tue Sep 26 17:40:20 2023 -0700

    Integrate llvm 20230926 (#15043)

    Co-authored-by: Groverkss <groverkss@gmail.com>
    Co-authored-by: Jakub Kuderski <jakub@nod-labs.com>

Additional context

No response

allieculp commented 1 year ago

Assigning to @dcaballe for now to take a look.

Groverkss commented 1 year ago

I think I know where the elided IR issue is coming from. What commit was this compiled on?

Abhishek-Varma commented 1 year ago

I forgot to add those details here. My bad! It's the same as the Vulkan issue for the same IR.

I've updated the ticket and am adding the details here too :-

Pip package :-

iree-compiler             20230926.533
iree-runtime              20230926.533

Source build's last commit :-

commit ef280a4d65e68b554ee81e8bbafa176188032c5c (HEAD -> main, origin/main, origin/HEAD)
Author: Stella Laurenzo <stellaraccident@gmail.com>
Date:   Tue Sep 26 17:40:20 2023 -0700

    Integrate llvm 20230926 (#15043)

    Co-authored-by: Groverkss <groverkss@gmail.com>
    Co-authored-by: Jakub Kuderski <jakub@nod-labs.com>

Groverkss commented 1 year ago

That issue is fixed by https://github.com/openxla/iree/commit/ffd5ad4e401f2ca45611da31b47c39bbbd534ff7. Could you run this on an updated version? Not sure about the other issue, this is just the elided one.

allieculp commented 1 year ago

@Abhishek-Varma can you try running again?

Abhishek-Varma commented 1 year ago

Hi @Groverkss @allieculp - I'll take a look at this and update the ticket accordingly. Thanks!

allieculp commented 1 year ago

Any update here? @Abhishek-Varma

Abhishek-Varma commented 1 year ago

Hi @allieculp - I'll need time here because other higher priority tasks have come up.

If you want, we can close this ticket and reopen this if any issue arises.

allieculp commented 1 year ago

I think we prefer to leave it open, @dcaballe ? @Abhishek-Varma sounds good, let us know when you get to it!

dcaballe commented 1 year ago

Hey @allieculp, could anybody verify that this issue is actually impacting our version of llama?