iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Error: 'util.global.load' op undefined global: @hoisted #14802

Open Abhishek-Varma opened 1 year ago

Abhishek-Varma commented 1 year ago

What happened?

This looks like a case of an unregistered dialect.

Context:

The combined llama IR fails at the compilation stage (for the CPU backend) with the following command, where I'm using the .mlirbc file of the combined IR:

time ./build/tools/iree-compile --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-flow-enable-data-tiling --iree-llvmcpu-enable-microkernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-stream-resource-index-bits=64 --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-target-index-bits=64 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-vm-target-truncate-unsupported-floats --iree-codegen-check-ir-before-llvm-conversion=false --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-dump-executable-sources-to=hal_sources llama2_13b_int4.mlirbc -o llama2_13b_int4_cpuvmfb

The dispatches are dumped locally via the --iree-hal-dump-executable-sources-to=hal_sources flag in the above command.

Usually, to find the culprit dispatch that fails during iree-compile, I bisect over the dumped dispatches and try to compile each one with the same command I used for the entire model's IR. So, when I run the following command, which I used for the combined IR:

time ./build/tools/iree-compile --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-flow-enable-data-tiling --iree-llvmcpu-enable-microkernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-stream-resource-index-bits=64 --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-target-index-bits=64 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-vm-target-truncate-unsupported-floats --iree-codegen-check-ir-before-llvm-conversion=false --iree-vm-bytecode-module-output-format=flatbuffer-binary hal_sources/module_forward_dispatch_0.mlir -o llama2_13b_int4_cpu.vmfb

I get:

error: 'util.global.load' op undefined global: @hoisted
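For illustration, here is a minimal sketch of what the symptom presumably looks like in the dumped source (the global name @hoisted comes from the error above; the function name and tensor type are hypothetical). The dispatch source still loads a global that was hoisted into the parent module, so the standalone file has no matching declaration for the symbol:

```mlir
// Hypothetical minimal repro of the symptom (names/types are illustrative).
// The dumped source loads a global that was hoisted into the parent module,
// so this standalone file contains no matching util.global declaration.
module {
  func.func @forward() -> tensor<4xf32> {
    // error: 'util.global.load' op undefined global: @hoisted
    %0 = util.global.load @hoisted : tensor<4xf32>
    return %0 : tensor<4xf32>
  }
}
```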

Steps to reproduce your issue

  1. Input dispatch is here.
  2. Command:
    time ./build/tools/iree-compile --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-flow-enable-data-tiling --iree-llvmcpu-enable-microkernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-stream-resource-index-bits=64 --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-target-index-bits=64 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-vm-target-truncate-unsupported-floats --iree-codegen-check-ir-before-llvm-conversion=false --iree-vm-bytecode-module-output-format=flatbuffer-binary dispatch.mlir -o llama2_13b_int4_cpu.vmfb

What component(s) does this issue relate to?

No response

Version information

No response

Additional context

@benvanik

Abhishek-Varma commented 1 year ago

Hi @benvanik @powderluv - this is the issue I was talking about: I'm unable to run the dispatch through the same command to generate a vmfb for just that dispatch.

There are almost 65 dispatches being dumped, so modifying each one manually would be time-consuming. This issue looks like a combination of an unregistered dialect case and an issue with the global constants that both forward functions use.
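For reference, the manual modification per dispatch would presumably amount to adding the missing declaration back into the standalone file. A sketch, under the assumption that the actual type and initializer of @hoisted are copied from the combined module (the ones shown here are illustrative):

```mlir
// Hypothetical sketch: make a dumped dispatch self-contained by declaring
// the hoisted global in the file itself. The type shown is illustrative;
// the real declaration must match the one in the combined module.
util.global private @hoisted : tensor<4xf32>

func.func @forward() -> tensor<4xf32> {
  %0 = util.global.load @hoisted : tensor<4xf32>
  return %0 : tensor<4xf32>
}
```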

Meanwhile, I'll try using the individual model's IR (instead of the combined IR) to see if I can get a small repro of the compilation crash.