iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Llama 3.1 8B fp16 TP8 sharded fails to compile for CPU and GPU #19263

Open aviator19941 opened 12 hours ago

aviator19941 commented 12 hours ago

What happened?

Compiling the sharded Llama 3.1 8B fp16 IR fails for both CPU and GPU.

CPU error: https://gist.github.com/aviator19941/82bceb2624571d446da0964440790fde

GPU error: https://gist.github.com/aviator19941/89761b3bbb6ace5a6945de667e6d1e39

I also tried the flags that were suggested for compiling Llama:

  --iree-dispatch-creation-enable-aggressive-fusion=true
  --iree-global-opt-propagate-transposes=true
  --iree-opt-aggressively-propagate-transposes=true
  --iree-opt-data-tiling=false
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))'
  --iree-hal-indirect-command-buffers=true
  --iree-stream-resource-memory-model=discrete
  --iree-hip-legacy-sync=false
  --iree-hal-memoization=true
  --iree-opt-strip-assertions

Steps to reproduce your issue

  1. wget the IR: https://gist.github.com/aviator19941/bab5886f53f2fd0b3b8458519148542c
  2. Try to compile for CPU: ../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir -o=8b_f16_tp8_decomposed_cpu.vmfb --iree-hal-target-device=llvm-cpu[0] --iree-hal-target-device=llvm-cpu[1] --iree-hal-target-device=llvm-cpu[2] --iree-hal-target-device=llvm-cpu[3] --iree-hal-target-device=llvm-cpu[4] --iree-hal-target-device=llvm-cpu[5] --iree-hal-target-device=llvm-cpu[6] --iree-hal-target-device=llvm-cpu[7]
  3. CPU error: https://gist.github.com/aviator19941/82bceb2624571d446da0964440790fde
  4. Try to compile for GPU: ../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir --iree-hip-target=gfx942 -o=8b_f16_tp8_decomposed.vmfb --iree-hal-target-device=hip[0] --iree-hal-target-device=hip[1] --iree-hal-target-device=hip[2] --iree-hal-target-device=hip[3] --iree-hal-target-device=hip[4] --iree-hal-target-device=hip[5] --iree-hal-target-device=hip[6] --iree-hal-target-device=hip[7]
  5. GPU error: https://gist.github.com/aviator19941/89761b3bbb6ace5a6945de667e6d1e39
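
The CPU and GPU commands in the steps above differ only in the backend name and the extra HIP target flag, with one `--iree-hal-target-device` flag per shard. A minimal Python sketch that generates both command lines is below. The helper name `build_compile_command` and its parameters are hypothetical (not part of IREE); the paths and flags are copied from the steps above. It only prints the commands and does not run the compiler.

```python
import shlex


def build_compile_command(compiler, input_mlir, output_vmfb, backend,
                          num_devices, extra_flags=()):
    """Build an iree-compile argv with one target-device flag per shard.

    Hypothetical helper for illustration; it only assembles strings.
    """
    device_flags = [
        f"--iree-hal-target-device={backend}[{i}]" for i in range(num_devices)
    ]
    return [compiler, input_mlir, *extra_flags, f"-o={output_vmfb}", *device_flags]


COMPILER = "../iree-build-no-trace/tools/iree-compile"
INPUT_IR = "8b_f16_tp8_decomposed.mlir"

cpu_cmd = build_compile_command(
    COMPILER, INPUT_IR, "8b_f16_tp8_decomposed_cpu.vmfb",
    backend="llvm-cpu", num_devices=8,
)
gpu_cmd = build_compile_command(
    COMPILER, INPUT_IR, "8b_f16_tp8_decomposed.vmfb",
    backend="hip", num_devices=8,
    extra_flags=["--iree-hip-target=gfx942"],
)

# shlex.join quotes arguments such as "llvm-cpu[0]" so the shell does not
# treat the brackets as a glob pattern.
print(shlex.join(cpu_cmd))
print(shlex.join(gpu_cmd))
```

To run a command directly instead of printing it, the list can be passed to `subprocess.run(cpu_cmd, check=True)`, which avoids shell quoting altogether.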

What component(s) does this issue relate to?

No response

Version information

iree-base-compiler 3.1.0rc20241121

Additional context

No response

sogartar commented 46 minutes ago

Regarding the CPU compilation error: I made a fix when exporting for the unsharded case, where we want no device affinities. This is a sharded variant. At first glance, the argument and global parameter affinities look fine. The problem is probably with the flow.tensor.transfer ops.
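
As a triage aid, here is a rough Python sketch that tallies which targets the flow.tensor.transfer ops in an IR file send their tensors to. It assumes each op is printed on a single line and ends with `to <target>`; that textual form, the function name `count_transfer_targets`, and the sample IR are assumptions for illustration and are not taken from the gist.

```python
import re
from collections import Counter

# Assumed textual form: "... flow.tensor.transfer %x : tensor<...> to <target>"
TRANSFER_RE = re.compile(r"flow\.tensor\.transfer\b.*?\bto\s+(\S+)")


def count_transfer_targets(mlir_text):
    """Count flow.tensor.transfer ops per target, scanning line by line."""
    targets = Counter()
    for line in mlir_text.splitlines():
        match = TRANSFER_RE.search(line)
        if match:
            targets[match.group(1)] += 1
    return targets


# Made-up sample IR, only to exercise the function.
SAMPLE_IR = """
%1 = flow.tensor.transfer %0 : tensor<4xf16> to #hal.device.promise<@__device_1>
%2 = flow.tensor.transfer %0 : tensor<4xf16> to #hal.device.promise<@__device_2>
%3 = flow.tensor.transfer %1 : tensor<4xf16> to #hal.device.promise<@__device_1>
%4 = arith.addf %1, %3 : tensor<4xf16>
"""

for target, count in sorted(count_transfer_targets(SAMPLE_IR).items()):
    print(f"{count:4d}  {target}")
```

On the real IR, the text would come from `open("8b_f16_tp8_decomposed.mlir").read()`. A target that never appears, or one that appears far more often than the others, would be a reasonable place to start looking.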