iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

SDXL punet_quant.mlir fails to compile at ConvertConvToChannelsLastPass #17643

Closed by aviator19941 5 days ago

aviator19941 commented 2 weeks ago

What happened?

When running with --mlir-print-ir-after-all, the dump shows that ConvertConvToChannelsLastPass failed:

// -----// IR Dump After ConvertConvToChannelsLastPass Failed (iree-preprocessing-convert-conv-to-channels-last) //----- //

After turning off the layout propagation for packs/unpacks, I got this error (full error in the attached file):

punet_quant.mlir:18483:13: error: 'arith.cmpf' op requires attribute 'predicate'
    %6229 = torch.aten.clamp %6228, %int-128_296, %int127_297 : !torch.vtensor<[2,320,128,128],f16>,
    !torch.int, !torch.int -> !torch.vtensor<[2,320,128,128],f16>

Steps to reproduce your issue

  1. Download punet_quant.mlir
  2. Compile on gfx942:
     ../iree-build-trace/tools/iree-compile punet_quant.mlir \
       --iree-global-opt-propagate-transposes=true \
       --iree-opt-const-eval=false \
       --iree-opt-outer-dim-concat=true \
       --iree-vm-target-truncate-unsupported-floats \
       --iree-llvmgpu-enable-prefetch=true \
       --iree-opt-data-tiling=false \
       --iree-codegen-gpu-native-math-precision=true \
       --iree-rocm-waves-per-eu=2 \
       --iree-flow-inline-constants-max-byte-length=1 \
       --iree-preprocessing-pass-pipeline="builtin.module(iree-preprocessing-transpose-convolution-pipeline, util.func(iree-preprocessing-pad-to-intrinsics))" \
       --iree-flow-enable-aggressive-fusion \
       --iree-global-opt-enable-fuse-horizontal-contractions=true \
       --iree-opt-aggressively-propagate-transposes=true \
       --iree-codegen-llvmgpu-use-vector-distribution=true \
       --iree-hal-target-backends=rocm \
       --iree-rocm-target-chip=gfx942 \
       --iree-vm-bytecode-module-output-format=flatbuffer-binary \
       -o punet.vmfb
  3. See error

What component(s) does this issue relate to?

Compiler

Version information

b44581a390956a21c653200da4273d8d03d23571

Additional context

No response

qedawkins commented 2 weeks ago

cc @hanhanW

hanhanW commented 2 weeks ago

It looks like there is a bug in ConvertConvToChannelsLastPass. @IanWood1 please help triage when you're available, thank you!

aviator19941 commented 2 weeks ago

After commenting out the layout propagation in ConvertConvToChannelsLastPass, I found that the predicate attribute was missing from the distributedOp in GPUDistributionPatterns.cpp, which explains the 'arith.cmpf' error above. I will open a PR for that fix, but I still need help triaging ConvertConvToChannelsLastPass.
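
Editor's sketch: the following toy Python model illustrates how an op that is rebuilt on distributed operands can lose a required attribute and then fail verification with the message reported above. It is not the code in GPUDistributionPatterns.cpp. The `Op` class, the `REQUIRED_ATTRS` table, and the `distribute` helper are all hypothetical names invented for this illustration, and the assumption that the bug comes from attributes not being forwarded is inferred from the comment above rather than confirmed by the thread.

```python
from dataclasses import dataclass, field

# Toy verifier table (assumption for illustration): arith.cmpf needs 'predicate'.
REQUIRED_ATTRS = {"arith.cmpf": {"predicate"}}


@dataclass
class Op:
    """Hypothetical stand-in for an MLIR operation."""
    name: str
    operands: tuple
    attrs: dict = field(default_factory=dict)


def verify(op):
    """Return a list of verifier errors for missing required attributes."""
    missing = REQUIRED_ATTRS.get(op.name, set()) - set(op.attrs)
    return [f"'{op.name}' op requires attribute '{a}'" for a in sorted(missing)]


def distribute(op, forward_attrs):
    """Rebuild `op` on renamed (distributed) operands.

    If `forward_attrs` is False, the original attributes are dropped,
    which is the failure mode this sketch is meant to show.
    """
    new_operands = tuple(f"{v}_dist" for v in op.operands)
    attrs = dict(op.attrs) if forward_attrs else {}
    return Op(op.name, new_operands, attrs)


cmp_op = Op("arith.cmpf", ("%a", "%b"), {"predicate": "ogt"})
print(verify(distribute(cmp_op, forward_attrs=False)))
print(verify(distribute(cmp_op, forward_attrs=True)))
```

In this model, the rebuilt op verifies only when the original attribute dictionary is copied onto the new op.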

hanhanW commented 2 weeks ago

It would be good if you can share the IR before the pass, then @IanWood1 can start from there.

hanhanW commented 2 weeks ago

The command would be something like iree-compile --mlir-print-ir-before=iree-preprocessing-convert-conv-to-channels-last --mlir-elide-elementsattrs-if-larger=0 --mlir-elide-resource-strings-if-larger=0 ...
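
Editor's sketch: one way to script the command above is a small Python wrapper that assembles the flags and saves the dump to a file. This is a minimal sketch under stated assumptions: `iree-compile` is on `PATH`, the IR dump is written to stderr (the usual destination for `--mlir-print-ir-*` output), and the helper names `build_dump_command` and `dump_ir_before_pass` are hypothetical. The elided `...` in the comment above is represented by the `repro_flags` parameter, which the caller fills with the flags from the reproduction steps.

```python
import subprocess
from pathlib import Path

PASS_NAME = "iree-preprocessing-convert-conv-to-channels-last"


def build_dump_command(input_mlir, repro_flags=(), compiler="iree-compile"):
    """Return the argv list that dumps the IR right before PASS_NAME."""
    return [
        compiler,
        str(input_mlir),
        f"--mlir-print-ir-before={PASS_NAME}",
        "--mlir-elide-elementsattrs-if-larger=0",
        "--mlir-elide-resource-strings-if-larger=0",
        *repro_flags,  # remaining flags from the reproduction command
    ]


def dump_ir_before_pass(input_mlir, out_path, repro_flags=(), compiler="iree-compile"):
    """Run the compiler and save stderr (assumed to hold the IR dump)."""
    cmd = build_dump_command(input_mlir, repro_flags, compiler)
    result = subprocess.run(cmd, stderr=subprocess.PIPE, text=True)
    Path(out_path).write_text(result.stderr)
    return result.returncode


# Example call (not executed here; requires iree-compile and the input file):
# dump_ir_before_pass(
#     "punet_quant.mlir",
#     "before_channels_last.mlir",
#     repro_flags=["--iree-hal-target-backends=rocm",
#                  "--iree-rocm-target-chip=gfx942",
#                  "-o", "punet.vmfb"],
# )
```

Keeping the command construction separate from the subprocess call makes it easy to print and inspect the exact command line before running it.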

IanWood1 commented 2 weeks ago

@aviator19941 did you get an error before disabling layout propagation for packs/unpacks (other than the output from --mlir-print-ir-after-failure)?

With propagation enabled, it appears that the DataLayoutPropagationPatterns rewrite patterns fail to converge. I let the greedy rewriter run with no iteration limit and got an error during verification after the pass: https://gist.github.com/IanWood1/269798dffcde630a06ef70b2c5fcdebd/raw/f5001f37673e57ce4d811da4cca897d6822999c8/punet-compile.log
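
Editor's sketch: to illustrate what "failing to converge" means for a greedy rewriter, here is a toy Python model of a fixpoint driver with an iteration cap. It is only an illustration of the general idea and makes no claim about which patterns conflict in IREE. The driver `apply_greedily` and the two patterns `push_pack_down` and `pull_pack_up` are hypothetical, chosen so that each one undoes the other.

```python
def apply_greedily(ops, patterns, max_iterations=10):
    """Apply patterns until none fires (fixpoint) or the cap is reached.

    Returns (final_ops, converged).
    """
    for _ in range(max_iterations):
        changed = False
        for pattern in patterns:
            new_ops = pattern(ops)
            if new_ops != ops:
                ops, changed = new_ops, True
        if not changed:
            return ops, True  # fixpoint: no pattern applies any more
    return ops, False  # cap reached while patterns were still firing


def push_pack_down(ops):
    """Toy pattern: move the first 'pack' below an adjacent 'elementwise'."""
    for i in range(len(ops) - 1):
        if ops[i] == "pack" and ops[i + 1] == "elementwise":
            return ops[:i] + ("elementwise", "pack") + ops[i + 2:]
    return ops


def pull_pack_up(ops):
    """Toy pattern: the exact inverse of push_pack_down."""
    for i in range(len(ops) - 1):
        if ops[i] == "elementwise" and ops[i + 1] == "pack":
            return ops[:i] + ("pack", "elementwise") + ops[i + 2:]
    return ops


# One pattern alone reaches a fixpoint.
print(apply_greedily(("pack", "elementwise", "elementwise"), [push_pack_down]))
# Two mutually inverse patterns keep firing until the cap is hit.
print(apply_greedily(("pack", "elementwise"), [push_pack_down, pull_pack_up]))
```

In this model, removing the iteration cap would make the second call loop forever, which is why a driver without a limit can only surface such a problem through some other failure, such as a verifier error.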

aviator19941 commented 2 weeks ago

@IanWood1 Let me check whether I can reproduce this with top-of-main (ToM) IREE. I'll also get you the IR before the pass.

aviator19941 commented 1 week ago

This is the IR I got before disabling layout propagation on the IREE version specified above (b44581a390956a21c653200da4273d8d03d23571): https://gist.github.com/aviator19941/f76d3e86754517578807a710ed9d1195.