nod-ai / SHARK-ModelDev

Unified compiler/runtime for interfacing with PyTorch Dynamo.
Apache License 2.0

[SDXL] [VAE + attn] Numerics Issue (CONV + MATMUL on Vector-Distribution) #519

Open PhaneeshB opened 8 months ago

PhaneeshB commented 8 months ago

Compare f16 pytorch to f16 rocm:
pct correct : (8382536/8388608) (99.92761611938477%)
Max error: 2.5
largest Error : 2.5
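
For readers who want to reproduce numbers of this shape, here is a minimal sketch of one way such a comparison could be computed. The thread does not say which tolerance or script produced the figures above, so the function name `compare_outputs` and the `atol`/`rtol` defaults are assumptions, and the sample values are made up for illustration.

```python
def compare_outputs(reference, actual, atol=1e-3, rtol=1e-3):
    """Return (matching, total, percent matching, max absolute error).

    An element counts as "correct" when it is within atol + rtol * |ref|.
    The tolerance rule is an assumption, not the one used in the issue.
    """
    if len(reference) != len(actual):
        raise ValueError("reference and actual must have the same length")
    if not reference:
        raise ValueError("nothing to compare")
    matching = 0
    max_err = 0.0
    for ref, act in zip(reference, actual):
        err = abs(ref - act)
        max_err = max(max_err, err)
        if err <= atol + rtol * abs(ref):
            matching += 1
    total = len(reference)
    return matching, total, 100.0 * matching / total, max_err


# Illustrative values only (not taken from the real tensors).
ref = [13.84, 52.0, 24.53, 306.25]
out = [13.84, 52.0, 24.53, 308.75]
matching, total, pct, max_err = compare_outputs(ref, out)
print(f"pct correct : ({matching}/{total}) ({pct}%) Max error: {max_err}")
```

On real tensors the same logic would normally be written with an array library over the flattened outputs; the pure-Python loop is only there to keep the sketch self-contained.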

artefacts here:
input to vae - example_input_*_fp16
Inputs to conv - (vae_11_param6_fp16.npy, vae_11_param7_fp16.npy, vae_11_silu_fp16.npy)
outputs - output_*_fp16.*

mlir file

qedawkins commented 8 months ago

Some notes from a quick investigation comparing different IREE backends on an rdna3 system.

Eyeballing the first few results from just the aten.convolution on its own shows rocm and vulkan giving identical results

# rocm
1x512x128x128xf16=[[[13.8438 52 24.5312 65.5625 5.31641 306.25 252.375 378 241
# vulkan
1x512x128x128xf16=[[[13.8438 52 24.5312 65.5625 5.31641 306.25 252.375 378 241

However, cpu (llvm-cpu) differs by a notable margin in some places:

1x512x128x128xf16=[[[9.875 52.0625 24.5781 65.625 5.39062 302.25 252.375 378 241.25

This can be reproduced with the following commands using this gist based on the above IR: https://gist.github.com/qedawkins/8e12c83e8ed5804288c0691a180d546b

iree-compile base_ir/conv_only.mlir \
    --iree-vulkan-target-triple=rdna3-unknown-linux \
    --iree-llvmcpu-target-triple=x86_64-unknown-linux \
    --iree-rocm-target-chip=gfx1100 \
    --iree-hal-target-backends=llvm-cpu \
    --iree-global-opt-propagate-transposes=true \
    --iree-opt-outer-dim-concat=true \
    --iree-rocm-link-bc=true \
    --iree-opt-const-eval=true \
    --iree-rocm-bc-dir=/opt/rocm/amdgcn/bitcode \
    --iree-preprocessing-pass-pipeline="builtin.module(iree-preprocessing-transpose-convolution-pipeline)" \
    -o /tmp/sdxl.vmfb
# To compile for the other backends, swap the --iree-hal-target-backends value above for one of:
#   --iree-hal-target-backends=vulkan-spirv
#   --iree-hal-target-backends=rocm

iree-run-module \
    --module=/tmp/sdxl.vmfb \
    --device=local-task \
    --function=main \
    --input=1x512x128x128xf16=@inputs/vae_11_silu_fp16.npy

Note that this was on rdna3 and without vector distribution, so it is not necessarily a direct comparison. Additionally, the SPIR-V and ROCm backends use a very similar approach to convolution, so if there is a bug it could be shared between them.

qedawkins commented 8 months ago

OK, so it turned out I was passing the input incorrectly: --input=1x512x128x128xf16=@inputs/vae_11_silu_fp16.npy is wrong. It should just be --input=@inputs/vae_11_silu_fp16.npy, without the shape.

qedawkins commented 8 months ago

Update: I also tried manually upcasting to f32 and comparing results between cpu and gpu (because cpu just upcasts to f32 for f16, so they aren't very comparable): https://gist.github.com/qedawkins/f6e04b93c10baaffa5deec7ad4dd48e6

rocm and vulkan match bitwise, but llvm cpu differs from both by 1 at an apparently arbitrary element:

[FAILED] result[0]: element at index 5094 (1) does not match the expected (2); expected that the view is equal to contents of a view of 1x512x128x128xf16
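
A bitwise check like the one described above can be sketched as follows. This is not the comparison that iree-run-module performs internally; the helper names `f16_bits` and `first_bitwise_mismatch` are hypothetical, and the sample values are only shaped to echo the failure message.

```python
import struct


def f16_bits(x: float) -> int:
    """Return the 16-bit IEEE 754 binary16 pattern of x as an integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def first_bitwise_mismatch(expected, actual):
    """Return (index, actual, expected) of the first differing element, or None.

    Values are compared by f16 bit pattern, so 0.0 and -0.0 count as different.
    """
    for i, (exp, act) in enumerate(zip(expected, actual)):
        if f16_bits(exp) != f16_bits(act):
            return i, act, exp
    return None


# Illustrative data: one element differs by 1.
expected = [0.5, 2.0, 3.0]
actual = [0.5, 1.0, 3.0]
print(first_bitwise_mismatch(expected, actual))
```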

qedawkins commented 8 months ago

After some more investigation, this seems like it is most likely at least partially a model problem. If we change the torch IR to accumulate in f32, Vulkan and ROCm are still bitwise equivalent, and are close to LLVMCPU. When using pure f16, different strategies (img2col vs direct convolution vs CPU) give significantly different numbers, suggesting that the numerical instability of accumulating in f16 is causing problems.
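
To make the f16 accumulation point concrete, here is a small self-contained sketch. It emulates f16 rounding with Python's `struct` half-precision format and uses an all-ones reduction of length 512 * 3 * 3 (the reduction size of the convolution in this issue). The data is artificial and the three functions are hypothetical stand-ins for different reduction orders, not IREE's actual img2col or direct-convolution kernels; the "wide" variant accumulates in Python's double rather than f32.

```python
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_f16_sequential(a, b):
    """Dot product with every multiply and every add rounded to f16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f16(acc + to_f16(x * y))
    return acc


def dot_f16_chunked(a, b, chunk=64):
    """Same f16 arithmetic, but summed block by block (a different order)."""
    total = 0.0
    for start in range(0, len(a), chunk):
        stop = start + chunk
        partial = dot_f16_sequential(a[start:stop], b[start:stop])
        total = to_f16(total + partial)
    return total


def dot_wide_accumulate(a, b):
    """Accumulate in a wider type and round to f16 once at the end."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return to_f16(acc)


K = 512 * 3 * 3  # input channels * kernel height * kernel width
a = [1.0] * K
b = [1.0] * K
print(dot_f16_sequential(a, b))   # 2048.0: adding 1 to 2048 rounds back to 2048
print(dot_f16_chunked(a, b))      # 4608.0
print(dot_wide_accumulate(a, b))  # 4608.0
```

The sequential f16 sum stalls at 2048 because f16 values are spaced 2 apart from there on, while the chunked order happens to stay exact. Real activations are less extreme, but the same mechanism makes the result depend on reduction order whenever the accumulator is f16.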

I think --iree-codegen-llvmgpu-use-vector-distribution improved this but did not fix it because only some of the convolutions actually used this pipeline. Those that were not aligned to the intrinsic shape did not go down this pipeline and thus still accumulated in f16.
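
As a rough illustration of what "aligned to the intrinsic shape" could mean, the sketch below maps a convolution to implicit-GEMM sizes and checks divisibility by a tile. The thread does not state the intrinsic shape or the exact criterion IREE applies, so the 16x16x16 tile, the function names, and the second (unaligned) convolution shape are all assumptions for illustration.

```python
def conv_gemm_dims(batch, out_channels, in_channels, out_h, out_w, kernel_h, kernel_w):
    """Map a 2-D convolution to implicit-GEMM (M, N, K) sizes."""
    m = batch * out_h * out_w
    n = out_channels
    k = in_channels * kernel_h * kernel_w
    return m, n, k


def is_aligned(dims, tile):
    """True when every GEMM dimension is a multiple of the tile dimension."""
    return all(d % t == 0 for d, t in zip(dims, tile))


TILE = (16, 16, 16)  # assumed intrinsic shape, not confirmed by the thread

# The convolution from this issue: 1x512x128x128 input, 512x512x3x3 weights.
dims = conv_gemm_dims(1, 512, 512, 128, 128, 3, 3)
print(dims, is_aligned(dims, TILE))

# A hypothetical convolution with only 3 output channels.
odd = conv_gemm_dims(1, 3, 128, 1024, 1024, 3, 3)
print(odd, is_aligned(odd, TILE))
```

Under these assumptions the convolution from the issue is aligned, while a layer with 3 output channels is not and would need padding or a different pipeline.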

@PhaneeshB is it possible to change the model to accumulate in f32? I tried just changing the output type on the conv IR you shared above and it compiled + ran fine for me.

module @compiled_vae {
  func.func @main(%arg0: tensor<1x512x128x128xf16>) -> tensor<1x512x128x128xf16> {
    %0 = torch_c.from_builtin_tensor %arg0 : tensor<1x512x128x128xf16> -> !torch.vtensor<[1,512,128,128],f16>
    %1 = call @decode_inp(%0) : (!torch.vtensor<[1,512,128,128],f16>) -> !torch.vtensor<[1,512,128,128],f16>
    %2 = torch_c.to_builtin_tensor %1 : !torch.vtensor<[1,512,128,128],f16> -> tensor<1x512x128x128xf16>
    return %2 : tensor<1x512x128x128xf16>
  }
  func.func private @decode_inp(%arg0: !torch.vtensor<[1,512,128,128],f16>) -> !torch.vtensor<[1,512,128,128],f16> {
    %int0 = torch.constant.int 0
    %false = torch.constant.bool false
    %0 = torch.vtensor.literal(dense_resource<torch_tensor_512_512_3_3_torch.float16> : tensor<512x512x3x3xf16>) : !torch.vtensor<[512,512,3,3],f16>
    %1 = torch.vtensor.literal(dense_resource<torch_tensor_512_torch.float16_3> : tensor<512xf16>) : !torch.vtensor<[512],f16>
    %int1 = torch.constant.int 1
    %2 = torch.prim.ListConstruct %int1, %int1 : (!torch.int, !torch.int) -> !torch.list<int>
    %3 = torch.prim.ListConstruct %int0, %int0 : (!torch.int, !torch.int) -> !torch.list<int>

    %none = torch.constant.none
    %int6 = torch.constant.int 6
    %cast1 = torch.aten.to.dtype %1, %int6, %false, %false, %none : !torch.vtensor<[512],f16>, !torch.int, !torch.bool, !torch.bool, !torch.none -> !torch.vtensor<[512],f32>

    %4 = torch.aten.convolution %arg0, %0, %cast1, %2, %2, %2, %false, %3, %int1 : !torch.vtensor<[1,512,128,128],f16>, !torch.vtensor<[512,512,3,3],f16>, !torch.vtensor<[512],f32>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int -> !torch.vtensor<[1,512,128,128],f32>

    %int5 = torch.constant.int 5
    %cast4 = torch.aten.to.dtype %4, %int5, %false, %false, %none : !torch.vtensor<[1,512,128,128],f32>, !torch.int, !torch.bool, !torch.bool, !torch.none -> !torch.vtensor<[1,512,128,128],f16>

    return %cast4 : !torch.vtensor<[1,512,128,128],f16>
  }
}

It's also possible that this is only a problem on MI300 (I did not verify this there), so it could make sense to go through the same steps on the MI300 server.

Eliasj42 commented 8 months ago

@PhaneeshB I'm getting this error when I try to run in fp16 for matmul nodes and the adds following the matmul nodes

LOADING FP16 INPUTS FOR INFERENCE
Loaded tensor from file: example_input_1x4x128x128_f16.pt
23:12:34.919991 : transformed_f Saved!
23:12:35.484727 : graphmodule_exported_traced Saved!
23:21:14.150773 : graphModule OUTPUT COMPUTED!
23:21:18.433906 : fx_importer_module Saved!
Saved to stable_diffusion_xl_base_1_0_1024x1024_fp16_vae_decode_rocm.mlir
Compiling to rocm with flags: ['--iree-hal-target-backends=rocm', '--iree-rocm-target-chip=gfx940', '--iree-rocm-link-bc=true', '--verify=false', '--iree-codegen-llvmgpu-use-vector-distribution', '--iree-global-opt-only-sink-transposes=true', '--iree-global-opt-propagate-transposes=true', '--iree-opt-const-eval=false', '--iree-opt-outer-dim-concat=true', '--iree-rocm-bc-dir=/opt/rocm/amdgcn/bitcode', '--iree-vm-target-truncate-unsupported-floats', '--iree-llvmgpu-enable-prefetch=true', '--verify=false', '--iree-codegen-log-swizzle-tile=4', '--iree-codegen-winograd-use-forall', '--iree-preprocessing-pass-pipeline=builtin.module(iree-preprocessing-transpose-convolution-pipeline, iree-preprocessing-pad-to-intrinsics)', '--iree-codegen-transform-dialect-library=/home/eljoseph/SHARK-Turbine/models/turbine_models/custom_models/sdxl_inference/default_mfma_attn_spec.mlir']

/home/eljoseph/SHARK-Turbine/turbine-venv/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/eljoseph/SHARK-Turbine/turbine-venv/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
Traceback (most recent call last):
  File "/home/eljoseph/SHARK-Turbine/models/turbine_models/custom_models/sdxl_inference/vae.py", line 189, in <module>
    mod_str = export_vae_model(
  File "/home/eljoseph/SHARK-Turbine/models/turbine_models/custom_models/sdxl_inference/vae.py", line 162, in export_vae_model
    vmfb_path = utils.compile_to_vmfb(
  File "/home/eljoseph/SHARK-Turbine/models/turbine_models/custom_models/sd_inference/utils.py", line 147, in compile_to_vmfb
    flatbuffer_blob = ireec.compile_str(
  File "/home/eljoseph/iree-build/compiler/bindings/python/iree/compiler/tools/core.py", line 299, in compile_str
    result = invoke_immediate(cl, immediate_input=input_bytes)
  File "/home/eljoseph/iree-build/compiler/bindings/python/iree/compiler/tools/binaries.py", line 198, in invoke_immediate
    raise CompilerToolError(process)
iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
Error code: -6
Diagnostics:
LLVM ERROR: Invalid layout assignment
Please report issues to https://github.com/openxla/iree/issues and include the crash backtrace.
 #0 0x00007f0cc0af7e77 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/eljoseph/iree/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x00007f0cc0af60a0 llvm::sys::RunSignalHandlers() /home/eljoseph/iree/third_party/llvm-project/llvm/lib/Support/Signals.cpp:106:18
 #2 0x00007f0cc0af853a SignalHandler(int) /home/eljoseph/iree/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x00007f0cba642520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007f0cba6969fc __pthread_kill_implementation ./nptl/./nptl/pthread_kill.c:44:76
 #5 0x00007f0cba6969fc __pthread_kill_internal ./nptl/./nptl/pthread_kill.c:78:10
 #6 0x00007f0cba6969fc pthread_kill ./nptl/./nptl/pthread_kill.c:89:10
 #7 0x00007f0cba642476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007f0cba6287f3 abort ./stdlib/./stdlib/abort.c:81:7
 #9 0x00007f0cc0a7a7dc llvm::report_fatal_error(llvm::Twine const&, bool) /home/eljoseph/iree/third_party/llvm-project/llvm/lib/Support/ErrorHandling.cpp:125:5
#10 0x00007f0cc0a7a606 (/home/eljoseph/iree-build/compiler/bindings/python/iree/compiler/_mlir_libs/libIREECompiler.so+0x607a606)
#11 0x00007f0cc2f2837e (/home/eljoseph/iree-build/compiler/bindings/python/iree/compiler/_mlir_libs/libIREECompiler.so+0x852837e)
#12 0x00007f0cc2f24d93 DistributionLayout::doResolution(mlir::iree_compiler::IREE::VectorExt::VectorLayoutInterface const&) /home/eljoseph/iree/compiler/src/iree/compiler/Codegen/Common/VectorLayoutAnalysis.cpp:0:5
#13 0x00007f0cc2f24d93 DistributionLayout::resolve(mlir::iree_compiler::IREE::VectorExt::VectorLayoutInterface const&) /home/eljoseph/iree/compiler/src/iree/compiler/Codegen/Common/VectorLayoutAnalysis.cpp:265:29
#14 0x00007f0cc2f26401 PropagateLayout::initialize(mlir::Operation*) /home/eljoseph/iree/compiler/src/iree/compiler/Codegen/Common/VectorLayoutAnalysis.cpp:737:5
#15 0x00007f0cc47d4d4d mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#16 0x00007f0cc47d4d4d mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#17 0x00007f0cc47d4d4d mlir::DataFlowSolver::initializeAndRun(mlir::Operation*) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Analysis/DataFlowFramework.cpp:93:9
#18 0x00007f0cc2f2796f mlir::iree_compiler::VectorLayoutAnalysis::run() /home/eljoseph/iree/compiler/src/iree/compiler/Codegen/Common/VectorLayoutAnalysis.cpp:994:3
#19 0x00007f0cc287378c mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#20 0x00007f0cc287378c mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#21 0x00007f0cc287378c mlir::iree_compiler::distributeVectorOps(mlir::Operation*, mlir::RewritePatternSet&, mlir::iree_compiler::VectorLayoutOptions&) /home/eljoseph/iree/compiler/src/iree/compiler/Codegen/Common/GPU/GPUVectorDistribution.cpp:236:7
#22 0x00007f0cc22b28d7 mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#23 0x00007f0cc22b28d7 mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#24 0x00007f0cc22b28d7 mlir::iree_compiler::(anonymous namespace)::LLVMGPUVectorDistributePass::runOnOperation() /home/eljoseph/iree/compiler/src/iree/compiler/Codegen/LLVMGPU/LLVMGPUVectorDistribute.cpp:380:9
#25 0x00007f0cc0c8b675 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7::operator()() const /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:0:17
#26 0x00007f0cc0c8b675 void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7>(long) /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#27 0x00007f0cc0c8b675 llvm::function_ref<void ()>::operator()() const /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#28 0x00007f0cc0c8b675 void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/MLIRContext.h:275:7
#29 0x00007f0cc0c8b675 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:513:21
#30 0x00007f0cc0c8bdf8 mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#31 0x00007f0cc0c8bdf8 mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#32 0x00007f0cc0c8bdf8 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:585:9
#33 0x00007f0cc0c912c3 mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15::operator()(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo&) const /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:810:5
#34 0x00007f0cc0c8d3cb mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#35 0x00007f0cc0c8d3cb mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#36 0x00007f0cc0c8d3cb mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/Threading.h:46:11
#37 0x00007f0cc0c8d3cb mlir::LogicalResult mlir::failableParallelForEach<std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> >&, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> >&, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/Threading.h:92:10
#38 0x00007f0cc0c8d3cb mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:815:14
#39 0x00007f0cc0c8b810 mlir::detail::OpToOpPassAdaptor::runOnOperation(bool) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:0:5
#40 0x00007f0cc0c8b810 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7::operator()() const /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:517:20
#41 0x00007f0cc0c8b810 void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7>(long) /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#42 0x00007f0cc0c8b810 llvm::function_ref<void ()>::operator()() const /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#43 0x00007f0cc0c8b810 void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/MLIRContext.h:275:7
#44 0x00007f0cc0c8b810 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:513:21
#45 0x00007f0cc0c8bdf8 mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#46 0x00007f0cc0c8bdf8 mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#47 0x00007f0cc0c8bdf8 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:585:9
#48 0x00007f0cc0c912c3 mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15::operator()(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo&) const /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:810:5
#49 0x00007f0cc0c8d3cb mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#50 0x00007f0cc0c8d3cb mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#51 0x00007f0cc0c8d3cb mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/Threading.h:46:11
#52 0x00007f0cc0c8d3cb mlir::LogicalResult mlir::failableParallelForEach<std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> >&, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> >&, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/Threading.h:92:10
#53 0x00007f0cc0c8d3cb mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:815:14
#54 0x00007f0cc0c8b810 mlir::detail::OpToOpPassAdaptor::runOnOperation(bool) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:0:5
#55 0x00007f0cc0c8b810 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7::operator()() const /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:517:20
#56 0x00007f0cc0c8b810 void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7>(long) /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#57 0x00007f0cc0c8b810 llvm::function_ref<void ()>::operator()() const /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#58 0x00007f0cc0c8b810 void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/MLIRContext.h:275:7
#59 0x00007f0cc0c8b810 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:513:21
#60 0x00007f0cc0c8bdf8 mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#61 0x00007f0cc0c8bdf8 mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#62 0x00007f0cc0c8bdf8 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:585:9
#63 0x00007f0cc0c90691 mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (mlir::OpPassManager&, mlir::Operation*)>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_6>(long, mlir::OpPassManager&, mlir::Operation*) /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:5
#64 0x00007f0cc22cba4c mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#65 0x00007f0cc22cba4c mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#66 0x00007f0cc22cba4c mlir::iree_compiler::(anonymous namespace)::LLVMGPULowerExecutableTargetPass::runOnOperation() /home/eljoseph/iree/compiler/src/iree/compiler/Codegen/LLVMGPU/LLVMGPULowerExecutableTarget.cpp:159:7
#67 0x00007f0cc0c8b675 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7::operator()() const /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:0:17
#68 0x00007f0cc0c8b675 void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7>(long) /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#69 0x00007f0cc0c8b675 llvm::function_ref<void ()>::operator()() const /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#70 0x00007f0cc0c8b675 void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/MLIRContext.h:275:7
#71 0x00007f0cc0c8b675 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:513:21
#72 0x00007f0cc0c8bdf8 mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#73 0x00007f0cc0c8bdf8 mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#74 0x00007f0cc0c8bdf8 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:585:9
#75 0x00007f0cc0c90691 mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (mlir::OpPassManager&, mlir::Operation*)>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_6>(long, mlir::OpPassManager&, mlir::Operation*) /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:5
#76 0x00007f0cc1eebdd2 mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#77 0x00007f0cc1eebdd2 mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#78 0x00007f0cc1eebdd2 mlir::iree_compiler::IREE::HAL::(anonymous namespace)::TranslateTargetExecutableVariantsPass::runOnOperation() /home/eljoseph/iree/compiler/src/iree/compiler/Dialect/HAL/Transforms/TranslateExecutables.cpp:65:9
#79 0x00007f0cc0c8b675 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7::operator()() const /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:0:17
#80 0x00007f0cc0c8b675 void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7>(long) /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#81 0x00007f0cc0c8b675 llvm::function_ref<void ()>::operator()() const /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#82 0x00007f0cc0c8b675 void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/MLIRContext.h:275:7
#83 0x00007f0cc0c8b675 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:513:21
#84 0x00007f0cc0c8bdf8 mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#85 0x00007f0cc0c8bdf8 mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#86 0x00007f0cc0c8bdf8 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:585:9
#87 0x00007f0cc0c912c3 mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15::operator()(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo&) const /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:810:5
#88 0x00007f0cc0c8d3cb mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#89 0x00007f0cc0c8d3cb mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#90 0x00007f0cc0c8d3cb mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/Threading.h:46:11
#91 0x00007f0cc0c8d3cb mlir::LogicalResult mlir::failableParallelForEach<std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> >&, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> >&, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/Threading.h:92:10
#92 0x00007f0cc0c8d3cb mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:815:14
#93 0x00007f0cc0c8b810 mlir::detail::OpToOpPassAdaptor::runOnOperation(bool) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:0:5
#94 0x00007f0cc0c8b810 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7::operator()() const /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:517:20
#95 0x00007f0cc0c8b810 void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7>(long) /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#96 0x00007f0cc0c8b810 llvm::function_ref<void ()>::operator()() const /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#97 0x00007f0cc0c8b810 void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/MLIRContext.h:275:7
#98 0x00007f0cc0c8b810 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:513:21
#99 0x00007f0cc0c8bdf8 mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#100 0x00007f0cc0c8bdf8 mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#101 0x00007f0cc0c8bdf8 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:585:9
#102 0x00007f0cc0c90691 mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (mlir::OpPassManager&, mlir::Operation*)>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_6>(long, mlir::OpPassManager&, mlir::Operation*) /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:5
#103 0x00007f0cc1eec8ca mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#104 0x00007f0cc1eec8ca mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#105 0x00007f0cc1eec8ca mlir::iree_compiler::IREE::HAL::(anonymous namespace)::TranslateExecutablesPass::runOnOperation() /home/eljoseph/iree/compiler/src/iree/compiler/Dialect/HAL/Transforms/TranslateExecutables.cpp:105:9
#106 0x00007f0cc0c8b675 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7::operator()() const /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:0:17
#107 0x00007f0cc0c8b675 void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7>(long) /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#108 0x00007f0cc0c8b675 llvm::function_ref<void ()>::operator()() const /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#109 0x00007f0cc0c8b675 void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/MLIRContext.h:275:7
#110 0x00007f0cc0c8b675 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:513:21
#111 0x00007f0cc0c8bdf8 mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#112 0x00007f0cc0c8bdf8 mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#113 0x00007f0cc0c8bdf8 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:585:9
#114 0x00007f0cc0c912c3 mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15::operator()(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo&) const /home/eljoseph/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:810:5
#115 0x00007f0cc0c9137f mlir::LogicalResult::failed() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#116 0x00007f0cc0c9137f mlir::failed(mlir::LogicalResult) /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#117 0x00007f0cc0c9137f mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&)::'lambda'()::operator()() const /home/eljoseph/iree/third_party/llvm-project/mlir/include/mlir/IR/Threading.h:62:11
#118 0x00007f0cc0c9137f __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > > std::__invoke_impl<void, mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&)::'lambda'()&>(std::__invoke_other, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:61:14
#119 0x00007f0cc0c9137f std::enable_if<is_invocable_r_v<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > > >::type std::__invoke_r<void, mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&)::'lambda'()&>(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:111:2
#120 0x00007f0cc0c9137f std::_Function_handler<void (), mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&)::'lambda'()>::_M_invoke(std::_Any_data const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_function.h:290:9
#121 0x00007f0cc0bf3bd8 std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<std::function<void ()> > >, void>::operator()() const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/future:1440:20
#122 0x00007f0cc0bf3bd8 std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter> std::__invoke_impl<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<std::function<void ()> > >, void>&>(std::__invoke_other, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<std::function<void ()> > >, void>&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:61:14
#123 0x00007f0cc0bf3bd8 std::enable_if<is_invocable_r_v<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<std::function<void ()> > >, void>&>, std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> >::type std::__invoke_r<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<std::function<void ()> > >, void>&>(std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<std::function<void ()> > >, void>&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:114:9
#124 0x00007f0cc0bf3bd8 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<std::function<void ()> > >, void> >::_M_invoke(std::_Any_data const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_function.h:290:9
#125 0x00007f0cc0bf3b37 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/future:591:13
#126 0x00007f0cba699ee8 __pthread_once_slow ./nptl/./nptl/pthread_once.c:118:7
#127 0x00007f0cc0bf3ef1 void std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/mutex:859:15
#128 0x00007f0cc0bf3ef1 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/future:426:2
#129 0x00007f0cc0bf3ef1 std::__future_base::_Deferred_state<std::thread::_Invoker<std::tuple<std::function<void ()> > >, void>::_M_complete_async() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/future:1703:9
#130 0x00007f0cc0bf3f95 std::__atomic_base<unsigned int>::load(std::memory_order) const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/atomic_base.h:488:9
#131 0x00007f0cc0bf3f95 std::__atomic_futex_unsigned<2147483648u>::_M_load(std::memory_order) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/atomic_futex.h:86:22
#132 0x00007f0cc0bf3f95 std::__atomic_futex_unsigned<2147483648u>::_M_load_when_equal(unsigned int, std::memory_order) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/atomic_futex.h:208:22
#133 0x00007f0cc0bf3f95 std::__future_base::_State_baseV2::wait() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/future:351:12
#134 0x00007f0cc0bf3f95 std::__basic_future<void>::wait() const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/future:714:19
#135 0x00007f0cc0bf3f95 std::shared_future<void> llvm::ThreadPoolInterface::asyncImpl<void>(std::function<void ()>, llvm::ThreadPoolTaskGroup*)::'lambda'()::operator()() const /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/Support/ThreadPool.h:114:38
#136 0x00007f0cc0bf3f95 void std::__invoke_impl<void, std::shared_future<void> llvm::ThreadPoolInterface::asyncImpl<void>(std::function<void ()>, llvm::ThreadPoolTaskGroup*)::'lambda'()&>(std::__invoke_other, std::shared_future<void> llvm::ThreadPoolInterface::asyncImpl<void>(std::function<void ()>, llvm::ThreadPoolTaskGroup*)::'lambda'()&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:61:14
#137 0x00007f0cc0bf3f95 std::enable_if<is_invocable_r_v<void, std::shared_future<void> llvm::ThreadPoolInterface::asyncImpl<void>(std::function<void ()>, llvm::ThreadPoolTaskGroup*)::'lambda'()&>, void>::type std::__invoke_r<void, std::shared_future<void> llvm::ThreadPoolInterface::asyncImpl<void>(std::function<void ()>, llvm::ThreadPoolTaskGroup*)::'lambda'()&>(std::shared_future<void> llvm::ThreadPoolInterface::asyncImpl<void>(std::function<void ()>, llvm::ThreadPoolTaskGroup*)::'lambda'()&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:111:2
#138 0x00007f0cc0bf3f95 std::_Function_handler<void (), std::shared_future<void> llvm::ThreadPoolInterface::asyncImpl<void>(std::function<void ()>, llvm::ThreadPoolTaskGroup*)::'lambda'()>::_M_invoke(std::_Any_data const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_function.h:290:9
#139 0x00007f0cc0ab4807 llvm::StdThreadPool::processTasks(llvm::ThreadPoolTaskGroup*) /home/eljoseph/iree/third_party/llvm-project/llvm/lib/Support/ThreadPool.cpp:103:5
#140 0x00007f0cc0ab5ccc std::default_delete<std::tuple<llvm::StdThreadPool::grow(int)::$_0> >::operator()(std::tuple<llvm::StdThreadPool::grow(int)::$_0>*) const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/unique_ptr.h:95:2
#141 0x00007f0cc0ab5ccc std::unique_ptr<std::tuple<llvm::StdThreadPool::grow(int)::$_0>, std::default_delete<std::tuple<llvm::StdThreadPool::grow(int)::$_0> > >::~unique_ptr() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/unique_ptr.h:396:4
#142 0x00007f0cc0ab5ccc void llvm::thread::GenericThreadProxy<std::tuple<llvm::StdThreadPool::grow(int)::$_0> >(void*) /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/Support/thread.h:46:3
#143 0x00007f0cc0ab5ccc void* llvm::thread::ThreadProxy<std::tuple<llvm::StdThreadPool::grow(int)::$_0> >(void*) /home/eljoseph/iree/third_party/llvm-project/llvm/include/llvm/Support/thread.h:55:5
#144 0x00007f0cba694ac3 start_thread ./nptl/./nptl/pthread_create.c:442:8
#145 0x00007f0cba726a40 ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:83:0

Invoked with:
 iree-compile /home/eljoseph/iree-build/compiler/bindings/python/iree/compiler/tools/../_mlir_libs/iree-compile - --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=rocm --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=rocm --iree-rocm-target-chip=gfx940 --iree-rocm-link-bc=true --verify=false --iree-codegen-llvmgpu-use-vector-distribution --iree-global-opt-only-sink-transposes=true --iree-global-opt-propagate-transposes=true --iree-opt-const-eval=false --iree-opt-outer-dim-concat=true --iree-rocm-bc-dir=/opt/rocm/amdgcn/bitcode --iree-vm-target-truncate-unsupported-floats --iree-llvmgpu-enable-prefetch=true --verify=false --iree-codegen-log-swizzle-tile=4 --iree-codegen-winograd-use-forall --iree-preprocessing-pass-pipeline=builtin.module(iree-preprocessing-transpose-convolution-pipeline, iree-preprocessing-pad-to-intrinsics) --iree-codegen-transform-dialect-library=/home/eljoseph/SHARK-Turbine/models/turbine_models/custom_models/sdxl_inference/default_mfma_attn_spec.mlir

Need more information? Set IREE_SAVE_TEMPS=/some/dir in your environment to save all artifacts and reproducers.

Command exited with non-zero status 1
558.60user 19.15system 8:50.08elapsed 108%CPU (0avgtext+0avgdata 1357920maxresident)k
0inputs+148408outputs (114major+936961minor)pagefaults 0swaps

Have you seen this before?

PhaneeshB commented 8 months ago

Update on https://github.com/nod-ai/SHARK-Turbine/issues/519#issuecomment-2005849813

Context: when running only up to the first matmul in the VAE (by slicing the fx graph), compiling the MLIR with IREE crashes (crash stack in the comment above) with: LLVM ERROR: Invalid layout assignment

This crash is only observed when compiling the module with --iree-codegen-llvmgpu-use-vector-distribution; without the flag we are able to compile (and execute).

The op before this (first) matmul in the fx graph is a transpose op, whose output is also an input to the matmul. The transpose output matches the PyTorch fp16 output 100%, with max error 0.0019. The other input to the matmul is a constant loaded from the IR.

The crash happens (as seen in the crash stack) when trying to propagate the vector layout. Working on narrowing it down to the particular operand of the op.


Another issue: when compiling and running without --iree-codegen-llvmgpu-use-vector-distribution, the max error for the matmul output jumps to 0.015.
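For anyone reproducing these numbers, here is a minimal sketch of how a "pct correct" and "max error" pair could be computed. It is not the script used for the figures in this thread. The function name compare_outputs and the rtol/atol values are assumptions, since the tolerance behind the reported percentages is not stated here. The sample values are the first few convolution results quoted earlier in this thread (rocm vs cpu).

```python
import math


def compare_outputs(reference, candidate, rtol=1e-2, atol=1e-3):
    """Return (num_close, total, max_abs_error) for two flat sequences of floats.

    rtol and atol are assumed tolerances, not the ones used in this thread.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    num_close = 0
    max_abs_error = 0.0
    for ref, out in zip(reference, candidate):
        # Track the largest absolute difference seen so far.
        max_abs_error = max(max_abs_error, abs(ref - out))
        # Count elements that agree within the relative or absolute tolerance.
        if math.isclose(ref, out, rel_tol=rtol, abs_tol=atol):
            num_close += 1
    return num_close, len(reference), max_abs_error


# First few conv outputs quoted earlier in the thread.
rocm = [13.8438, 52.0, 24.5312, 65.5625, 5.31641, 306.25, 252.375, 378.0, 241.0]
cpu = [9.875, 52.0625, 24.5781, 65.625, 5.39062, 302.25, 252.375, 378.0, 241.25]

close, total, max_err = compare_outputs(cpu, rocm)
print(f"pct correct: ({close}/{total}) ({100.0 * close / total}%) max error: {max_err}")
```

To use it on the saved .npy artefacts, the arrays would need to be flattened into lists of Python floats first, for example with NumPy's ravel().tolist().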