Open monorimet opened 5 months ago
I tried a few different configurations, and found a potentially useful runtime error when using inlined weights with SDXL:
Assertion failed: !!(iree_hal_resource_is(base_value, &iree_hal_rocm_buffer_vtable)), file C:\V\iree\experimental\rocm\rocm_buffer.c, line 25
That is what I get when running with HIP hal driver; using the ROCM driver works and gives same numerics as with externalized weights.
Could this be related to how we are using --iree-stream-resource-memory-model=unified by default? I am trying with this flag set to discrete now.
maybe you have it the other way around? the error you have says C:\V\iree\experimental\rocm\rocm_buffer.c, which is ROCM, not HIP
that kind of error will happen if the driver is casting a buffer pointer instead of the iree_hal_allocated_buffer() result
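To make the comment above concrete, here is a toy Python model of the failure pattern. It is only a sketch: the class and function names (`DriverBuffer`, `SubspanBuffer`, `driver_cast`, and so on) are hypothetical and are not IREE's API. It assumes that a buffer handed to the driver may be a view wrapping the real allocation, so casting the view directly trips the type check, while resolving the backing allocation first does not.

```python
class Buffer:
    """Base type: every buffer can report the allocation that backs it."""

    def allocated_buffer(self):
        return self

    def byte_offset(self):
        return 0


class DriverBuffer(Buffer):
    """Hypothetical stand-in for a buffer the driver itself allocated."""

    def __init__(self, device_ptr, size):
        self.device_ptr = device_ptr
        self.size = size


class SubspanBuffer(Buffer):
    """A view into another buffer; it is not a driver buffer."""

    def __init__(self, parent, offset, size):
        self.parent = parent
        self.offset = offset
        self.size = size

    def allocated_buffer(self):
        # Walk down to the allocation that actually owns the memory.
        return self.parent.allocated_buffer()

    def byte_offset(self):
        # Offsets accumulate through nested views.
        return self.parent.byte_offset() + self.offset


def driver_cast(buffer):
    # Plays the role of the failing assertion: the driver accepts only
    # buffers of its own concrete type.
    if type(buffer) is not DriverBuffer:
        raise TypeError("buffer is not a driver buffer")
    return buffer


def device_pointer_wrong(buffer):
    # Bug pattern: cast whatever was passed in, even if it is a view.
    return driver_cast(buffer).device_ptr


def device_pointer_right(buffer):
    # Resolve the backing allocation first, then apply the view's offset.
    base = driver_cast(buffer.allocated_buffer())
    return base.device_ptr + buffer.byte_offset()
```

With `base = DriverBuffer(0x1000, 256)` and `view = SubspanBuffer(base, 64, 32)`, `device_pointer_right(view)` resolves to the base allocation plus the offset, while `device_pointer_wrong(view)` raises, which is the analogue of the assertion in the report.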
I was a bit confused by this as well, but this was for sure run with HIP driver. Will validate with cli
if you're in a release LTO build it's possible the two functions are identical and got folded, but usually asserts and stuff prevent that - either way, good to test with a breakpoint or printf
OK, so if I switch from my local build, configured with:
cmake -GNinja -B ../iree-build --log-level=VERBOSE -DIREE_BUILD_PYTHON_BINDINGS=ON -DIREE_BUILD_COMPILER=ON -DPython3_EXECUTABLE=C:\\V\SHARK-Turbine\turb.env\Scripts\python.exe -DCMAKE_BUILD_TYPE=Release -DIREE_HAL_DRIVER_VULKAN=ON -DIREE_HAL_DRIVER_CUDA=OFF -DIREE_EXTERNAL_HAL_DRIVERS="rocm" -DIREE_ENABLE_CPUINFO=ON -DIREE_HAL_DRIVER_ROCM=ON -DIREE_ENABLE_LLD=ON -DIREE_ENABLE_RUNTIME_TRACING=OFF -DIREE_ENABLE_ASSERTIONS=ON -DIREE_ENABLE_SPLIT_DWARF=ON
to a recent pip install of iree-runtime, instead of giving an assertion on hip hal driver, it just starts completely freezing my system for minutes at a time. Will try with resnet again to see if it completes. This seems to happen with --iree-stream-resource-memory-model=unified and --iree-stream-resource-memory-model=discrete but I've only tried this with externalized weights. Will try with inlined.
Are the pip releases built with assertions disabled? It could explain this, if the driver is still casting the wrong pointer.
Does the workaround from here https://github.com/iree-org/iree/issues/17033 work to solve this issue as well?
What happened?
The same .vmfb gives different results on ROCM and HIP hal drivers. Caching allocator is being used on both, but this doesn't seem to make a difference if disabled.
Good numerics: ROCM with inlined weights gives correct output.
Bad numerics #1: ROCM with external weights gives all-zeroes output.
Bad numerics #2: HIP with inlined weights gives wrong numbers.
Bad numerics #3: HIP with external weights gives wrong numbers.
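The four cases above can be told apart mechanically. The helper below is a minimal sketch, not part of the turbine-models scripts: it assumes each output has been flattened to a plain sequence of floats (for a NumPy array, `arr.ravel().tolist()`), uses the ROCM inlined-weights output as the reference, and the tolerances are placeholder values chosen for illustration, not thresholds from this report.

```python
import math


def classify_output(candidate, reference, rel_tol=1e-2, abs_tol=1e-3):
    """Label a flat sequence of floats against a trusted reference output."""
    if len(candidate) != len(reference):
        return "shape mismatch"
    if any(math.isnan(x) for x in candidate):
        return "contains NaN"
    # An all-zero output is flagged separately from merely wrong numbers,
    # unless the reference is itself all zero.
    if all(x == 0.0 for x in candidate) and any(r != 0.0 for r in reference):
        return "all zeroes"
    pairs = list(zip(candidate, reference))
    if all(math.isclose(c, r, rel_tol=rel_tol, abs_tol=abs_tol) for c, r in pairs):
        return "match"
    worst = max(abs(c - r) for c, r in pairs)
    return f"mismatch (max abs diff {worst:.4g})"
```

Separating "all zeroes" from a general mismatch keeps the ROCM external-weights failure distinguishable from the HIP failures, which produce nonzero but wrong values.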
I am filing this issue specifically for this target and IR because other targets and models do not reproduce the same success/failure cases. (see https://github.com/iree-org/iree/issues/17033)
The only reason I am including ROCM HAL results is because they contain the only success mode. We should focus on fixing HIP hal issues.
Full log output using turbine-models scripts -- I will provide iree CLI reproducers as well, but these are using fixed random inputs:
Steps to reproduce your issue
Artifacts:
MLIR (FP16): https://sharkpublic.blob.core.windows.net/sharkpublic/ean/hip_numerics/gfx1103_unet/numerics_debug_hip/stable_diffusion_xl_base_1_0_bs1_64_1024x1024_fp16_unet_rocm.mlir
MLIR (FP32): https://sharkpublic.blob.core.windows.net/sharkpublic/ean/hip_numerics/gfx1103_unet/numerics_debug_hip/stable_diffusion_xl_base_1_0_bs1_64_1024x1024_fp32_unet_cpu.mlir
WMMA spec: https://sharkpublic.blob.core.windows.net/sharkpublic/ean/hip_numerics/gfx1103_unet/numerics_debug_hip/attention_and_matmul_spec_wmma.mlir
MLIR (inlined, FP16): https://sharkpublic.blob.core.windows.net/sharkpublic/ean/hip_numerics/gfx1103_unet/numerics_debug_hip/stable_diffusion_xl_base_1_0_bs1_64_1024x1024_fp16_unet_inline.mlir
Inputs:
https://sharkpublic.blob.core.windows.net/sharkpublic/ean/hip_numerics/gfx1103_unet/numerics_debug_hip/input1.npy
https://sharkpublic.blob.core.windows.net/sharkpublic/ean/hip_numerics/gfx1103_unet/numerics_debug_hip/input2.npy
https://sharkpublic.blob.core.windows.net/sharkpublic/ean/hip_numerics/gfx1103_unet/numerics_debug_hip/input3.npy
https://sharkpublic.blob.core.windows.net/sharkpublic/ean/hip_numerics/gfx1103_unet/numerics_debug_hip/input4.npy
https://sharkpublic.blob.core.windows.net/sharkpublic/ean/hip_numerics/gfx1103_unet/numerics_debug_hip/input5.npy
https://sharkpublic.blob.core.windows.net/sharkpublic/ean/hip_numerics/gfx1103_unet/numerics_debug_hip/input6.npy
Weights: https://sharkpublic.blob.core.windows.net/sharkpublic/SDXL/SDXL_weights_fp16/scheduled_unet.irpa
Compile:
Run:
What component(s) does this issue relate to?
No response
Version information
The IREE branch used is shared/tresleches-united, but these issues have historically reproduced on the main branch as well, though not all of the compile options here may carry over.
See https://github.com/iree-org/iree/commit/c66ae1957bdb2d8dd20ef3d32e4a3ab715e87869 for the exact commit.
Additional context
No response