nod-ai / SHARK-Studio

SHARK Studio -- Web UI for SHARK+IREE High Performance Machine Learning Distribution
Apache License 2.0

AMD ROCm on Windows does not work - hipErrorSharedObjectInitFailed #2119

Open vasicvuk opened 7 months ago

vasicvuk commented 7 months ago

I installed the latest version of the AMD drivers. My graphics card is a 7900 XTX.

No vmfb found. Compiling and saving to D:\nodeai shark\euler_scale_model_input_1_512_512_rocm_fp16.vmfb
Configuring for device:rocm://0
could not execute `iree-run-module --dump_devices=rocm`
Did not find ROCm architecture from `--iree-rocm-target-chip` flag
 or from `iree-run-module --dump_devices=rocm` command.
Using gfx1100 as ROCm arch for compilation.
Saved vmfb in D:\nodeai shark\euler_scale_model_input_1_512_512_rocm_fp16.vmfb.
Loading module D:\nodeai shark\euler_scale_model_input_1_512_512_rocm_fp16.vmfb...
Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Local\Temp\_MEI61762\gradio\queueing.py", line 489, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxx\AppData\Local\Temp\_MEI61762\gradio\route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxx\AppData\Local\Temp\_MEI61762\gradio\blocks.py", line 1561, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxx\AppData\Local\Temp\_MEI61762\gradio\blocks.py", line 1191, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxx\AppData\Local\Temp\_MEI61762\gradio\utils.py", line 519, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxx\AppData\Local\Temp\_MEI61762\gradio\utils.py", line 512, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "anyio\to_thread.py", line 56, in run_sync
  File "anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 851, in run
  File "C:\Users\xxx\AppData\Local\Temp\_MEI61762\gradio\utils.py", line 495, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "C:\Users\xxx\AppData\Local\Temp\_MEI61762\gradio\utils.py", line 666, in gen_wrapper
    yield from f(*args, **kwargs)
  File "ui\txt2img_ui.py", line 194, in txt2img_inf
  File "apps\stable_diffusion\src\schedulers\sd_schedulers.py", line 141, in get_schedulers
  File "apps\stable_diffusion\src\schedulers\shark_eulerdiscrete.py", line 147, in compile
  File "apps\stable_diffusion\src\schedulers\shark_eulerdiscrete.py", line 123, in _import
  File "apps\stable_diffusion\src\utils\utils.py", line 187, in compile_through_fx
  File "apps\stable_diffusion\src\utils\utils.py", line 84, in _compile_module
  File "shark\shark_inference.py", line 232, in load_module
    params = load_flatbuffer(
             ^^^^^^^^^^^^^^^^
  File "shark\iree_utils\compile_utils.py", line 517, in load_flatbuffer
    vmfb, config, temp_file_to_unlink = load_vmfb_using_mmap(
                                        ^^^^^^^^^^^^^^^^^^^^^
  File "shark\iree_utils\compile_utils.py", line 448, in load_vmfb_using_mmap
    ctx.add_vm_module(mmaped_vmfb)
  File "iree\runtime\system_api.py", line 271, in add_vm_module
  File "iree\runtime\system_api.py", line 268, in add_vm_modules
RuntimeError: Error registering modules: C:\actions-runner\w\SRT\SRT\c\experimental\rocm\status_util.c:31: INTERNAL; rocm driver error 'hipErrorSharedObjectInitFailed' (303): shared object initialization failed; while invoking native function hal.executable.create; while calling import;
[ 1]   native hal.executable.create:0 -
[ 0] bytecode module@1:284 -
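From the log above, SHARK could not query the device (`iree-run-module --dump_devices=rocm` failed) and fell back to `gfx1100` as the ROCm arch. A minimal sketch of that fallback behavior, with hypothetical helper names of my own (not SHARK's actual code):

```python
import re

def pick_rocm_arch(dump_output, default="gfx1100"):
    """Pick a ROCm target chip from the output of
    `iree-run-module --dump_devices=rocm`, falling back to a default
    when detection fails.

    `dump_output` is the captured stdout of that command, or None when
    the command could not be executed (as in the log above).
    """
    if dump_output:
        # Look for an architecture token such as "gfx1100" in the dump.
        match = re.search(r"gfx[0-9a-f]+", dump_output)
        if match:
            return match.group(0)
    return default

print(pick_rocm_arch(None))                 # gfx1100 (fallback)
print(pick_rocm_arch("device: gfx1030"))    # gfx1030 (detected)
```

For a 7900 XTX the `gfx1100` fallback happens to be correct, so the guessed chip itself is unlikely to be the cause of this failure.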
rohit-mp commented 3 months ago

I'm seeing a similar issue, but it seems to be trying to access something from the D: drive when my system doesn't even have one.


local-sync devices are available.
local-task devices are available.
vulkan devices are available.
metal devices are not available.
cuda devices are not available.
hip devices are available.
Clearing .mlir temporary files from a prior run. This may take some time...
Clearing .mlir temporary files took 0.0000 seconds.
gradio temporary image cache located at C:\Users\rohit\Downloads\sd\shark_tmp\gradio. You may change this by setting the GRADIO_TEMP_DIR environment variable.
No temporary images files to clear.
diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
Running on local URL:  http://0.0.0.0:8080

To create a public link, set `share=True` in `launch()`.

[LOG] Submitting Request...

[LOG] Initializing new pipeline...

[LOG] Pipeline initialized with pipe_id: stabilityai_stable-diffusion-2-1-base_1_64_512x512_fp16_gfx1100.

[LOG] Preparing pipeline...

Missing files: clip.vmfb, unet.vmfb, vae_decode.vmfb, clip.safetensors, unet.safetensors, vae_decode.safetensors

huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
Saved params to C:\Users\rohit\Downloads\sd\models\stabilityai_stable-diffusion-2-1-base_fp16\clip.safetensors

Compiling to rocm with flags: ['--iree-hal-target-backends=rocm', '--iree-rocm-target-chip=gfx1100', '--iree-opt-const-eval=false', '--iree-vm-bytecode-module-output-format=flatbuffer-binary', '--iree-global-opt-propagate-transposes=true', '--iree-opt-outer-dim-concat=true', '--iree-vm-target-truncate-unsupported-floats', '--iree-llvmgpu-enable-prefetch=true', '--iree-opt-data-tiling=false', '--iree-opt-aggressively-propagate-transposes=true', '--iree-flow-enable-aggressive-fusion', '--iree-global-opt-enable-fuse-horizontal-contractions=true', '--iree-codegen-gpu-native-math-precision=true', '--iree-codegen-llvmgpu-use-vector-distribution=true', '--iree-preprocessing-pass-pipeline=builtin.module(iree-preprocessing-transpose-convolution-pipeline, iree-global-opt-raise-special-ops, util.func(iree-preprocessing-pad-to-intrinsics))', '--iree-codegen-transform-dialect-library=C:\\Users\\rohit\\Downloads\\sd\\models\\stabilityai_stable-diffusion-2-1-base_1_64_512x512_fp16_gfx1100\\attention_and_matmul_spec_wmma.mlir']

Saved to C:\Users\rohit\Downloads\sd\models\stabilityai_stable-diffusion-2-1-base_1_64_512x512_fp16_gfx1100\clip.mlir

Saved to C:\Users\rohit\Downloads\sd\models\stabilityai_stable-diffusion-2-1-base_1_64_512x512_fp16_gfx1100\clip.vmfb

huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
Saved params to C:\Users\rohit\Downloads\sd\models\stabilityai_stable-diffusion-2-1-base_fp16\unet.safetensors

Compiling to rocm with flags: ['--iree-hal-target-backends=rocm', '--iree-rocm-target-chip=gfx1100', '--iree-opt-const-eval=false', '--iree-vm-bytecode-module-output-format=flatbuffer-binary', '--iree-global-opt-propagate-transposes=true', '--iree-opt-outer-dim-concat=true', '--iree-vm-target-truncate-unsupported-floats', '--iree-llvmgpu-enable-prefetch=true', '--iree-opt-data-tiling=false', '--iree-opt-aggressively-propagate-transposes=true', '--iree-flow-enable-aggressive-fusion', '--iree-global-opt-enable-fuse-horizontal-contractions=true', '--iree-codegen-gpu-native-math-precision=true', '--iree-codegen-llvmgpu-use-vector-distribution=true', '--iree-preprocessing-pass-pipeline=builtin.module(iree-preprocessing-transpose-convolution-pipeline, iree-global-opt-raise-special-ops, util.func(iree-preprocessing-pad-to-intrinsics))', '--iree-codegen-transform-dialect-library=C:\\Users\\rohit\\Downloads\\sd\\models\\stabilityai_stable-diffusion-2-1-base_1_64_512x512_fp16_gfx1100\\attention_and_matmul_spec_wmma.mlir']

Saved to C:\Users\rohit\Downloads\sd\models\stabilityai_stable-diffusion-2-1-base_1_64_512x512_fp16_gfx1100\unet.mlir

Saved to C:\Users\rohit\Downloads\sd\models\stabilityai_stable-diffusion-2-1-base_1_64_512x512_fp16_gfx1100\unet.vmfb

huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
Saved params to C:\Users\rohit\Downloads\sd\models\stabilityai_stable-diffusion-2-1-base_fp16\vae_decode.safetensors

Compiling to rocm with flags: ['--iree-hal-target-backends=rocm', '--iree-rocm-target-chip=gfx1100', '--iree-opt-const-eval=false', '--iree-vm-bytecode-module-output-format=flatbuffer-binary', '--iree-global-opt-propagate-transposes=true', '--iree-opt-outer-dim-concat=true', '--iree-vm-target-truncate-unsupported-floats', '--iree-llvmgpu-enable-prefetch=true', '--iree-opt-data-tiling=false', '--iree-opt-aggressively-propagate-transposes=true', '--iree-flow-enable-aggressive-fusion', '--iree-global-opt-enable-fuse-horizontal-contractions=true', '--iree-codegen-gpu-native-math-precision=true', '--iree-codegen-llvmgpu-use-vector-distribution=true', '--iree-preprocessing-pass-pipeline=builtin.module(iree-preprocessing-transpose-convolution-pipeline, iree-global-opt-raise-special-ops, util.func(iree-preprocessing-pad-to-intrinsics))', '--iree-codegen-transform-dialect-library=C:\\Users\\rohit\\Downloads\\sd\\models\\stabilityai_stable-diffusion-2-1-base_1_64_512x512_fp16_gfx1100\\attention_and_matmul_spec_wmma.mlir']

Saved to C:\Users\rohit\Downloads\sd\models\stabilityai_stable-diffusion-2-1-base_1_64_512x512_fp16_gfx1100\vae_decode.mlir

Saved to C:\Users\rohit\Downloads\sd\models\stabilityai_stable-diffusion-2-1-base_1_64_512x512_fp16_gfx1100\vae_decode.vmfb

All necessary files found.

[LOG] Loading pipeline to device rocm.

huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
Traceback (most recent call last):
  File "C:\Users\rohit\AppData\Local\Temp\_MEI66202\gradio\queueing.py", line 527, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rohit\AppData\Local\Temp\_MEI66202\gradio\route_utils.py", line 270, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rohit\AppData\Local\Temp\_MEI66202\gradio\blocks.py", line 1847, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rohit\AppData\Local\Temp\_MEI66202\gradio\blocks.py", line 1445, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rohit\AppData\Local\Temp\_MEI66202\gradio\utils.py", line 629, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rohit\AppData\Local\Temp\_MEI66202\gradio\utils.py", line 622, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "anyio\to_thread.py", line 56, in run_sync
  File "anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
  File "anyio\_backends\_asyncio.py", line 859, in run
  File "C:\Users\rohit\AppData\Local\Temp\_MEI66202\gradio\utils.py", line 605, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "C:\Users\rohit\AppData\Local\Temp\_MEI66202\gradio\utils.py", line 788, in gen_wrapper
    response = next(iterator)
               ^^^^^^^^^^^^^^
  File "apps\shark_studio\api\sd.py", line 315, in shark_sd_fn_dict_input
  File "apps\shark_studio\api\sd.py", line 438, in shark_sd_fn
  File "apps\shark_studio\api\sd.py", line 244, in prepare_pipe
  File "turbine_models\custom_models\sd_inference\sd_pipeline.py", line 372, in load_pipeline
  File "turbine_models\model_runner.py", line 66, in __init__
  File "iree\runtime\system_api.py", line 191, in __init__
    self._vm_context = _binding.VmContext(
                       ^^^^^^^^^^^^^^^^^^^
RuntimeError: Error creating vm context with modules: D:\a\SRT\SRT\c\experimental\rocm\status_util.c:31: INTERNAL; rocm driver error 'hipErrorSharedObjectInitFailed' (303): shared object initialization failed; mismatched target chip? missing/wrong bitcode directory?; while invoking native function hal.executable.create; while calling import;
[ 1]   native hal.executable.create:0 -
[ 0] bytecode compiled_clip.__init:3500 [
    <stdin>:408:10
      at <stdin>:376:12,
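The error message asks "mismatched target chip?". One quick sanity check is to read the chip back out of the compile-flag list logged above; a minimal sketch (the helper name is my own, not part of SHARK or IREE):

```python
def target_chip(flags, prefix="--iree-rocm-target-chip="):
    """Return the ROCm target chip recorded in a list of IREE compile
    flags, or None if the flag is absent."""
    for flag in flags:
        if flag.startswith(prefix):
            return flag[len(prefix):]
    return None

# A trimmed version of the flag list from the log above:
flags = [
    "--iree-hal-target-backends=rocm",
    "--iree-rocm-target-chip=gfx1100",
    "--iree-opt-const-eval=false",
]
print(target_chip(flags))  # gfx1100
```

If this value does not match the architecture of the GPU actually present (for example, as reported by `iree-run-module --dump_devices=rocm`), the resulting vmfb can fail to load with exactly this kind of `hipErrorSharedObjectInitFailed` error.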