Closed · powderluv closed this issue 1 month ago
What config and compiler flags should be used for each model? Several appear to be available in https://github.com/nod-ai/SHARK/tree/main/apps/stable_diffusion/src/utils/resources. I currently have it defaulting to "CompVis/stable-diffusion-v1-4" with no additional compiler flags.
I'm also running into this error for VAE:
Failed to import model SD_VAE_MODEL. Exception: Lowering Torch Backend IR -> Linalg-on-Tensors Backend IR failed with the following diagnostics:
error: failed to legalize operation 'torch.aten.convolution' that was explicitly marked illegal
note: see current operation: %173 = "torch.aten.convolution"(%arg0, %157, %156, %171, %172, %171, %2, %172, %17) : (!torch.vtensor<[1,1,4,64,64],f32>, !torch.vtensor<[4,4,1,1],f32>, !torch.vtensor<[4],f32>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int) -> !torch.vtensor<[1,4,4,64,?],f32>
Another question: is the sequence length for ClipTextModel fixed or variable? If variable, what is the typical sequence length we should set for the regression?
We use sequence lengths of 64 and 77 for CLIP.
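For context on what a fixed sequence length implies: prompt token ids are padded (or truncated) to the chosen length before being fed to the text encoder, so the compiled model only ever sees static shapes. A minimal stdlib sketch of that padding step (a hypothetical helper, not SHARK code; the pad id and example token ids are placeholders):

```python
# Illustration of fixed-length CLIP text inputs: pad or truncate a
# tokenized prompt to a fixed sequence length (e.g. 77) so shapes
# stay static for compilation.

def pad_to_fixed_length(token_ids, seq_len=77, pad_id=0):
    """Pad (or truncate) a list of token ids to exactly seq_len."""
    if len(token_ids) >= seq_len:
        return token_ids[:seq_len]
    return token_ids + [pad_id] * (seq_len - len(token_ids))

# A short prompt gets padded out to the full 77 slots.
ids = pad_to_fixed_length([101, 2023, 2003, 102], seq_len=77)
assert len(ids) == 77
```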
@monorimet can you please share the compile flags for vulkan rdna2 / rdna3 and cuda for the SD models?
Removing "awaiting-triage" since this has both priority and assignee.
Here are the commands that have worked for me; they should match what the SHARK SD apps do during compilation.

UNet (rdna2):
iree-compile --iree-input-type=none --iree-hal-target-backends=vulkan --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs -iree-vulkan-target-triple=rdna2-unknown-linux --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-preprocessing-convert-conv2d-to-img2col,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-pad-linalg-ops{pad-size=32}))' unet.mlir -o unet.vmfb
VAE (rdna2):
iree-compile --iree-input-type=none --iree-hal-target-backends=vulkan --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs -iree-vulkan-target-triple=rdna2-unknown-linux --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-preprocessing-convert-conv2d-to-img2col,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-pad-linalg-ops{pad-size=32},iree-linalg-ext-convert-conv2d-to-winograd))' vae.mlir -o vae.vmfb
Clip (rdna2):
iree-compile --iree-input-type=none --iree-hal-target-backends=vulkan --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs -iree-vulkan-target-triple=rdna2-unknown-linux --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-preprocessing-pad-linalg-ops{pad-size=32}))' clip.mlir -o clip.vmfb
UNet (rdna3):
iree-compile --iree-input-type=none --iree-hal-target-backends=vulkan --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs -iree-vulkan-target-triple=rdna3-7900-linux --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-preprocessing-convert-conv2d-to-img2col,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-pad-linalg-ops{pad-size=32}))' unet.mlir -o unet.vmfb
VAE (rdna3):
iree-compile --iree-input-type=none --iree-hal-target-backends=vulkan --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs -iree-vulkan-target-triple=rdna3-7900-linux --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-preprocessing-convert-conv2d-to-img2col,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-pad-linalg-ops{pad-size=32},iree-linalg-ext-convert-conv2d-to-winograd))' vae.mlir -o vae.vmfb
Clip (rdna3):
iree-compile --iree-input-type=none --iree-hal-target-backends=vulkan --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs -iree-vulkan-target-triple=rdna3-7900-linux --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-pad-linalg-ops{pad-size=32}))' clip.mlir -o clip.vmfb
UNet (cuda):
iree-compile --iree-input-type=none --iree-hal-target-backends=cuda --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs unet.mlir -o unet.vmfb
VAE (cuda):
iree-compile --iree-input-type=none --iree-hal-target-backends=cuda --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs vae.mlir -o vae.vmfb
Clip (cuda):
iree-compile --iree-input-type=none --iree-hal-target-backends=cuda --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs clip.mlir -o clip.vmfb
Let me know if these work for you.
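Once compiled, each module can be sanity-checked with iree-benchmark-module. A sketch for the UNet on Vulkan follows; the `--function` name and `--input` shape below are placeholders that depend on how the MLIR was exported, and exact flag spellings have varied across IREE releases, so check your version's help output:

```shell
# Benchmark the compiled UNet on the Vulkan device.
# --function and --input are illustrative placeholders; read the
# exported MLIR's entry point and signature for the real values.
iree-benchmark-module \
  --module=unet.vmfb \
  --device=vulkan \
  --function=forward \
  --input=1x4x64x64xf32
```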
@mariecwhite Can you update this one?
Sorry, I didn't see Ean's comment. CLIP and UNet are in CI and running for CUDA. VAE was working but has now regressed. I also need to add support for running the models with Vulkan/SPIR-V.
Thanks for adding Vulkan benchmarks @antiagainst ! For VAE, there seems to be an issue with the model: https://github.com/openxla/iree/issues/12771
Unassigning myself from issues that I'm not actively working on.
Assigning this to @antiagainst since he picked up this task.
The only missing one is VAE I think, which is pending in #12767. Any chance we move that forward, @mariecwhite?
obsolete
Request description
This is to prevent issues like https://github.com/openxla/iree/issues/12634. It would be good to compile for CUDA, Vulkan rdna2, and Vulkan rdna3 (wmma). Ideally we can run a simple run-module / benchmark-module for those three on an NVIDIA GPU.
What component(s) does this issue relate to?
Other