
# SDXL IRs and Scripts

## SDXL end-to-end benchmarking

  1. Check out and build IREE in release mode, then put its tools on your path: `export PATH=/path/to/iree/build/release/tools:$PATH`
  2. Compile the full SDXL pipeline: `./compile-txt2img.sh gfx942` (`gfx942` is the target architecture for MI300X).
  3. Run the benchmark: `./benchmark-txt2img.sh N /path/to/weights/irpa` (`N` is the GPU index). A worked example of the whole flow is sketched below.
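
Put together, a typical session looks roughly like the sketch below. The IREE checkout location (`~/iree`), build directory, GPU index, and weights path are placeholders; see IREE's build documentation for the full set of CMake options (in particular, enabling the ROCm/HIP target in your build).

```shell
# Configure and build IREE in release mode (placeholder paths; enable the
# ROCm/HIP backend per IREE's build docs for your checkout).
cmake -G Ninja -S ~/iree -B ~/iree/build/release -DCMAKE_BUILD_TYPE=Release
cmake --build ~/iree/build/release

# Make the freshly built tools (iree-compile, iree-benchmark-module, ...) visible.
export PATH=$HOME/iree/build/release/tools:$PATH

# Compile the SDXL submodels for MI300X, then benchmark on GPU 0 with your weights.
./compile-txt2img.sh gfx942
./benchmark-txt2img.sh 0 /path/to/weights/irpa
```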

## Model IRs and weights

> [!CAUTION]
> IRs in the following table might be stale. Use the ones in the `base_ir/` directory instead.

> [!NOTE]
> SDXL-turbo differs from SDXL only in its usage and training/weights. The model architecture (and therefore the weights-stripped MLIR) is the same.

| Variant | Submodel | MLIR (No Weights) (Config A) | safetensors | Splat IRPA | MLIR (No Weights) (Config B) |
| --- | --- | --- | --- | --- | --- |
| SDXL1.0 1024x1024 (f16, BS1, len64) | UNet + attn | Torch, Linalg | - | - | Azure |
| | UNet + PNDMScheduler | Azure | - | - | - |
| | Clip1 | Azure | - | - | - |
| | Clip2 | Azure | - | - | - |
| | VAE decode + attn | Azure | - | - | Azure |
| | VAE encode + attn | [GCloud][sdxl-1-1024x1024-f16-stripped-weight-vae-encode] | Same as decode | - | - |
| SDXL1.0 1024x1024 (f32, BS1, len64) | UNet + attn | Azure | Azure | Azure | Azure |
| | Clip1 | Azure | Azure | Azure | - |
| | Clip2 | Azure | Azure | Azure | - |
| | VAE decode + attn | Azure | Azure | Azure | Azure |
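
As a rough illustration of how a weight-stripped MLIR and its splat IRPA fit together, the sketch below compiles one and benchmarks it with splat parameters standing in for real weights. The file names, function name, and input shape are placeholders, and the target flags (`--iree-hal-target-backends=rocm`, `--iree-rocm-target-chip`) follow older IREE releases; newer releases spell them differently.

```shell
# Compile a weight-stripped UNet MLIR for MI300X (placeholder file names; target
# flag spellings vary between IREE releases).
iree-compile unet_stripped.mlir \
  --iree-hal-target-backends=rocm \
  --iree-rocm-target-chip=gfx942 \
  -o unet.vmfb

# Benchmark with the splat IRPA in place of real weights. The function name and
# input shape are placeholders; inspect the module (iree-dump-module) or the
# benchmark script for the real signature. Older builds use rocm:// devices.
iree-benchmark-module \
  --device=hip://0 \
  --module=unet.vmfb \
  --parameters=model=unet_splat.irpa \
  --function=main \
  --input=1x4x128x128xf16=0
```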
## SDXL compiled pipeline IRPAs (f16)

| Submodel | IRPA |
| --- | --- |
| UNet | `scheduled_unet_f16.irpa` |
| Prompt Encoder (CLIP1 + CLIP2) | `prompt_encoder_f16.irpa` |
| VAE | `vae_decode_f16.irpa` |
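
To sanity-check downloaded archives, the listing below uses IREE's `iree-dump-parameters` tool to print the entries in each IRPA; the file names are the ones from the table above and are assumed to sit in the current directory.

```shell
# List the named tensors (and their sizes) stored in each pipeline IRPA.
iree-dump-parameters --parameters=scheduled_unet_f16.irpa
iree-dump-parameters --parameters=prompt_encoder_f16.irpa
iree-dump-parameters --parameters=vae_decode_f16.irpa
```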