iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[Flow][SDXL] Numerics different with vs. without aggressive fusion on SDXL #18157

Open Max191 opened 1 month ago

Max191 commented 1 month ago

Running SDXL int8 with aggressive fusion enabled produces different results from running with it disabled.

Repro Instructions

  1. Check out https://github.com/iree-org/iree/tree/shared/sdxl_quantized in IREE
  2. Clone https://github.com/nod-ai/sdxl-scripts and cd sdxl-scripts/int8-model
  3. Run `./compile-punet.sh gfx942` and
    iree-run-module   --device=hip://0   --hip_use_streams=true   --hip_allow_inline_execution=true   --device_allocator=caching   --module=tmp/punet.vmfb   --parameters=model=/data/shark/sdxl_unet_int8_dataset.irpa   --function=main   --input=1x4x128x128xf16=1.0   --input=1xsi32=1   --input=2x64x2048xf16=1.0   --input=2x1280xf16=1.0   --input=2x6xf16=1.0   --input=1xf16=1.0 --output=@out_default.npy
  4. Remove `--iree-flow-enable-aggressive-fusion` from `compile-punet-base.sh`
  5. Run `./compile-punet.sh gfx942` and
    iree-run-module   --device=hip://0   --hip_use_streams=true   --hip_allow_inline_execution=true   --device_allocator=caching   --module=tmp/punet.vmfb   --parameters=model=/data/shark/sdxl_unet_int8_dataset.irpa   --function=main   --input=1x4x128x128xf16=1.0   --input=1xsi32=1   --input=2x64x2048xf16=1.0   --input=2x1280xf16=1.0   --input=2x6xf16=1.0   --input=1xf16=1.0 --output=@out_no_aggressive_fusion.npy
  6. Compare the results in out_default.npy vs. out_no_aggressive_fusion.npy:
    
    import numpy as np

    a = np.load("out_default.npy")
    b = np.load("out_no_aggressive_fusion.npy")

    diff = a - b

    print(diff)
    print(np.max(diff))



Max diff between output tensors is `0.2993`
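Note that `np.max(diff)` reports the largest signed difference, so negative divergences can be missed. A symmetric check over the absolute difference looks like the sketch below (small stand-in arrays are used in place of the real `.npy` outputs so the snippet is self-contained):

```python
import numpy as np

# Stand-ins for the two saved outputs; in the actual repro these would be
# np.load("out_default.npy") and np.load("out_no_aggressive_fusion.npy").
a = np.array([0.10, -0.05, 0.30, 0.00], dtype=np.float32)
b = np.array([0.11, -0.05, 0.00, 0.00], dtype=np.float32)

# Largest element-wise absolute divergence between the two runs.
max_abs_diff = np.max(np.abs(a - b))
print(max_abs_diff)

# Tolerance-based comparison; atol is a judgment call for f16 outputs.
matches = np.allclose(a, b, atol=1e-2)
print(matches)
```

`np.allclose` gives a quick pass/fail, while the max absolute difference is the more useful number to report when bisecting which dispatch diverges.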
MaheshRavishankar commented 1 month ago

It most definitely should be a codegen issue.... If we can narrow it down to the diverging dispatch that would help.

Max191 commented 1 month ago

> It most definitely should be a codegen issue.... If we can narrow it down to the diverging dispatch that would help.

I can spend some time to bisect it.

IanWood1 commented 1 month ago

> > It most definitely should be a codegen issue.... If we can narrow it down to the diverging dispatch that would help.
>
> I can spend some time to bisect it.

I can take a look

MaheshRavishankar commented 1 month ago

@Max191 and @IanWood1 thanks for volunteering. Maybe @IanWood1 can look into it for now. Please ask if you need help.

IanWood1 commented 1 month ago

The numerical differences originate from a `linalg.conv_2d_nhwc_hwcf` op followed by a `linalg.generic`.

Min Repro Instructions

  1. Check out and build https://github.com/iree-org/iree/tree/shared/sdxl_quantized in IREE
  2. Clone https://github.com/nod-ai/sdxl-scripts and cd sdxl-scripts/int8-model
  3. Download repro function https://gist.github.com/IanWood1/e720221e3e1cc9384800a6527b0bdb12
  4. Create an .npy file for each input. IMPORTANT: splatted inputs produced no numerical differences, which is why .npy files with non-uniform data are needed.
  5. Compile a .vmfb both with and without aggressive fusion, using the same flags as `compile-punet.sh`, and run each against the inputs
  6. Compare the outputs
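For step 4, one way to generate non-splat inputs is to save random arrays as .npy files. The sketch below takes its shapes and dtypes from the full-model `iree-run-module` invocation earlier in the thread; the reduced repro function may expect different shapes, and the file names here are hypothetical:

```python
import numpy as np

# Shapes/dtypes copied from the full-model iree-run-module flags above;
# adjust to match the reduced repro function's actual signature.
# File names are illustrative, not from the repro scripts.
shapes = {
    "input0.npy": ((1, 4, 128, 128), np.float16),
    "input1.npy": ((1,), np.int32),
    "input2.npy": ((2, 64, 2048), np.float16),
    "input3.npy": ((2, 1280), np.float16),
    "input4.npy": ((2, 6), np.float16),
    "input5.npy": ((1,), np.float16),
}

rng = np.random.default_rng(0)  # fixed seed so reruns see identical inputs
for name, (shape, dtype) in shapes.items():
    if np.issubdtype(dtype, np.integer):
        data = rng.integers(0, 10, size=shape).astype(dtype)
    else:
        data = rng.standard_normal(shape).astype(dtype)
    np.save(name, data)
```

Both compiled modules must then be run against the same saved files so that any output difference is attributable to the compilation flags, not the inputs.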
Max191 commented 1 month ago

Thanks Ian for narrowing it down!

I have reduced the repro a bit further to a difference in vector distribution: with aggressive fusion enabled, running with vs. without vector distribution produces the same numerical errors.

Reproducing

Follow Ian's steps above, but replace the scripts from (5) and (6) with https://gist.github.com/Max191/9bdedc086a12a0b314cbd98be2d15450 and https://gist.github.com/Max191/cc70f92399879d05231c448252f8d62d

This error also reproduces on IREE main without tuning, so there is no longer a need to use https://github.com/iree-org/iree/tree/shared/sdxl_quantized or https://github.com/nod-ai/sdxl-scripts.

Max191 commented 1 month ago

@MaheshRavishankar any ideas on who could investigate this issue? It now seems related to vector distribution, so ideally someone who knows that pipeline should pick this up, though I know those folks are pretty busy right now.