iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[Flow][SDXL] Numerics different with vs. without aggressive fusion on SDXL #18157

Open Max191 opened 1 month ago

Max191 commented 1 month ago

Running SDXL int8 with aggressive fusion enabled produces different results from running with it disabled.

Repro Instructions

  1. Check out https://github.com/iree-org/iree/tree/shared/sdxl_quantized in IREE
  2. Clone https://github.com/nod-ai/sdxl-scripts and cd sdxl-scripts/int8-model
  3. Run `./compile-punet.sh gfx942` and
    iree-run-module   --device=hip://0   --hip_use_streams=true   --hip_allow_inline_execution=true   --device_allocator=caching   --module=tmp/punet.vmfb   --parameters=model=/data/shark/sdxl_unet_int8_dataset.irpa   --function=main   --input=1x4x128x128xf16=1.0   --input=1xsi32=1   --input=2x64x2048xf16=1.0   --input=2x1280xf16=1.0   --input=2x6xf16=1.0   --input=1xf16=1.0 --output=@out_default.npy
  4. Remove `--iree-flow-enable-aggressive-fusion` from `compile-punet-base.sh`
  5. Run `./compile-punet.sh gfx942` and
    iree-run-module   --device=hip://0   --hip_use_streams=true   --hip_allow_inline_execution=true   --device_allocator=caching   --module=tmp/punet.vmfb   --parameters=model=/data/shark/sdxl_unet_int8_dataset.irpa   --function=main   --input=1x4x128x128xf16=1.0   --input=1xsi32=1   --input=2x64x2048xf16=1.0   --input=2x1280xf16=1.0   --input=2x6xf16=1.0   --input=1xf16=1.0 --output=@out_no_aggressive_fusion.npy
  6. Compare the results in out_default.npy vs. out_no_aggressive_fusion.npy:
    
    import numpy as np

    a = np.load("out_default.npy")
    b = np.load("out_no_aggressive_fusion.npy")

    diff = a - b

    print(diff)
    print(np.max(diff))



Max diff between output tensors is `0.2993`
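Note that `np.max(diff)` reports the largest signed difference, so negative divergences can be missed. A symmetric check over the absolute difference looks like the sketch below (small stand-in arrays are used in place of the real `.npy` outputs so the snippet is self-contained):

```python
import numpy as np

# Stand-ins for the two saved outputs; in the actual repro these would be
# np.load("out_default.npy") and np.load("out_no_aggressive_fusion.npy").
a = np.array([0.10, -0.05, 0.30, 0.00], dtype=np.float32)
b = np.array([0.11, -0.05, 0.00, 0.00], dtype=np.float32)

# Largest element-wise absolute divergence between the two runs.
max_abs_diff = np.max(np.abs(a - b))
print(max_abs_diff)

# Tolerance-based comparison; atol is a judgment call for f16 outputs.
matches = np.allclose(a, b, atol=1e-2)
print(matches)
```

`np.allclose` gives a quick pass/fail, while the max absolute difference is the more useful number to report when bisecting which dispatch diverges.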
MaheshRavishankar commented 1 month ago

It most definitely should be a codegen issue.... If we can narrow it down to the diverging dispatch that would help.

Max191 commented 1 month ago

> It most definitely should be a codegen issue.... If we can narrow it down to the diverging dispatch that would help.

I can spend some time to bisect it.

IanWood1 commented 1 month ago

> > It most definitely should be a codegen issue.... If we can narrow it down to the diverging dispatch that would help.
>
> I can spend some time to bisect it.

I can take a look

MaheshRavishankar commented 1 month ago

@Max191 and @IanWood1 thanks for volunteering. Maybe @IanWood1 can look into it for now. Please ask if you need help.

IanWood1 commented 1 month ago

The numerical differences originate from a `linalg.conv_2d_nhwc_hwcf` op followed by a `linalg.generic`.

Min Repro Instructions

  1. Check out and build https://github.com/iree-org/iree/tree/shared/sdxl_quantized in IREE
  2. Clone https://github.com/nod-ai/sdxl-scripts and cd sdxl-scripts/int8-model
  3. Download repro function https://gist.github.com/IanWood1/e720221e3e1cc9384800a6527b0bdb12
  4. Create an .npy file for each input. IMPORTANT: splatted inputs produced no numerical differences, which is why .npy files with non-uniform data are needed.
  5. Compile a .vmfb both with and without aggressive fusion, using the same flags as `compile-punet.sh`, and run each against the inputs
  6. Compare the outputs
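For step 4, one way to generate non-splat inputs is to save random arrays as .npy files. The sketch below takes its shapes and dtypes from the full-model `iree-run-module` invocation earlier in the thread; the reduced repro function may expect different shapes, and the file names here are hypothetical:

```python
import numpy as np

# Shapes/dtypes copied from the full-model iree-run-module flags above;
# adjust to match the reduced repro function's actual signature.
# File names are illustrative, not from the repro scripts.
shapes = {
    "input0.npy": ((1, 4, 128, 128), np.float16),
    "input1.npy": ((1,), np.int32),
    "input2.npy": ((2, 64, 2048), np.float16),
    "input3.npy": ((2, 1280), np.float16),
    "input4.npy": ((2, 6), np.float16),
    "input5.npy": ((1,), np.float16),
}

rng = np.random.default_rng(0)  # fixed seed so reruns see identical inputs
for name, (shape, dtype) in shapes.items():
    if np.issubdtype(dtype, np.integer):
        data = rng.integers(0, 10, size=shape).astype(dtype)
    else:
        data = rng.standard_normal(shape).astype(dtype)
    np.save(name, data)
```

Both compiled modules must then be run against the same saved files so that any output difference is attributable to the compilation flags, not the inputs.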
Max191 commented 1 month ago

Thanks Ian for narrowing it down!

I have reduced the repro a bit further to a difference in vector distribution: with aggressive fusion enabled, running with vs. without vector distribution produces the same numerical errors.

Reproducing

Follow Ian's steps above, but replace the scripts from (5) and (6) with https://gist.github.com/Max191/9bdedc086a12a0b314cbd98be2d15450 and https://gist.github.com/Max191/cc70f92399879d05231c448252f8d62d

This error also reproduces on IREE main without tuning, so there is no longer a need to use https://github.com/iree-org/iree/tree/shared/sdxl_quantized or https://github.com/nod-ai/sdxl-scripts.

Max191 commented 1 month ago

@MaheshRavishankar any ideas on who could investigate this issue? It now seems related to vector distribution, so ideally someone who knows that pipeline should pick this up, though I know those folks are pretty busy right now.