Open monorimet opened 9 months ago
I have some ideas and will look at the details tomorrow. It seems to happen in the attention op on f16 types. We do have some numerical issues in the f16 power approximation, see https://github.com/openxla/iree/issues/15936. We could have similar issues in the f16 exp approximation.
The workaround is running ExpandF16OpToF32Pass on some f16 ops, though I have wanted to remove that pass for a long time. The decomposition happens after the f16->f32 conversion, so there could be problems in the polynomial approximation. I'll take a look at this and see if it is the same issue as https://github.com/openxla/iree/issues/16544
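For illustration only, here is a minimal NumPy sketch of how an f16 `exp` can produce NaNs in a softmax-like computation, and why widening to f32 avoids it. It is not IREE's polynomial approximation and does not claim to be the root cause of this issue; `naive_softmax` and the input scores are made up for the example.

```python
import numpy as np

def naive_softmax(x, dtype):
    """Softmax without max-subtraction, computed entirely in `dtype`."""
    x = x.astype(dtype)
    e = np.exp(x)          # result is stored in `dtype`
    return e / e.sum()     # inf / inf yields NaN

# Hypothetical attention scores. exp(12) is about 1.6e5, which is above
# the largest finite f16 value (65504), so it overflows to inf in f16.
scores = np.array([12.0, 11.0, 10.0])

with np.errstate(over="ignore", invalid="ignore"):
    p16 = naive_softmax(scores, np.float16)  # computed in half precision
    p32 = naive_softmax(scores, np.float32)  # same math after widening to f32

print("f16:", p16)
print("f32:", p32)
```

In f16 the overflowed term becomes inf, the sum becomes inf, and the division yields NaN for that entry and zeros for the finite ones. The f32 version stays finite for the same inputs.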
https://github.com/openxla/iree/pull/16577 should address the issue.
All the patches have landed in IREE. @monorimet could you verify whether this is fixed when you're available?
What happened?
The SDXL 1.0 1024x1024 UNet e2e inference outputs NaN values on the CPU backend. This is with iree_linalg_ext.attention tiled and decomposed.
In my dispatch breakdown, I found that:
hal compiled_unet_main_dispatch_131.mlir.txt = flow dispatch 184 -> starts outputting zeroes
hal compiled_unet_main_dispatch_113.mlir.txt = flow dispatch 186 -> outputs all NaNs
With this UNet .mlir file: stable_diffusion_xl_base_1_0_64_1024x1024_fp16_unet.mlir
And this weights file: SDXL1_unet.safetensors
The numerics issue should be reproducible by the following instructions.
Steps to reproduce your issue
1. Download files.
2. Compile for NaN output reproducing:
3. Run:
4. Check unet_out.txt and see NaN values
5. Compile for zeroes output reproducing:
6. Repeat steps 3 and 4 to see mostly zero output. This could be a red herring, but it seems to be the first time any outputs are mostly zero; the next flow dispatch is all zeroes, and the dispatch after that gives the NaNs (which we repro'd first).
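When comparing dispatch outputs like this, a small helper that counts NaN, inf, and zero entries makes the "mostly zero" vs. "all NaN" distinction easier to see. This is a rough sketch: `summarize` is a hypothetical helper, and how the values are read out of unet_out.txt depends on the output format, which is not covered here.

```python
import numpy as np

def summarize(values, zero_tol=0.0):
    """Count NaN, inf, and (near-)zero entries in a flat list of values."""
    a = np.asarray(values, dtype=np.float32).ravel()
    return {
        "total": int(a.size),
        "nan": int(np.isnan(a).sum()),
        "inf": int(np.isinf(a).sum()),
        # NaN compares False here, so NaNs are never counted as zeros.
        "zero": int((np.abs(a) <= zero_tol).sum()),
    }

# Made-up values standing in for one dispatch output.
example = summarize([0.0, 0.0, float("nan"), 1.5])
print(example)
```

Running this on the output of each suspect dispatch in order shows where zeros first dominate and where NaNs first appear.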
What component(s) does this issue relate to?
No response
Version information
Latest IREE (https://github.com/openxla/iree/commit/5d8907e82fc1eb741a4d4d27f5cae865323fd1d7)
(Notably enabled by https://github.com/openxla/iree/commit/946375cad71786462bcfd63dde6fe305d1e3b9ff)
Additional context
No response