philloooo opened 1 month ago
@philloooo Thanks for the report. I assume this is also caused by the Safety Checker model; I submitted a PR in https://github.com/microsoft/webnn-developer-preview/pull/24
If you see logs showing this error for other SD 1.5 models (text encoder, UNet, or VAE decoder), please let us know.
@philloooo The int32 Safety Checker model has been updated in https://github.com/microsoft/webnn-developer-preview/pull/29, please try it on your side.
Please clear site data (Settings -> Privacy -> Clear browsing data -> Cookies and other site data -> Clear) if the model did not fully download or shows a "failed to load model because protobuf parsing failed" error.
CC @fdwr @honry
Hmm, I am still getting the same error after clearing browsing data.
Thanks @philloooo, so this issue is not caused by the Safety Checker but by the Text Encoder and more models of SD 1.5 and SD Turbo. We found many int64 Gather indices and other issues below:
| Models | WebNN-GPU error on M3 |
|---|---|
| sd-unet-v1.5-demo-layernorm | TypeError: Failed to execute 'input' on 'MLGraphBuilder': Unsupported data type int64 for input operand named 'timestep', must be one of [float32, float16, int32]. |
| sd-v1.5-text-encoder-demo | TypeError: Failed to execute 'constant' on 'MLGraphBuilder': Unsupported data type int64 for constant, must be one of [float32, float16, int32]. |
| sd-turbo-text-encoder-fp16-demo | TypeError: Failed to execute 'argMax' on 'MLGraphBuilder': [/text_model/ArgMax] Unsupported data type int64 for output, must be one of [int32]. |
| sd-turbo-unet-fp16-demo | [pageerror] 3486345904 |
There is no simple int32 conversion solution for existing ONNX int64 models. It's not easy to force developers to use int32 models, especially since there are already plenty of SOTA int64 models on huggingface.co and in other model zoos; this question needs to be discussed in the WG.
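One concrete reason a blanket int64-to-int32 rewrite is not simple: ONNX models routinely use INT64_MAX as an "open ended" sentinel (for example as a Slice end bound), so a naive narrowing cast corrupts it, while a saturating cast happens to preserve its meaning. A minimal pure-Python sketch of the saturating policy (the helper name is illustrative, not from any real converter):

```python
INT64_MAX = 2**63 - 1
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def saturating_narrow(values):
    """int64 -> int32 with clamping: out-of-range values become the
    int32 extremes, so INT64_MAX sentinels stay 'as large as possible'."""
    return [min(max(v, INT32_MIN), INT32_MAX) for v in values]

# ONNX Slice commonly encodes "to the end of the axis" as ends=[INT64_MAX];
# after saturation it becomes INT32_MAX, which still means "to the end"
# for any realistic tensor dimension.
slice_ends = saturating_narrow([INT64_MAX])
```

Saturation only covers sentinels and small values, though; tensors whose values legitimately exceed the int32 range cannot be converted losslessly at all, which is why this needs WG discussion rather than a mechanical fix.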
@fdwr @huningxin
Yeah, I think we need an automatic way to handle such cases in ONNX. Is https://github.com/microsoft/onnxruntime/issues/21401 sufficient to solve all the errors you mentioned above?
@honry @ibelem
> Yeah, I think we need an automatic way to handle such cases in ONNX. Is microsoft/onnxruntime#21401 sufficient to solve all the errors you mentioned above?
>
> @Honry @ibelem
Not enough; opSupportLimits() should also be used in the WebNN EP to help filter out ops that have int64 inputs/outputs and fall those back to the CPU EP.
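The filtering idea above can be sketched in a few lines. This is a hypothetical, simplified model of EP partitioning (the table contents and function names are illustrative, not ORT's or WebNN's real API): consult an opSupportLimits()-style table of allowed data types per operand role, and assign a node to the WebNN EP only if every tensor it touches is supported.

```python
# Hypothetical per-op support table, shaped like the data WebNN's
# opSupportLimits() reports: allowed data types per operand role.
SUPPORT_LIMITS = {
    "gather": {"input": {"float32", "float16", "int32"},
               "indices": {"int32"}},
    "argMax": {"output": {"int32"}},
}

def assign_ep(node_op, tensor_dtypes):
    """Return 'webnn' if every operand's dtype is allowed, else 'cpu'.

    tensor_dtypes maps operand role -> dtype string for one graph node.
    """
    limits = SUPPORT_LIMITS.get(node_op)
    if limits is None:
        return "cpu"  # op not supported by the backend at all
    for role, dtype in tensor_dtypes.items():
        allowed = limits.get(role)
        if allowed is not None and dtype not in allowed:
            return "cpu"  # e.g. int64 Gather indices fall back
    return "webnn"
```

This is exactly the drawback raised in the next comment: a single int64 Gather in the middle of a model forces a WebNN/CPU partition boundary, with the data-transfer cost that implies.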
I think it's very undesirable to just fall back to the CPU EP: it means these ops simply don't work on some devices for WebNN, and switching between EPs can carry a significant performance penalty.
Can't we use opSupportLimits and handle it with casting in ORT, passing int32 to CoreML instead? You mentioned that the CoreML EP doesn't just fall back to the CPU EP but does casting instead, right?
> Can't we use opSupportLimits and handle it with casting in ORT, passing int32 to CoreML instead? You mentioned that the CoreML EP doesn't just fall back to the CPU EP but does casting instead, right?
That may be a good solution. I will try it once opSupportLimits is ready for all backends.
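The casting approach discussed above can be sketched as graph surgery: instead of sending a node to the CPU EP, wrap its int64 edges with casts so the backend only ever sees int32. This is a hypothetical illustration (the node representation and step names are invented for this sketch, not ORT's graph API):

```python
def wrap_with_casts(node):
    """Plan the op sequence the backend would execute for one node.

    node is a dict with 'op', 'inputs', and 'outputs', where inputs and
    outputs are (name, dtype) pairs. int64 inputs get a cast to int32
    before the op runs; int64 outputs get a cast back afterwards, so the
    rest of the (int64-typed) ONNX graph is undisturbed.
    """
    steps = []
    for name, dtype in node["inputs"]:
        if dtype == "int64":
            steps.append(("cast_to_int32", name))
    steps.append(("run", node["op"]))
    for name, dtype in node["outputs"]:
        if dtype == "int64":
            steps.append(("cast_to_int64", name))
    return steps

# e.g. a Gather with int64 indices only needs its indices narrowed:
gather = {"op": "Gather",
          "inputs": [("data", "float32"), ("indices", "int64")],
          "outputs": [("out", "float32")]}
plan = wrap_with_casts(gather)
```

The open question, raised next in the thread, is what the inserted int64-to-int32 cast should do when the value doesn't fit.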
@philloooo, how does CoreML handle the cast from int64 to int32 when data overflow happens?
Two concerns:
@Honry answered here https://github.com/microsoft/onnxruntime/issues/21401#issuecomment-2312135525
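To make the overflow concern concrete, the two plausible narrowing behaviors differ sharply on out-of-range values. A pure-Python sketch (function names are illustrative; this does not claim to be what CoreML actually does):

```python
def truncate_i64_to_i32(v):
    """C-style cast: keep the low 32 bits (two's complement wraparound)."""
    v &= 0xFFFFFFFF
    return v - 2**32 if v >= 2**31 else v

def saturate_i64_to_i32(v):
    """Clamp into the int32 range instead of wrapping."""
    return min(max(v, -2**31), 2**31 - 1)

# In-range values are unaffected either way, but for a sentinel like
# INT64_MAX, truncation yields -1 (a garbage index) while saturation
# yields INT32_MAX (still usable as an "end of axis" bound).
```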
Hi! I am trying to run https://microsoft.github.io/webnn-developer-preview/demos/stable-diffusion-1.5/ with HEAD of Chromium and got