microsoft / Olive

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
https://microsoft.github.io/Olive/
MIT License

The float16 unet model of stable-diffusion-2-1 outputs NAN results #1223

Open xhcao opened 1 month ago

xhcao commented 1 month ago

Describe the bug

Running the command "python stable_diffusion.py --provider cuda --optimize --model_id stabilityai/stable-diffusion-2-1" in the Olive/examples/stable_diffusion/ directory generates two sets of models. The float32 models are written to Olive/examples/stable_diffusion/models/unoptimized and run correctly. The float16 models are written to Olive/examples/stable_diffusion/models/optimized-cuda and do not run correctly: the unet model outputs NaN results.

To Reproduce

Example Python code:

    import onnxruntime as ort
    from diffusers import OnnxStableDiffusionPipeline, DDIMScheduler

    sess_options = ort.SessionOptions()
    sess_options.enable_mem_pattern = False

    batch_size = 1
    image_size = 768
    provider = "cuda"

    # Classifier-free guidance doubles the batch that the unet sees.
    hidden_batch_size = batch_size * 2
    sess_options.add_free_dimension_override_by_name("unet_sample_batch", hidden_batch_size)
    sess_options.add_free_dimension_override_by_name("unet_sample_channels", 4)
    sess_options.add_free_dimension_override_by_name("unet_sample_height", image_size // 8)
    sess_options.add_free_dimension_override_by_name("unet_sample_width", image_size // 8)
    sess_options.add_free_dimension_override_by_name("unet_time_batch", 1)
    sess_options.add_free_dimension_override_by_name("unet_hidden_batch", hidden_batch_size)
    sess_options.add_free_dimension_override_by_name("unet_hidden_sequence", 77)

    # Raw string so the backslashes in the Windows path are not treated as escapes.
    model_dir = r"C:\workspace\models\stable-diffusion-2-1\optimized-cuda"
    provider_map = {
        "dml": "DmlExecutionProvider",
        "cuda": "CUDAExecutionProvider",
    }

    pipeline = OnnxStableDiffusionPipeline.from_pretrained(
        model_dir, provider=provider_map[provider], sess_options=sess_options
    )

    prompt = "giant castle, mountains, sunrise, volumetric lighting"
    pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
    image = pipeline(
        [prompt] * batch_size,
        num_inference_steps=20,
        callback=None,
        height=image_size,
        width=image_size,
        guidance_scale=7.5,
    ).images[0]

    image.save("output.png")

Is there a way to convert the PyTorch stable-diffusion-2-1 float16 models to ONNX float16 models directly, rather than starting from the PyTorch float32 models? Thanks.

jambayk commented 1 month ago

Hi, we haven't tested this example with the stable-diffusion-2-1 model. The NaN outputs must come from numerical instability in fp16 precision. Are you able to trace the source of the NaN to the unet model? Previously, we saw instability in the VAE for the sdxl model, but not in the unet.
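To illustrate the kind of fp16 instability mentioned above, here is a minimal NumPy sketch. It is not taken from the Olive example or from the stable-diffusion-2-1 unet; the function names and the input values are chosen only for illustration. It shows how a value beyond the float16 range becomes inf, and how a later operation on inf values yields NaN.

```python
import numpy as np

# Largest finite float16 value; anything above it rounds to inf.
FP16_MAX = float(np.finfo(np.float16).max)


def naive_softmax(x):
    """Softmax without max-subtraction; exp() overflows easily in float16."""
    e = np.exp(x)
    return e / e.sum()


def stable_softmax(x):
    """Softmax with max-subtraction, which keeps exp() inputs at or below 0."""
    e = np.exp(x - x.max())
    return e / e.sum()


# Silence the overflow/invalid warnings that the demo triggers on purpose.
with np.errstate(over="ignore", invalid="ignore"):
    overflowed = np.float16(70000.0)        # above FP16_MAX, becomes inf
    nan_from_inf = overflowed - overflowed  # inf - inf is NaN

    # Illustrative logits: exp(12) is already too large for float16.
    logits = np.array([12.0, 11.0, 10.0], dtype=np.float16)
    naive = naive_softmax(logits)    # contains NaN (inf / inf)
    stable = stable_softmax(logits)  # all entries finite

print("naive has NaN:", bool(np.isnan(naive).any()))
print("stable is finite:", bool(np.isfinite(stable).all()))
```

The same arithmetic is harmless in float32, which is consistent with the float32 models running correctly while the float16 ones fail.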

ONNX conversion directly from a float16 model is not usually done, since many ONNX operators do not support float16. So we usually convert in float32 precision and then convert the model to float16 as part of the transformers optimization step.

Do you have the model id for a float16-compatible stable-diffusion-2-1 model, similar to https://huggingface.co/madebyollin/sdxl-vae-fp16-fix for sdxl? If so, it might be possible to run the example on that model. That's what we did for the sdxl example: https://github.com/microsoft/Olive/blob/7e6f5129a0d486c7a00cd27c62f57cd33dfe7922/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py#L483

xhcao commented 1 month ago

@jambayk, thanks for your reply. I currently do not have a way to trace which operator or node generates the NaN in the unet model, but I can try. I do not have the model id for a float16-compatible stable-diffusion-2-1 model, so I used Olive to generate float16 models from stabilityai/stable-diffusion-2-1. Do you plan to support stabilityai/stable-diffusion-2-1 soon?
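One possible way to trace the first node that produces NaN is sketched below. It is an assumption about a workable approach, not something Olive or onnxruntime provides for this purpose. The helper names `expose_all_node_outputs` and `first_non_finite` are hypothetical. The idea is to write a copy of the ONNX model in which every node output is also a graph output, run it once, and scan the results in node order. Keeping every intermediate tensor of a unet in memory can be very expensive, so in practice the outputs may need to be requested in smaller groups.

```python
import numpy as np


def expose_all_node_outputs(model_path, out_path):
    """Save a copy of the model where every node output is a graph output.

    Returns the node output names in graph (topological) order.
    Write out_path to a separate directory so external data files do not clash.
    """
    import onnx  # imported lazily; only this helper needs the onnx package

    model = onnx.load(model_path)
    existing = {o.name for o in model.graph.output}
    ordered = []
    for node in model.graph.node:
        for name in node.output:
            if not name:  # optional outputs can have empty names
                continue
            ordered.append(name)
            if name not in existing:
                model.graph.output.extend([onnx.ValueInfoProto(name=name)])
                existing.add(name)
    onnx.save(model, out_path, save_as_external_data=True)
    return ordered


def first_non_finite(names, arrays):
    """Return (name, kind) for the first tensor holding NaN or inf, else None."""
    for name, arr in zip(names, arrays):
        arr = np.asarray(arr)
        if not np.issubdtype(arr.dtype, np.floating):
            continue  # integer and boolean tensors cannot hold NaN or inf
        if np.isnan(arr).any():
            return name, "nan"
        if np.isinf(arr).any():
            return name, "inf"
    return None


# Hypothetical usage (paths and the feeds dict are placeholders):
#   names = expose_all_node_outputs("unet/model.onnx", "unet_debug/model.onnx")
#   sess = onnxruntime.InferenceSession("unet_debug/model.onnx",
#                                       providers=["CUDAExecutionProvider"])
#   outputs = sess.run(names, feeds)
#   print(first_non_finite(names, outputs))
```

An inf usually shows up a few nodes before the first NaN, so reporting both kinds helps locate the operation that overflowed in the first place.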

jambayk commented 1 month ago

Hi, there are no plans for it currently. If there are large architectural changes to the model, more work might be needed from the onnxruntime team to support it. They also haven't explored sd-2-1 yet.