Low accuracy when use ipex + quantize

rnwang04 commented 1 year ago

Hi, I am trying to use IPEX to quantize a UNet model, following https://github.com/intel/intel-extension-for-pytorch/blob/v1.12.0/docs/tutorials/features/int8.md. The model can now be quantized, but the generation results become very poor. Is there any way (e.g. changing the mode or modifying some config) to avoid such low accuracy after quantization with IPEX? My torch version: 1.12.1. My ipex version: 1.12.100. Thanks!

jingxu10 commented 1 year ago

Please have a try with https://github.com/intel/neural-compressor. It can calibrate a model while trying to keep accuracy.
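
For context on why calibration matters here, below is a minimal, library-free sketch of affine int8 quantization with a min/max range. It is not the observer that IPEX or neural-compressor actually uses; the function names (`minmax_qparams`, `fake_quantize`) and the synthetic Gaussian "activations" are assumptions made purely for illustration. It only shows that a range calibrated on unrepresentative data can produce a much larger error at inference time.

```python
import random


def minmax_qparams(values, qmin=-128, qmax=127):
    """Derive an affine scale and zero point from the observed min/max."""
    lo = min(min(values), 0.0)  # the representable range must include 0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def fake_quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Quantize a float to int8 and dequantize it back to float."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))  # values outside the calibrated range saturate
    return (q - zero_point) * scale


def mean_abs_error(data, scale, zero_point):
    """Average absolute difference between data and its quantized version."""
    return sum(abs(x - fake_quantize(x, scale, zero_point)) for x in data) / len(data)


rng = random.Random(0)
# Synthetic stand-ins for activations (illustrative only):
narrow_calib = [rng.gauss(0.0, 1.0) for _ in range(256)]   # unrepresentative calibration data
wide_calib = [rng.gauss(0.0, 4.0) for _ in range(256)]     # representative calibration data
inference = [rng.gauss(0.0, 4.0) for _ in range(1024)]     # what the model sees later

err_narrow = mean_abs_error(inference, *minmax_qparams(narrow_calib))
err_wide = mean_abs_error(inference, *minmax_qparams(wide_calib))
print(f"calibrated on narrow data:         {err_narrow:.4f}")
print(f"calibrated on representative data: {err_wide:.4f}")
```

In this toy setup the error comes mostly from saturation: values outside the calibrated range are clamped, so a range observed on too few or atypical inputs costs far more than the rounding error alone.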

jgong5 commented 1 year ago

@rnwang04 Would you mind also letting us know which model you were quantizing?

rnwang04 commented 1 year ago

@jgong5 Hi, thanks for the response. I was trying to quantize the UNet model in the Stable Diffusion pipeline. Below is my script for reference if you want to reproduce (you may need to set return_dict=False manually). I also tried replacing input_sample with real inputs, but the result still does not meet my needs.

import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert
qconfig = ipex.quantization.default_static_qconfig

from diffusers import StableDiffusionPipeline
import torch

model_id = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, use_auth_token=True)
prompt = "a photo of an astronaut riding a horse on mars"
original_image = pipe(prompt, guidance_scale=7.5)
original_image[0][0].save("astronaut_rides_horse_original.png")

unet = pipe.unet

user_model = unet
user_model.eval()

sample_latents = torch.randn((1, unet.in_channels, 64, 64),
                             generator=None,
                             device="cpu",
                             dtype=torch.float32)

input_sample = (torch.cat([sample_latents]),
                torch.tensor([980], dtype=torch.long),
                torch.randn((1, 77, 768),
                            generator=None,
                            device="cpu",
                            dtype=torch.float32))

prepared_model = prepare(user_model, qconfig, example_inputs=input_sample, inplace=False)

for x in [input_sample]:
    print(len(x))
    prepared_model(*x)

convert_model = convert(prepared_model)
with torch.no_grad():
    traced_model = torch.jit.trace(convert_model, input_sample)
    traced_model = torch.jit.freeze(traced_model)
# for inference 
y = traced_model(*input_sample)
print(y[0].shape)
setattr(traced_model, "in_channels", 4)
setattr(traced_model, "device", torch.device('cpu'))

setattr(pipe, "unet", traced_model)

new_image = pipe(prompt, guidance_scale=7.5)
new_image[0][0].save("astronaut_rides_horse_ipex.png")
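
The script above calibrates with a single random sample at one timestep (980). One possible way to collect real inputs instead is to record what the pipeline actually passes to the UNet during a normal FP32 run. The sketch below is only an illustration: `InputRecorder` is a hypothetical helper, not part of IPEX or diffusers, and the usage in the trailing comments is untested and assumes the pipeline reaches the UNet through `unet.forward`.

```python
class InputRecorder:
    """Hypothetical helper: wraps a callable, stores the arguments of its
    first `limit` calls, and otherwise behaves like the wrapped callable."""

    def __init__(self, fn, limit=50):
        self.fn = fn
        self.limit = limit
        self.calls = []  # list of (args, kwargs) tuples, in call order

    def __call__(self, *args, **kwargs):
        # Record only the first `limit` calls to bound memory use.
        if len(self.calls) < self.limit:
            self.calls.append((args, kwargs))
        return self.fn(*args, **kwargs)


# Possible usage with the script above (assumption, not verified):
#   recorder = InputRecorder(unet.forward)
#   unet.forward = recorder              # record inputs during one FP32 run
#   pipe(prompt, guidance_scale=7.5)
#   unet.forward = recorder.fn           # restore the original forward before prepare()
#   # recorder.calls now holds the real latents, timesteps and text embeddings
#   # that could be fed to prepared_model during calibration.
```

Inputs recorded this way would cover every timestep the scheduler visits, and their shapes would match what the pipeline really sends rather than a hand-built example.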

rnwang04 commented 1 year ago

> Please have a try with https://github.com/intel/neural-compressor. It can calibrate a model while trying to keep accuracy.

@jingxu10 Thanks for your quick response! Actually, I have already tried INC quantization with IPEX, but it failed to work. I will report this issue to INC as well.

jingxu10 commented 1 year ago

Got it. We will look into it.

Vasud-ha commented 1 year ago

I am unable to reproduce this issue with the given code snippet. Even after setting return_dict=False, I am getting the following issue.

Vasud-ha commented 1 year ago

This issue is reproducible with the updated code snippet below.

import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert
qconfig = ipex.quantization.default_static_qconfig

from diffusers import StableDiffusionPipeline
import torch
import functools

model_id = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, use_auth_token=True)
unet = pipe.unet
unet.forward = functools.partial(unet.forward, return_dict=False)  # set return_dict=False as default
prompt = "a photo of an astronaut riding a horse on mars"
original_image = pipe(prompt, guidance_scale=7.5)
original_image[0][0].save("astronaut_rides_horse_original.png")

user_model = unet
user_model.eval()

sample_latents = torch.randn((1, unet.in_channels, 64, 64),
                             generator=None,
                             device="cpu",
                             dtype=torch.float32)

input_sample = (torch.cat([sample_latents]),
                torch.tensor([980], dtype=torch.long),
                torch.randn((1, 77, 768),
                            generator=None,
                            device="cpu",
                            dtype=torch.float32))

prepared_model = prepare(user_model, qconfig, example_inputs=input_sample, inplace=False)

for x in [input_sample]:
    print(len(x))
    prepared_model(*x)

convert_model = convert(prepared_model)
with torch.no_grad():
    traced_model = torch.jit.trace(convert_model, input_sample, strict=False)
    traced_model = torch.jit.freeze(traced_model)
# for inference 
y = traced_model(*input_sample)
setattr(traced_model, "in_channels", 4)
setattr(traced_model, "device", torch.device('cpu'))

setattr(pipe, "unet", traced_model)

new_image = pipe(prompt, guidance_scale=7.5, width=512, height=512)
new_image[0][0].save("astronaut_rides_horse_ipex.png")

You also need to make the following change to the Stable Diffusion pipeline source code in diffusers:

# Change this line from pipeline_stable_diffusion.py
noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample 
# to
noise_pred = self.unet(latent_model_input, t, text_embeddings)[0]
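
As a possible alternative to editing pipeline_stable_diffusion.py, the traced model could be wrapped so that it accepts the keyword argument and returns an object with a `.sample` attribute, which is what the unmodified line expects. This is a sketch only: `TracedUNetAdapter` is a hypothetical name, the attributes it exposes mirror the `setattr` calls in the script above, and it has not been checked against any particular diffusers version, which may read further attributes from the UNet.

```python
from types import SimpleNamespace


class TracedUNetAdapter:
    """Hypothetical wrapper: gives a positional-only traced module the
    keyword-based call signature and `.sample` result the pipeline uses."""

    def __init__(self, traced_model, in_channels=4, device="cpu"):
        self.traced_model = traced_model
        self.in_channels = in_channels  # mirrors setattr(traced_model, "in_channels", 4)
        self.device = device            # pass torch.device("cpu") in real use

    def __call__(self, sample, timestep, encoder_hidden_states=None, **unused):
        # The traced model only takes positional inputs and returns a tuple.
        outputs = self.traced_model(sample, timestep, encoder_hidden_states)
        # Expose the first tuple element as `.sample`, as the pipeline line reads it.
        return SimpleNamespace(sample=outputs[0])


# Possible usage (assumption, not verified):
#   setattr(pipe, "unet", TracedUNetAdapter(traced_model))
```

The trade-off is one extra Python call per denoising step in exchange for leaving the installed diffusers package untouched.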

jgong5 commented 1 year ago

cc @leslie-fang-intel @Xia-Weiwen and @XiaobingSuper

intel / intel-extension-for-pytorch

Low accuracy when use ipex + quantize #252