huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools
https://huggingface.co/docs/optimum/main/en/intel/index

Issues about using pytorch FrontEnd convert NNCF QAT INT8 model to IR #467

Closed. 18582088138 closed this issue 4 months ago

18582088138 commented 9 months ago

Referencing the method in the OpenVINO Stable Diffusion QAT example: the original sample code already runs successfully. I noticed that in that example the UNet is converted INT8 PyTorch -> INT8 ONNX -> INT8 IR, so I tried using the PyTorch FrontEnd to convert the INT8 PyTorch model directly to INT8 IR. However, when convert_model is called on the compressed export_unet model, the whole pipeline hangs. I guess this may be caused by the PyTorch FE being unable to parse the INT8 PyTorch model. I hope to get your confirmation, and I would like to know the cause of this problem. The following is the PyTorch FE model-conversion part of the source code:

import torch  # needed for the example inputs below
from pathlib import Path  # needed for UNET_CONTROL_OV_PATH

from openvino.tools.mo import convert_model
from openvino.runtime import serialize
from openvino._offline_transformations import apply_moc_transformations, compress_quantize_weights_transformation, compress_model_transformation


def export_unet_pytorchFE(export_unet, save_dir):
    inputs = {
        "sample": torch.ones((2, 4, 64, 64)),
        "timestep": torch.tensor(1),
        "encoder_hidden_states": torch.randn((2, 77, 768)),
    }
    input_names = ["sample", "timestep", "encoder_hidden_states"]
    UNET_CONTROL_OV_PATH = Path(f"{save_dir}/unet_int8.xml")
    unet = export_unet
    unet.eval().cpu()

    # Convert the NNCF-compressed INT8 PyTorch model directly with the PyTorch frontend
    ov_unet = convert_model(unet, example_input=inputs)
    apply_moc_transformations(ov_unet, cf=False)
    compress_quantize_weights_transformation(ov_unet)
    # compress_model_transformation(ov_unet)
    serialize(ov_unet, UNET_CONTROL_OV_PATH)
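
For comparison, here is a rough sketch of the route the original example uses (INT8 PyTorch -> INT8 ONNX -> INT8 IR), which works for me. It reuses the imports above; the helper name, opset version and file paths are only illustrative, not the exact code from the QAT notebook:

def export_unet_onnx(export_unet, save_dir):
    # Export the NNCF-compressed UNet to ONNX first, then convert the ONNX file to IR
    dummy_inputs = (
        torch.ones((2, 4, 64, 64)),
        torch.tensor(1),
        torch.randn((2, 77, 768)),
    )
    onnx_path = f"{save_dir}/unet_int8.onnx"
    torch.onnx.export(
        export_unet.eval().cpu(),
        dummy_inputs,
        onnx_path,
        input_names=["sample", "timestep", "encoder_hidden_states"],
        opset_version=13,
    )
    # Conversion goes through the ONNX frontend instead of the PyTorch frontend
    ov_unet = convert_model(onnx_path)
    apply_moc_transformations(ov_unet, cf=False)
    compress_quantize_weights_transformation(ov_unet)
    serialize(ov_unet, f"{save_dir}/unet_int8.xml")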

The log output right before the pipeline hangs is attached as a screenshot.

My device environment info:

Thanks for your attention.

echarlaix commented 5 months ago

Hi @18582088138, thanks a lot for reporting the issue and apologies for the delay. This example was deprecated, and the methodology now recommended by the NNCF team is hybrid quantization (which will be available in optimum-intel v1.16.0, but you can install optimum-intel from source in the meantime), where the weights of the whole model are quantized while only the activations of the U-Net component are quantized.

from optimum.intel import OVStableDiffusionPipeline, OVWeightQuantizationConfig

model = OVStableDiffusionPipeline.from_pretrained(
    model_id,
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=8, dataset="conceptual_captions"),
)
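
As a minimal follow-up sketch (the prompt, step count and output directory below are only illustrative, and model_id is whatever Stable Diffusion checkpoint id you use on the Hub), the resulting pipeline can be run and saved like any other OVStableDiffusionPipeline:

# Hypothetical usage: generate an image with the hybrid-quantized pipeline,
# then save it as OpenVINO IR for later reuse
image = model("sailing ship in a storm", num_inference_steps=25).images[0]
model.save_pretrained("stable-diffusion-int8-hybrid-ov")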

For additional information, you can also take a look at this notebook or go directly to our documentation.