openvinotoolkit / nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference
Apache License 2.0

[TorchFX] Torch FX/PyTorch 2 Export Quantization #2766

Open alexsu52 opened 3 months ago

alexsu52 commented 3 months ago

🚀 Feature request

Quantization is a widely used technique to accelerate models, particularly when using torch.compile. For detailed tutorials and demonstrations of this workflow, please refer to the PyTorch 2 Export Quantization tutorials.

These tutorials show how to obtain a quantized model via the PyTorch 2 Export Quantization API and run it with torch.compile. OpenVINO provides a backend for torch.compile, but NNCF does not support quantization of PyTorch 2 Export (torch.fx.GraphModule) models, so users currently have to quantize such models with X86InductorQuantizer. Comparisons between PyTorch 2 Export INT8 models quantized by X86InductorQuantizer and OpenVINO INT8 models quantized by NNCF show that NNCF produces more accurate and more efficient INT8 models. For context, the current X86InductorQuantizer flow is sketched below.
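A minimal sketch of that status-quo flow, assuming placeholder float_model, example_inputs and calibration_loader objects (API names follow the PyTorch 2 Export quantization tutorials):

import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq

# capture the model as a torch.fx.GraphModule
exported_model = capture_pre_autograd_graph(float_model, *example_inputs)

# configure the Inductor quantizer with its default x86 settings
quantizer = xiq.X86InductorQuantizer()
quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())

# insert observers, calibrate on representative data, convert to INT8
prepared_model = prepare_pt2e(exported_model, quantizer)
for inputs in calibration_loader:  # placeholder calibration data
    prepared_model(*inputs)
quantized_model = convert_pt2e(prepared_model)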

The feature request is to support torch.fx.GraphModule models in nncf.quantize, enabling the creation of accurate and highly efficient models that run via torch.compile with the OpenVINO backend.

Feature Use Case

import torch
import nncf
from torch._export import capture_pre_autograd_graph

# initialize a floating point model (M and example_inputs are placeholders)
float_model = M().eval()

# program capture
# NOTE: this API will be updated to the torch.export API in the future,
# but the captured result should mostly stay the same
model = capture_pre_autograd_graph(float_model, *example_inputs)

# quantization
quantized_model = nncf.quantize(model, calibration_dataset)

# compile the quantized model with the OpenVINO backend
compiled_model = torch.compile(quantized_model, backend='openvino')
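For completeness, the compiled model is then called like a regular module; the first call triggers compilation with the OpenVINO backend:

# run inference (example_inputs is the placeholder from above)
output = compiled_model(*example_inputs)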

Are you going to submit a PR?

alexsu52 commented 3 months ago

@daniil-lyakhov, please analyze this feature request and open issues as sub-tasks of it.

alexsu52 commented 3 months ago

I suggest introducing the following API in NNCF to support third-party quantizers and to better align with the PyTorch 2 Export Quantization API:

from typing import List, Optional

import torch
from torch.ao.quantization.quantizer import Quantizer

from nncf import Dataset
from nncf.quantization.advanced_parameters import (
    AdvancedBiasCorrectionParameters,
    AdvancedSmoothQuantParameters,
)

# OperatorConfig refers to the operator config type of the PyTorch 2 Export
# quantizer API (its exact import path depends on the PyTorch version)

class OpenVINOQuantizer(Quantizer):
    # annotate nodes in the graph with observer or fake quant constructors
    # to convey the desired way of quantization
    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        pass

    # validate that the annotated graph is supported by the backend
    def validate(self, model: torch.fx.GraphModule) -> None:
        pass

    # report the operator configurations supported by the backend
    @classmethod
    def get_supported_operators(cls) -> List[OperatorConfig]:
        pass

# apply the quantization pipeline to a torch.export.ExportedProgram
def quantize_pt2e(
    model: torch.export.ExportedProgram,
    calibration_dataset: Dataset,
    quantizer: Quantizer,
    subset_size: int = 300,
    fast_bias_correction: Optional[bool] = True,
    smooth_quant: Optional[bool] = None,
    channel_alignment: Optional[bool] = None,
    bias_correction_params: Optional[AdvancedBiasCorrectionParameters] = None,
    smooth_quant_alphas: Optional[AdvancedSmoothQuantParameters] = None,
):
    ...
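A hypothetical end-to-end usage of this proposed API might look as follows; OpenVINOQuantizer and quantize_pt2e are only the names proposed in this issue, not an existing NNCF API, and float_model, example_inputs and calibration_dataset are placeholders:

import torch

# capture the model with the PyTorch 2 export API
exported_model = torch.export.export(float_model, example_inputs)

# quantize with the proposed NNCF quantizer and pipeline
quantizer = OpenVINOQuantizer()
quantized_model = quantize_pt2e(exported_model, calibration_dataset, quantizer=quantizer)

# compile the result with the OpenVINO backend
compiled_model = torch.compile(quantized_model, backend='openvino')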