openvinotoolkit / nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference
Apache License 2.0

[TorchFX] Torch FX/PyTorch 2 Export Quantization #2766

Open alexsu52 opened 5 months ago

alexsu52 commented 5 months ago

🚀 Feature request

Quantization is a widely used technique to accelerate models, particularly when using torch.compile. For detailed tutorials and demonstrations of model quantization with PyTorch 2 Export Quantization, please refer to the PyTorch 2 Export Quantization tutorials.

These guides show how to obtain a quantized model via the PyTorch 2 Export Quantization API and run it using torch.compile. OpenVINO provides a backend for torch.compile, but NNCF does not support quantization of PyTorch 2 Export (torch.fx.GraphModule) models, so users have to quantize their models with X86InductorQuantizer. Comparisons between PyTorch 2 Export INT8 models quantized by X86InductorQuantizer and OpenVINO INT8 models quantized by NNCF show that NNCF produces more accurate and more efficient INT8 models.

The feature request is to support torch.fx.GraphModule models in nncf.quantize, enabling the creation of accurate and highly efficient models with torch.compile and the OpenVINO backend.
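
For reference, a minimal sketch of the current workaround via X86InductorQuantizer (float_model, example_inputs and calibration_data are placeholders; the quantizer and prepare/convert helpers are the existing torch.ao.quantization PT2E API):

import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

# capture the eager model into a torch.fx.GraphModule
exported_model = capture_pre_autograd_graph(float_model, *example_inputs)

# annotate the graph with the Inductor-oriented quantizer
quantizer = X86InductorQuantizer()
quantizer.set_global(get_default_x86_inductor_quantization_config())

# insert observers, calibrate on sample data, then convert to a quantized model
prepared_model = prepare_pt2e(exported_model, quantizer)
for data in calibration_data:
    prepared_model(*data)
quantized_model = convert_pt2e(prepared_model)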

Feature Use Case

import torch
from torch._export import capture_pre_autograd_graph

import nncf

# initialize a floating point model
float_model = M().eval()

# program capture
# NOTE: this API will be updated to the torch.export API in the future,
# but the captured result should mostly stay the same
model = capture_pre_autograd_graph(float_model, *example_inputs)

# quantization
quantized_model = nncf.quantize(model, calibration_dataset)

# compile the quantized model with the OpenVINO backend
compiled_model = torch.compile(quantized_model, backend="openvino")
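
The calibration_dataset above can be built with nncf.Dataset; a minimal sketch, assuming a torch DataLoader named val_loader whose batches are (images, labels) pairs:

import nncf

# transform_fn maps a data item from the loader to the model input(s)
def transform_fn(data_item):
    images, _ = data_item
    return images

calibration_dataset = nncf.Dataset(val_loader, transform_fn)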

Are you going to submit a PR?

alexsu52 commented 5 months ago

@daniil-lyakhov, please analyze this feature request and open issues as sub-tasks of it.

alexsu52 commented 5 months ago

I suggest introducing the following API in NNCF to support third-party quantizers and to better align with the PyTorch 2 Export Quantization API:

from typing import List, Optional

import torch
from torch.ao.quantization.quantizer import Quantizer
from torch.ao.quantization.quantizer.quantizer import OperatorConfig

from nncf import Dataset
from nncf.quantization.advanced_parameters import (
    AdvancedBiasCorrectionParameters,
    AdvancedSmoothQuantParameters,
)

class OpenVINOQuantizer(Quantizer):
    # annotate nodes in the graph with observer or fake quant constructors
    # to convey the desired way of quantization
    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        pass

    # validate that the annotated graph is supported by the backend
    def validate(self, model: torch.fx.GraphModule) -> None:
        pass

    # report the operator patterns that the backend can quantize
    @classmethod
    def get_supported_operators(cls) -> List[OperatorConfig]:
        pass

# apply the quantization pipeline to a torch.export.ExportedProgram
def quantize_pt2e(
    model: torch.export.ExportedProgram,
    calibration_dataset: Dataset,
    quantizer: torch.ao.quantization.quantizer.Quantizer,
    subset_size: int = 300,
    fast_bias_correction: Optional[bool] = True,
    smooth_quant: Optional[bool] = None,
    channel_alignment: Optional[bool] = None,
    bias_correction_params: Optional[AdvancedBiasCorrectionParameters] = None,
    smooth_quant_alphas: Optional[AdvancedSmoothQuantParameters] = None,
): ...
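
A hypothetical usage sketch of this proposal (OpenVINOQuantizer and quantize_pt2e are the names suggested above, not an existing NNCF interface; float_model, example_inputs and calibration_dataset are placeholders):

import torch

# export the eager model to an ExportedProgram
exported_model = torch.export.export(float_model, example_inputs)

# quantize with the proposed OpenVINO-specific quantizer
quantizer = OpenVINOQuantizer()
quantized_model = quantize_pt2e(exported_model, calibration_dataset, quantizer)

# compile the result with the OpenVINO backend of torch.compile
compiled_model = torch.compile(quantized_model, backend="openvino")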
alexsu52 commented 1 month ago

@MaximProshin, I would like to provide a summary for this feature request:

Done:

The following tasks are in progress for NNCF 2.14:

cc' @daniil-lyakhov @anzr299

daniil-lyakhov commented 1 month ago

The proposed API above (OpenVINOQuantizer and quantize_pt2e),

plus parameter range estimators
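
For the range estimators, a hedged sketch based on NNCF's existing advanced-parameter classes (how they would be wired into quantize_pt2e is an assumption of this proposal):

from nncf.quantization.advanced_parameters import (
    AggregatorType,
    RangeEstimatorParameters,
    StatisticsCollectorParameters,
    StatisticsType,
)

# e.g. estimate activation ranges from quantile statistics,
# averaged over the calibration subset to suppress outliers
activations_range_estimator_params = RangeEstimatorParameters(
    min=StatisticsCollectorParameters(
        statistics_type=StatisticsType.QUANTILE,
        aggregator_type=AggregatorType.MEAN,
        quantile_outlier_prob=1e-4,
    ),
    max=StatisticsCollectorParameters(
        statistics_type=StatisticsType.QUANTILE,
        aggregator_type=AggregatorType.MEAN,
        quantile_outlier_prob=1e-4,
    ),
)

# hypothetically passed to the proposed quantize_pt2e, e.g.:
# quantize_pt2e(model, calibration_dataset, quantizer,
#               activations_range_estimator_params=activations_range_estimator_params)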