alexsu52 opened 5 months ago
@daniil-lyakhov, please analyze this feature request and open issues as sub-tasks of it.
I suggest introducing the following API in NNCF to support third-party quantizers and to align better with the PyTorch 2 Export Quantization API:
from typing import List, Optional

import torch
from nncf import Dataset
from nncf.quantization.advanced_parameters import (
    AdvancedBiasCorrectionParameters,
    AdvancedSmoothQuantParameters,
)
from torch.ao.quantization.quantizer import Quantizer

# OperatorConfig refers to the operator-config type used by torch.ao quantizers.


class OpenVINOQuantizer(Quantizer):
    # Annotate nodes in the graph with observer or fake-quantize constructors
    # to convey the desired way of quantization.
    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        pass

    # Validate that the annotated graph is supported by the backend.
    def validate(self, model: torch.fx.GraphModule) -> None:
        pass

    # Return the operator configurations supported by the quantizer.
    @classmethod
    def get_supported_operators(cls) -> List[OperatorConfig]:
        pass


# Apply the quantization pipeline to a torch.export.ExportedProgram.
def quantize_pt2e(
    model: torch.export.ExportedProgram,
    calibration_dataset: Dataset,
    quantizer: torch.ao.quantization.quantizer.Quantizer,
    subset_size: int = 300,
    fast_bias_correction: Optional[bool] = True,
    smooth_quant: Optional[bool] = None,
    channel_alignment: Optional[bool] = None,
    bias_correction_params: Optional[AdvancedBiasCorrectionParameters] = None,
    smooth_quant_alphas: Optional[AdvancedSmoothQuantParameters] = None,
):
    pass
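For illustration, here is a minimal sketch of how the proposed API could be used end-to-end. OpenVINOQuantizer and quantize_pt2e are the names proposed above, not an existing NNCF release, and the OpenVINO torch.compile backend is assumed to be installed:

import torch
import nncf

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3)

    def forward(self, x):
        return torch.nn.functional.relu(self.conv(x))

example_inputs = (torch.randn(1, 3, 32, 32),)
exported_program = torch.export.export(TinyModel().eval(), example_inputs)

# Proposed API from above (hypothetical until implemented in NNCF).
quantized_model = quantize_pt2e(
    exported_program,
    calibration_dataset=nncf.Dataset([example_inputs[0]]),
    quantizer=OpenVINOQuantizer(),
)

# Run the INT8 model through torch.compile with the OpenVINO backend.
compiled_model = torch.compile(quantized_model, backend="openvino")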
@MaximProshin, I would like to provide a summary of this feature request:
Done:
- Support of torch.fx.GraphModule models in nncf.quantize() (a usage sketch follows this list)
- Support of torch.fx.GraphModule models in nncf.compress_weights()
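A hedged sketch of the merged nncf.quantize() support for captured torch.fx.GraphModule models; the capture API differs across PyTorch versions, and the toy model is only illustrative:

import torch
import nncf

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# PyTorch >= 2.5; older releases used torch._export.capture_pre_autograd_graph.
fx_model = torch.export.export_for_training(model, example_inputs).module()

calibration_dataset = nncf.Dataset([example_inputs[0]])
quantized_fx_model = nncf.quantize(fx_model, calibration_dataset)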
The remaining tasks are in progress for NNCF 2.14.
cc @daniil-lyakhov @anzr299
The API proposed above, plus parameter range estimators.
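For reference, nncf.quantize() already exposes range estimator settings through AdvancedQuantizationParameters, and quantize_pt2e could plausibly accept the same objects; the wiring into quantize_pt2e is an assumption. A sketch using the existing NNCF types:

from nncf.quantization.advanced_parameters import AdvancedQuantizationParameters
from nncf.quantization.range_estimator import (
    AggregatorType,
    RangeEstimatorParameters,
    StatisticsCollectorParameters,
    StatisticsType,
)

# Estimate activation ranges with quantile statistics instead of plain min/max.
quantile_collector = StatisticsCollectorParameters(
    statistics_type=StatisticsType.QUANTILE,
    aggregator_type=AggregatorType.MEAN,
    quantile_outlier_prob=1e-4,
)
advanced_params = AdvancedQuantizationParameters(
    activations_range_estimator_params=RangeEstimatorParameters(
        min=quantile_collector,
        max=quantile_collector,
    ),
)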
🚀 Feature request
Quantization is a widely used technique to accelerate models, particularly when combined with torch.compile. For detailed tutorials and demonstrations of model quantization via PyTorch 2 Export Quantization, please refer to the PyTorch 2 Export Quantization tutorials.
These guides show how to obtain a quantized model via the PyTorch 2 Export Quantization API and run it using torch.compile. OpenVINO provides a backend for torch.compile; however, NNCF does not support quantization of PyTorch 2 Export (torch.fx.GraphModule) models, so users have to use X86InductorQuantizer to quantize such models (the current workaround is sketched below). Comparisons between PyTorch 2 Export INT8 models quantized by X86InductorQuantizer and OpenVINO INT8 models quantized by NNCF show that NNCF produces more accurate and efficient INT8 models.
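For context, the current workaround follows the standard PyTorch 2 Export flow with X86InductorQuantizer; a minimal sketch (the capture API differs across PyTorch versions, and the toy model is illustrative):

import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# PyTorch >= 2.5; older releases used torch._export.capture_pre_autograd_graph.
fx_model = torch.export.export_for_training(model, example_inputs).module()

quantizer = X86InductorQuantizer()
quantizer.set_global(get_default_x86_inductor_quantization_config())

prepared = prepare_pt2e(fx_model, quantizer)
prepared(*example_inputs)  # calibration pass
quantized = convert_pt2e(prepared)

compiled = torch.compile(quantized)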
The feature request is to support torch.fx.GraphModule models in nncf.quantize to enable the creation of accurate and highly efficient models using torch.compile with the OpenVINO backend.

Feature Use Case
Are you going to submit a PR?