intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
https://intel.github.io/neural-compressor/
Apache License 2.0

SmoothQuant: can any quant/dequant modules be found in the exported quantized PT model? #1996

Closed: tianylijun closed this issue 2 months ago

tianylijun commented 2 months ago

intel/neural-compressor/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/smooth_quant/run_clm_no_trainer.py

Question: does the exported quantized model contain any QDQ (quantize/dequantize) nodes?
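
For anyone wanting to check this directly, a minimal sketch: if the example saved a full `nn.Module` via `torch.save`, you can reload it and scan its submodules for quantize/dequantize wrappers by class name. The path `saved_results/quantized_model.pt` is hypothetical; substitute the actual output path from your run of `run_clm_no_trainer.py`.

```python
import torch

# Hypothetical output path; replace with wherever your run of
# run_clm_no_trainer.py saved the quantized model.
model = torch.load("saved_results/quantized_model.pt", map_location="cpu")

# Print every submodule whose class name suggests a quant/dequant (QDQ) wrapper.
for name, module in model.named_modules():
    cls = type(module).__name__
    if "quant" in cls.lower():  # matches names like "Quantize" and "DeQuantize"
        print(f"{name}: {cls}")
```

If nothing is printed, the export likely folded the quant/dequant steps into the weights rather than keeping explicit QDQ modules in the graph.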