SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
smoothquant: can any quant/dequant module be found in the exported quant pt model? #1996
Closed
tianylijun closed 2 months ago
intel/neural-compressor/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/smooth_quant/run_clm_no_trainer.py
Question: does the exported quantized model contain QDQ nodes?
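Not an answer from the thread, but one way to check this yourself: for a TorchScript export you can iterate the graph's nodes (`torch.jit.load(path).graph.nodes()`, then `node.kind()` on each) and look for quantize/dequantize ops. A minimal sketch of that scan, with the graph simulated as a plain list of op-kind strings so it runs standalone; the ATen op names are standard, but the example node list is hypothetical:

```python
# Op kinds that indicate QDQ (quantize/dequantize) nodes in a TorchScript graph.
QDQ_OP_KINDS = {
    "aten::quantize_per_tensor",
    "aten::quantize_per_channel",
    "aten::dequantize",
}

def find_qdq_nodes(node_kinds):
    """Return the sorted subset of node kinds that are quantize/dequantize ops."""
    return sorted(k for k in node_kinds if k in QDQ_OP_KINDS)

# Hypothetical op kinds dumped from a quantized graph
# (in practice: [n.kind() for n in torch.jit.load(path).graph.nodes()]).
kinds = [
    "aten::quantize_per_tensor",
    "quantized::linear",
    "aten::dequantize",
    "aten::layer_norm",
]
print(find_qdq_nodes(kinds))  # → ['aten::dequantize', 'aten::quantize_per_tensor']
```

If the scan comes back empty, the export likely folded the quantization into fused `quantized::*` ops rather than leaving explicit QDQ node pairs in the graph.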