SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
2.18k stars · 252 forks
Add docstring for auto accelerator #1956
Closed — yiliu30 closed this 2 months ago
/neural_compressor/common -> /neural-compressor/neural_compressor/common