Add recent publications

intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

https://intel.github.io/neural-compressor/

Apache License 2.0

2.23k stars, 257 forks

Closed: thuang6 closed this 2 months ago

thuang6 commented 2 months ago

documentation