SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Add Docstring for TF 3x API and Torch 3x Mixed Precision #1944
Closed
zehao-intel closed 2 months ago
Type of Change
documentation
Description
How has this PR been tested?
PreCI
Dependency Change?
No