intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

https://intel.github.io/neural-compressor/

Apache License 2.0

2.23k stars, 257 forks

[For Review Only] Release Notes for v3.0 #1915

Closed: thuang6 closed this 3 months ago

thuang6 commented 4 months ago

v3.0 release notes draft (rendered version)

[x] FP8 quantization support, after the Habana HQT integration commits are upstreamed back to INC
[x] INT4 model loading support, after the Habana commits are upstreamed back to INC
[x] Remaining PRs for client-side quantization improvements
[x] Remaining PRs for examples
[x] Remaining PRs for documents
[ ] Multimodal model support in Highlights and examples, if it fits into v3.0 (dropped)

github-actions[bot] commented 4 months ago

⚡ Required checks status: All passing 🟢

No groups match the files changed in this PR.

Thank you for your contribution! 💜

Note: This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact chensuyue or XuehaoSun for help.