huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools

https://huggingface.co/docs/optimum/main/en/intel/index

Apache License 2.0

Update NNCF to 2.10. Enable AWQ algorithm. #673

Closed. nikita-savelyevv closed this 3 months ago

nikita-savelyevv commented 3 months ago

What does this PR do?

Update NNCF to 2.10
Enable AWQ weight compression algorithm

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[ ] Did you make sure to update the documentation with your changes?
[ ] Did you write any new necessary tests?

HuggingFaceDocBuilderDev commented 3 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

AlexKoff88 commented 3 months ago

CI is failing because this PR requires the new version of NNCF, which is not released yet. I am OK with the changes. @echarlaix, we will notify you once the release is out.

echarlaix commented 3 months ago

Also, would it make sense to have specific configs for different methodologies, such as an OVAwqConfig, as is done in transformers? What do you think, @AlexKoff88 @nikita-savelyevv?

AlexKoff88 commented 3 months ago

@echarlaix, AWQ is just a part of our weight quantization pipeline, not a separate method. I don't think we need a separate config for it. It is better to have it as an option that can be turned on.
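
To make the "option that can be turned on" design concrete, here is a minimal sketch. It is not the optimum-intel or NNCF API: `WeightCompressionConfig`, its fields, `pipeline_steps`, and the step names are all hypothetical and exist only for illustration. The sketch also assumes that AWQ is data-aware and therefore needs a calibration dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WeightCompressionConfig:
    """Hypothetical config for illustration; not a real library class."""

    bits: int = 4
    group_size: int = 128
    dataset: Optional[str] = None  # name of a calibration dataset, if any
    awq: bool = False  # AWQ is a flag on the same config, not a separate config type

    def __post_init__(self) -> None:
        if self.bits not in (4, 8):
            raise ValueError("bits must be 4 or 8")
        # Assumption: AWQ needs activation statistics, so it requires calibration data.
        if self.awq and self.dataset is None:
            raise ValueError("awq=True requires a calibration dataset")


def pipeline_steps(config: WeightCompressionConfig) -> List[str]:
    """Return the (hypothetical) ordered steps of one weight compression pipeline."""
    steps: List[str] = []
    if config.dataset is not None:
        steps.append("collect_statistics")
    if config.awq:
        # The optional AWQ step slots into the existing pipeline.
        steps.append("awq_scale_correction")
    steps.append(f"quantize_weights_int{config.bits}")
    return steps
```

With this shape, enabling AWQ is a one-argument change, for example `WeightCompressionConfig(dataset="wikitext2", awq=True)`, and the remaining options (`bits`, `group_size`) stay shared instead of being duplicated in a dedicated AWQ config class.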