huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools
https://huggingface.co/docs/optimum/main/en/intel/index
Apache License 2.0

[OV] Move data-driven quantization after model export for text-generation models #721

Closed: nikita-savelyevv closed this 1 month ago

nikita-savelyevv commented 1 month ago

What does this PR do?

In order to apply data-driven weight compression, an instance of OVModelForCausalLM is required. However, such an instance is not available during quantization applied at model export time (here).

That's why this PR adds logic so that this case is handled separately, after the model has been exported. This introduces some save/load overhead, but compared to the runtime of data-driven weight compression it should be negligible. Worth noting: data-free compression is still applied during export, so it incurs no additional overhead.
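For context, a minimal sketch of how the two compression modes are typically triggered from the user's side. The model id and dataset name are illustrative only, and the exact `OVWeightQuantizationConfig` options may differ between optimum-intel versions:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Data-free compression: no calibration dataset is needed, so it can be
# applied directly during model export with no extra overhead.
data_free_config = OVWeightQuantizationConfig(bits=4)

# Data-driven compression: providing a dataset requires a ready
# OVModelForCausalLM instance, so (per this PR) it runs after export,
# at the cost of an extra save/load round trip.
data_driven_config = OVWeightQuantizationConfig(bits=4, dataset="wikitext2")

model = OVModelForCausalLM.from_pretrained(
    "gpt2",  # hypothetical model id, used here only for illustration
    export=True,
    quantization_config=data_driven_config,
)
```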

Before submitting

HuggingFaceDocBuilderDev commented 1 month ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

nikita-savelyevv commented 1 month ago

@AlexKoff88 please take a look