In order to apply data-driven weights compression, an instance of OVModelForCausalLM is required. However, it is not available during quantization applied at model export (here).
That's why this PR adds logic to handle such a case separately, after the model is exported. This results in some save/load overhead, but compared to the runtime of data-driven weights compression it should be negligible. Worth noting: data-free compression is still applied during export, resulting in no additional overhead.
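A minimal sketch of the control flow described above; all function and field names here are illustrative placeholders, not the actual optimum-intel API:

```python
# Hypothetical sketch of the two compression paths; names are illustrative.

calls = []  # records which steps run, for illustration


def export_model(model_id, compress=False):
    # Export to OpenVINO IR; data-free compression can happen inline here.
    calls.append(("export", compress))


def load_model():
    # Re-load the exported IR as an OVModelForCausalLM-like instance.
    calls.append(("load",))


def compress_data_driven():
    # Data-driven weight compression needs the loaded model instance.
    calls.append(("compress",))


def export_and_maybe_compress(dataset=None):
    if dataset is None:
        # Data-free path: compress during export, no extra overhead.
        export_model("model-id", compress=True)
    else:
        # Data-driven path: export first, then reload and compress separately.
        # The save/load round-trip is the overhead mentioned above.
        export_model("model-id", compress=False)
        load_model()
        compress_data_driven()


export_and_maybe_compress(dataset=None)         # data-free path
export_and_maybe_compress(dataset="wikitext2")  # data-driven path
```

The key design point is that only the data-driven path pays the extra save/load round-trip, since it is the only one that needs a loaded model instance.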
Before submitting
[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[ ] Did you make sure to update the documentation with your changes?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.