Closed: Alwahsh closed this issue 3 months ago.
@Alwahsh I see you are using an old version of NNCF dating back to February 21. The issue you're facing was fixed in a later version of NNCF. I would suggest updating to the latest release.
As a side note: in your code example you're trying to apply PTQ (post-training quantization) to an LLM. In general this significantly worsens generation quality. Quantizing LLM activations introduces large quantization errors because activation value ranges differ drastically across channels. That's why, at the moment, we apply only weight compression to LLM models.
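The channel-range argument above can be sketched with plain numbers. This is a hypothetical illustration, not NNCF code; the channel values and the 100x threshold are made up for demonstration:

```python
# Hypothetical illustration (not NNCF code): why a single per-tensor
# INT8 scale hurts when activation channel ranges differ drastically.

def quantize_dequantize(values, scale):
    """Symmetric INT8 quantize-dequantize with a single scale."""
    out = []
    for v in values:
        q = max(-128, min(127, round(v / scale)))
        out.append(q * scale)
    return out

# Two channels with drastically different ranges, as is common in LLM
# activations (made-up numbers):
channel_small = [0.01, -0.02, 0.015]   # typical channel
channel_large = [50.0, -80.0, 120.0]   # outlier channel

# A per-tensor scale is dominated by the outlier channel.
all_vals = channel_small + channel_large
per_tensor_scale = max(abs(v) for v in all_vals) / 127

# A per-channel scale adapts to the small channel's own range.
small_scale = max(abs(v) for v in channel_small) / 127

def max_abs_error(vals, scale):
    deq = quantize_dequantize(vals, scale)
    return max(abs(a - b) for a, b in zip(vals, deq))

# With the per-tensor scale, every value in the small channel rounds
# to zero, so the whole channel is wiped out:
err_tensor = max_abs_error(channel_small, per_tensor_scale)
err_channel = max_abs_error(channel_small, small_scale)
print(err_tensor > 100 * err_channel)  # True: the outlier dominates
```

Weight compression avoids this failure mode entirely because weights are quantized with their own (often per-channel or per-group) scales while activations stay in floating point.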
Upgrading resolved the problem. Thanks for the suggestion and the side note. You're right, but I'm trying to make use of a feature that requires activations to be in INT8.
🐛 Describe the bug
Hello,
I'm trying to quantize Llama models using `OVQuantizer`, but I'm facing an error. I tried Llama 3 and Llama 2.
Environment
Minimal Reproducible Example
Are you going to submit a PR?