jojo1899 opened 1 month ago
I followed the steps here and it worked. Here is a summary of the steps:
Installing OpenVINO

```shell
pip install git+https://github.com/huggingface/optimum-intel.git
pip install git+https://github.com/openvinotoolkit/nncf.git
pip install openvino-nightly
```

Along with the above, I also needed to install openvino-tokenizers for the quantization to be performed successfully:

```shell
pip install --pre -U openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
```
The following is how the relevant packages in my virtual environment look:

```
openvino            2024.3.0.dev20240528
openvino-nightly    2024.3.0.dev20240528
openvino-telemetry  2024.1.0
openvino-tokenizers 2024.3.0.0.dev20240528
optimum             1.19.2
optimum-intel       1.17.0.dev0+aefabf0
```
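To reproduce this kind of version listing from a script, the standard library's `importlib.metadata` can be queried directly. This is a small helper of my own (`installed_version` is not part of any of the packages above), shown only as a convenience sketch:

```python
# Print the installed versions of the packages listed above.
# installed_version is my own helper, not part of optimum or openvino.
from importlib.metadata import version, PackageNotFoundError
from typing import Optional

def installed_version(pkg: str) -> Optional[str]:
    """Return the installed version of pkg, or None if it is not installed."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

for pkg in ("openvino", "openvino-nightly", "openvino-tokenizers",
            "optimum", "optimum-intel", "nncf"):
    print(f"{pkg:24} {installed_version(pkg)}")
```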
Quantizing the model (INT4)

```shell
optimum-cli export openvino --model "microsoft/Phi-3-mini-4k-instruct" --task text-generation-with-past --weight-format int4 --group-size 128 --ratio 0.6 --sym --trust-remote-code ./openvinomodel/phi3/int4
```
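If you want to sweep different quantization settings (group size, ratio), the command above can be assembled programmatically and passed to `subprocess.run`. A sketch, assuming the same flags as above; `build_export_command` is my own helper name, not part of optimum:

```python
# Assemble the optimum-cli export command shown above so that
# group size / ratio can be varied from a script.
# build_export_command is a hypothetical helper, not an optimum API.
from typing import List

def build_export_command(model_id: str, output_dir: str,
                         weight_format: str = "int4",
                         group_size: int = 128,
                         ratio: float = 0.6,
                         sym: bool = True,
                         trust_remote_code: bool = True) -> List[str]:
    cmd = ["optimum-cli", "export", "openvino",
           "--model", model_id,
           "--task", "text-generation-with-past",
           "--weight-format", weight_format,
           "--group-size", str(group_size),
           "--ratio", str(ratio)]
    if sym:
        cmd.append("--sym")
    if trust_remote_code:
        cmd.append("--trust-remote-code")
    cmd.append(output_dir)
    return cmd

cmd = build_export_command("microsoft/Phi-3-mini-4k-instruct",
                           "./openvinomodel/phi3/int4")
print(" ".join(cmd))
```

The resulting list can then be executed with `subprocess.run(cmd, check=True)`.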
I tried the above commands with Phi-3-mini-128k-instruct and then ran inference on the resulting INT4 model with `OVModelForCausalLM.from_pretrained()`. The responses are okay, but not very impressive.
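For reference, inference against the exported model looks roughly like the sketch below. The model path and generation settings are illustrative (I am assuming the export landed in `./openvinomodel/phi3/int4` as in the command above), and `generate_reply` is my own helper name; imports are kept inside the function so the sketch can be loaded without the heavy dependencies installed:

```python
# Minimal inference sketch against the INT4 OpenVINO export above.
# MODEL_DIR is an assumption based on the export command's output path.
MODEL_DIR = "./openvinomodel/phi3/int4"

def generate_reply(prompt: str, max_new_tokens: int = 128) -> str:
    # Imports are local so this file can be read/imported without
    # transformers and optimum-intel installed.
    from transformers import AutoTokenizer
    from optimum.intel import OVModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = OVModelForCausalLM.from_pretrained(MODEL_DIR)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Usage (requires the exported model on disk):
#   print(generate_reply("What is OpenVINO?"))
```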
System Info
Who can help?
No response
Information
Tasks
Reproduction (minimal, reproducible, runnable)
The following command does not result in a quantized Phi-3 model in OpenVINO format:

```shell
optimum-cli export openvino --trust-remote-code -m microsoft/Phi-3-mini-128k-instruct --task text-generation-with-past --weight-format int4 ./openvino_phi3
```

It instead throws the following error:
Expected behavior
I would expect the command to produce a quantized Phi-3 model as output.