Open liuxt670 opened 1 month ago
optimum-cli export openvino --trust-remote-code --model internlm/internlm2-chat-1_8b --weight-format int4 internlm2-chat-1_8b
fails for optimum==1.21.2
and optimum-intel==1.18.1
with
File "C:\Users\vzlobin\AppData\Roaming\Python\Python312\site-packages\optimum\exporters\openvino\model_patcher.py", line 1117, in _internlm2_attention_forward
kv_seq_len += past_key_value[0].shape[-2]
^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'shape'
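For context, the traceback is a KV-cache layout mismatch: the patched attention forward expects `past_key_value` to be a flat `(key, value)` pair of tensors, but it receives the pair nested one level deeper, so `past_key_value[0]` is itself a tuple and has no `.shape`. A minimal sketch with NumPy arrays standing in for the cached tensors (the array shapes here are illustrative assumptions, not the model's real dimensions):

```python
import numpy as np

# Stand-ins for cached key/value tensors: (batch, heads, seq_len, head_dim)
key = np.zeros((1, 8, 5, 64))
value = np.zeros_like(key)

# Layout the patcher expects: a flat (key, value) pair of tensors.
flat_cache = (key, value)
kv_seq_len = flat_cache[0].shape[-2]  # works: element 0 is a tensor

# Layout it actually received: the pair nested one level deeper,
# e.g. ((key, value),). Element 0 is now a tuple, not a tensor:
nested_cache = ((key, value),)
try:
    nested_cache[0].shape
except AttributeError as e:
    print(e)  # 'tuple' object has no attribute 'shape'

# Unwrapping one extra level restores the expected access:
kv_seq_len = nested_cache[0][0].shape[-2]
```

The commit referenced below resolves this on the optimum-intel side; the sketch only shows why the `AttributeError` fires.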
@Wovchena Thanks for the reply.
optimum==1.21.2
and optimum-intel==1.19.0.dev0+9ef6766
, and this is the command we used to convert the model: optimum-cli export openvino --task text-generation-with-past --model internlm2-chat-1_8b/ --weight-format int4 --trust-remote-code internlm2_openvino
Running pip install git+https://github.com/huggingface/optimum-intel.git@9ef6766
solved the conversion problem.
Since CPU works fine, do_sample=false is not to blame; it's a dGPU problem. I transferred the issue to the main repo.
Hi, we encountered some issues while running your sample on Intel GPU. This is the model we are using: https://huggingface.co/internlm/internlm2-chat-1_8b/tree/main We converted this model to int4 OpenVINO format and ran it on an Intel GPU.
Here is the code we are running; we only made some modifications to your sample code:
In the code above, we set config.do_sample=false and call pipe.finish_chat() to clear the KV cache after each round of generation. Ideally, we should get exactly the same result in every round. However, we found that during each run the result of the first round is always random, while the others are identical. It seems that the do_sample=false setting does not take effect for the first round of generation. Here are some results: Do you have any suggestions on this issue? Thanks a lot!
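To help pin the symptom down, a small harness that runs the same prompt for several rounds and compares the outputs can distinguish "only the first round differs" from general nondeterminism. This is a hedged sketch: `run_round` is a hypothetical callable you would implement around your pipeline (start chat, generate with do_sample=false, finish chat), not part of any library; the stub below only mimics the reported behaviour for demonstration:

```python
from typing import Callable, List, Tuple

def first_round_differs(run_round: Callable[[str], str],
                        prompt: str,
                        rounds: int = 4) -> Tuple[bool, List[str]]:
    """Run `run_round` repeatedly with the same prompt and report whether
    only the first output deviates while all later rounds agree."""
    outputs = [run_round(prompt) for _ in range(rounds)]
    rest_identical = all(o == outputs[1] for o in outputs[2:])
    return (outputs[0] != outputs[1] and rest_identical), outputs

# Stub that is "random" only on its first call, mimicking the reported
# symptom. Replace it with a wrapper around your actual pipeline call.
state = {"calls": 0}
def stub(prompt: str) -> str:
    state["calls"] += 1
    return "UNEXPECTED-TOKENS" if state["calls"] == 1 else "stable answer"

symptom, outs = first_round_differs(stub, "hello")
print(symptom)  # True: only the first round differed
```

If the harness reports True on GPU but False on CPU with the real pipeline plugged in, that supports the dGPU diagnosis above.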