huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools
https://huggingface.co/docs/optimum/main/en/intel/index
Apache License 2.0

ipex 2.3 released #725

Closed: jiqing-feng closed this issue 1 month ago

jiqing-feng commented 1 month ago

Hi @echarlaix. I made some changes to the llama model since ipex 2.3 was released. The API names have changed, and assisted decoding is not supported for now.
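For reference, a minimal sketch of how the rename can be handled with a version guard; the module path and class names below are illustrative assumptions, not the actual ipex 2.3 identifiers:

```python
# Illustrative sketch only: NewKVCacheOp / OldKVCacheOp are hypothetical
# stand-ins for the renamed ipex operator, not real ipex symbols.
import intel_extension_for_pytorch as ipex
from packaging import version

if version.parse(ipex.__version__) >= version.parse("2.3.0"):
    from intel_extension_for_pytorch.llm.modules import NewKVCacheOp as KVCacheOp  # hypothetical name
else:
    from intel_extension_for_pytorch.llm.modules import OldKVCacheOp as KVCacheOp  # hypothetical name
```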

HuggingFaceDocBuilderDev commented 1 month ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

jiqing-feng commented 1 month ago

Hi @echarlaix. Thanks for your review. The core optimization for the llama2 model is IAKV (indirect access KV cache), which changes the shape of the KV cache, so assisted-decoding helpers such as crop_past_key_values cannot be used. We can skip the assisted decoding tests for now and try to enable them in the future.
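For context, a minimal sketch of why this breaks, assuming the standard transformers cache layout; crop_standard_cache is a hypothetical stand-in for what the cropping step does, not the actual implementation:

```python
import torch

# Minimal sketch: assisted decoding crops a *standard* KV cache by slicing each
# layer's (key, value) tensors, shaped (batch, num_heads, seq_len, head_dim),
# along the sequence axis.
def crop_standard_cache(past_key_values, max_length):
    return tuple(
        (key[:, :, :max_length, :], value[:, :, :max_length, :])
        for key, value in past_key_values
    )

# Example with a toy 2-layer cache of sequence length 10, cropped to 8.
cache = tuple((torch.zeros(1, 32, 10, 128), torch.zeros(1, 32, 10, 128)) for _ in range(2))
cropped = crop_standard_cache(cache, 8)
assert cropped[0][0].shape[-2] == 8

# IAKV keeps the cache in an indirectly indexed layout instead, so this slicing
# assumption breaks and assisted decoding has to be skipped for now.
```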

jiqing-feng commented 1 month ago

Hi @echarlaix, replying to your comment: I have set do_sample=False so you can compare the results between the ipex model and the original model.
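A minimal comparison sketch, assuming any llama checkpoint (the model id below is only an example); with do_sample=False greedy decoding is deterministic, so the token ids should match:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.intel import IPEXModelForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"  # assumption: any llama checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("The weather in Paris is", return_tensors="pt")

ref_model = AutoModelForCausalLM.from_pretrained(model_id)
ipex_model = IPEXModelForCausalLM.from_pretrained(model_id, export=True)

# do_sample=False makes generation deterministic, so the two outputs can be
# compared token for token.
ref_out = ref_model.generate(**inputs, do_sample=False, max_new_tokens=32)
ipex_out = ipex_model.generate(**inputs, do_sample=False, max_new_tokens=32)
assert ref_out.tolist() == ipex_out.tolist()
```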

jiqing-feng commented 1 month ago

Replying to your comment: it is not related to the transformers version; it is a current limitation of IAKV. I added a warning to clarify that only greedy search and beam search are verified for the patched model. Please take a look, thanks!
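The intent of the warning, as a sketch (the function name, guard condition, and wording are illustrative, not the exact PR code):

```python
import logging

logger = logging.getLogger(__name__)

def warn_if_unverified_mode(generation_config):
    # Only greedy search and beam search (do_sample=False) are verified for the
    # IAKV-patched model; warn on anything else. The exact guard in the PR may differ.
    if generation_config.do_sample:
        logger.warning(
            "Only greedy search and beam search are verified for the patched "
            "model; other generation modes may produce unexpected results."
        )
```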

jiqing-feng commented 1 month ago

1. Check the generation config and raise an error for unsupported generation modes (see the sketch after this list).
2. Check the transformers version.
3. Test re-loading the patched model (llama).
4. Keep llama in the tests.
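A sketch of item 1, assuming the supported modes are greedy search and beam search (the function name and error message are illustrative, not the exact PR code):

```python
def check_generation_config(generation_config):
    # IAKV currently supports greedy search and beam search only, i.e.
    # do_sample=False with num_beams >= 1; reject anything else up front.
    if generation_config.do_sample:
        raise ValueError(
            "Sampling-based generation is not supported for the IAKV-patched "
            "model; use greedy search or beam search (do_sample=False) instead."
        )
```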
jiqing-feng commented 1 month ago

Hi @echarlaix. I have finished all the changes you requested. Could you please take a look at the failing ipex CI? It is a weird import error.

jiqing-feng commented 1 month ago

Fixed!