Closed: jiqing-feng closed this PR 1 month ago
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @echarlaix. Thanks for your review. The core optimization for the llama2 model is IAKV (indirect access KV cache), which changes the shape of the KV cache, so assisted-decoding helpers like crop_past_key_values cannot be used. We can skip the assisted decoding tests for now and try to enable them in the future.
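To illustrate the incompatibility, here is a minimal sketch of the kind of cropping that assisted decoding relies on. The shapes and the nested-list stand-ins for tensors are illustrative assumptions, not the actual ipex data layout:

```python
# Hedged sketch: why generic cache cropping breaks with IAKV.
# Standard HF layout per layer: (key, value), each a 4-D tensor of shape
# (batch, num_heads, seq_len, head_dim); cropping slices the seq_len axis.
# Plain nested lists stand in for tensors here.
def crop_standard_cache(past_key_values, max_len):
    def crop(t):  # slice axis 2 (seq_len)
        return [[head[:max_len] for head in batch] for batch in t]
    return tuple((crop(k), crop(v)) for k, v in past_key_values)

# one layer, batch=1, heads=2, seq_len=5, head_dim=4
k = [[[[0.0] * 4 for _ in range(5)] for _ in range(2)]]
v = [[[[0.0] * 4 for _ in range(5)] for _ in range(2)]]
(ck, cv), = crop_standard_cache(((k, v),), max_len=3)
print(len(ck[0][0]))  # seq_len cropped from 5 to 3

# IAKV stores extra indexing state alongside the tensors (e.g. beam-index
# / slot maps), so the per-layer entries no longer match this 4-D layout
# and generic cropping like the above cannot be applied to them.
```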
Hi @echarlaix. Replying to your comment: I have set do_sample=False so you can compare the results between the ipex model and the original model.
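The reason do_sample=False makes the comparison meaningful: greedy decoding is deterministic, so two models producing the same logits yield identical token sequences. A toy sketch (plain Python, not the HF API) of that property:

```python
# Toy sketch: greedy decoding picks the argmax at every step, so it is
# fully deterministic. Two passes over the same per-step logits always
# agree token-for-token, which is what makes an ipex-vs-original output
# comparison well-defined once sampling is disabled.
def greedy_decode(step_logits):
    # step_logits: list of per-step logit lists (one list per position)
    return [max(range(len(l)), key=l.__getitem__) for l in step_logits]

logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.9], [0.0, 0.1, 3.0]]
assert greedy_decode(logits) == greedy_decode(logits)  # deterministic
print(greedy_decode(logits))  # [1, 0, 2]
```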
Replying to your comment: it is not related to the transformers version; it is a current limitation of IAKV. I added a warning to clarify that only greedy search and beam search are verified for the patched model. Please have a look, thanks!
Hi @echarlaix. I have finished all your requested changes. Could you please take a look at the failing ipex CI? It is a weird import error.
Fixed!
Hi @echarlaix. I have made some changes to the llama model since ipex 2.3 was released. The API name has changed, and assisted decoding is not supported for now.