huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools
https://huggingface.co/docs/optimum/main/en/intel/index
Apache License 2.0

refactor CPU llama inference code #728

Closed faaany closed 1 month ago

faaany commented 1 month ago

What does this PR do?

This PR refactors the current CPU llama inference code to make it cleaner. The major changes are as follows:

Please note that this PR is based on the unmerged PR #725 by Jiqing, as can be seen in the commit history.

HuggingFaceDocBuilderDev commented 1 month ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

echarlaix commented 1 month ago

https://github.com/huggingface/optimum-intel/pull/725 is now merged, you mind rebasing @faaany ?

faaany commented 1 month ago

> #725 is now merged, you mind rebasing @faaany ?

Cool! Rebase done, pls have a review. Thx!

faaany commented 1 month ago

Hi @echarlaix, I manually checked the changes in #725 and fixed the bugs introduced by the rebase. All tests now pass, so I think we are good to go.