huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools
https://huggingface.co/docs/optimum/main/en/intel/index
Apache License 2.0

enable qkv concat layer #958

Closed jiqing-feng closed 1 month ago

jiqing-feng commented 1 month ago

Enable a concatenated QKV linear layer in Llama, which brings a 10% speed-up on CPU.
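
For readers unfamiliar with the idea, here is a minimal PyTorch sketch of what concatenating the Q, K and V projections can look like. It is not the code from this PR: the class name `FusedQKVLinear` and the toy sizes are assumptions for illustration only. The three weight matrices are stacked along the output dimension so that one matrix multiply replaces three, and the result is split back into Q, K and V.

```python
import torch
from torch import nn


class FusedQKVLinear(nn.Module):
    """Hypothetical helper (not from the PR): one linear layer for Q, K and V."""

    def __init__(self, q_proj: nn.Linear, k_proj: nn.Linear, v_proj: nn.Linear):
        super().__init__()
        # Remember each projection's output width so the fused output can be split.
        self.split_sizes = [q_proj.out_features, k_proj.out_features, v_proj.out_features]
        has_bias = q_proj.bias is not None
        self.fused = nn.Linear(q_proj.in_features, sum(self.split_sizes), bias=has_bias)
        with torch.no_grad():
            # nn.Linear stores weights as (out_features, in_features), so stack on dim 0.
            self.fused.weight.copy_(
                torch.cat([q_proj.weight, k_proj.weight, v_proj.weight], dim=0)
            )
            if has_bias:
                self.fused.bias.copy_(
                    torch.cat([q_proj.bias, k_proj.bias, v_proj.bias], dim=0)
                )

    def forward(self, hidden_states: torch.Tensor):
        # One matmul instead of three, then split the last dimension into Q, K, V.
        return self.fused(hidden_states).split(self.split_sizes, dim=-1)


# Toy sizes (assumed); K and V are narrower than Q, as with grouped-query attention.
torch.manual_seed(0)
hidden, kv_dim = 64, 16
q_proj = nn.Linear(hidden, hidden, bias=False)
k_proj = nn.Linear(hidden, kv_dim, bias=False)
v_proj = nn.Linear(hidden, kv_dim, bias=False)

fused = FusedQKVLinear(q_proj, k_proj, v_proj)
x = torch.randn(2, 5, hidden)  # (batch, sequence, hidden)
q, k, v = fused(x)
```

The fused layer should produce the same Q, K and V as the separate projections up to floating-point rounding. Any speed-up depends on the hardware and kernels in use, so it is worth measuring rather than assuming.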