Many modern x86-64 CPUs support the AVX-512 instruction set extension.
For compute-bound tasks, AVX-512 can improve performance significantly.
Although LLM inference is mostly memory-bound, AVX-512 can still speed up prompt processing, which is the more compute-heavy part.
Please add support for the AVX-512 instruction set extension.
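
To illustrate the kind of code path being requested, here is a minimal sketch of a float dot product with an AVX-512 branch and a scalar fallback. The function name `dot_f32` is hypothetical and does not refer to anything in the project. The sketch assumes a GCC- or Clang-compatible compiler, and the AVX-512 branch is only compiled when the build enables it (for example with `-mavx512f` or `-march=native`).

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not part of the project): dot product of two
 * float arrays of length n. Uses AVX-512F when the compiler enables it,
 * otherwise falls back to a plain scalar loop. */
float dot_f32(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);

    size_t i = 0;
    float sum = 0.0f;

#if defined(__AVX512F__)
    /* Process 16 floats (512 bits) per iteration with fused multiply-add. */
    __m512 acc = _mm512_setzero_ps();
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        acc = _mm512_fmadd_ps(va, vb, acc);
    }
    /* Horizontal sum of the 16 accumulator lanes. */
    sum = _mm512_reduce_add_ps(acc);
#endif

    /* Scalar tail, or the whole array when AVX-512 is not enabled. */
    for (; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

This only selects the path at compile time. A real implementation would probably also need runtime CPU feature detection so that a single binary still runs on CPUs without AVX-512.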