Closed cphoward closed 11 months ago
Hi there, it seems that you are using a CPU with AMX instructions (probably SPR) with a Linux kernel that lacks support for requesting AMX usage.
The code invokes a Linux system call to request access to the Intel® AMX features. This uses the arch_prctl(2)-based mechanism through which applications request permission to use Intel® AMX. Details are described in the Linux kernel documentation.
https://www.intel.com/content/www/us/en/developer/articles/code-sample/advanced-matrix-extensions-intrinsics-functions.html (second section of the code sample walkthrough)
I also found some user reports that AMX is unavailable on VMs. Not sure if that is your case.
I can confirm this was indeed due to the kernel lacking support. Using a kernel >= 5.16 did the trick, and I was able to get this working on a VM.
See https://lwn.net/Articles/874846/ for kernel details.
I am trying to run the examples in "Run LLM with Python Script". I can quantize, but I cannot run inference with `llama` due to the following error:

How do I overcome this error?
I am running this on Xeon Sapphire Rapids on Debian, kernel 5.10.197-1 (2023-09-29) x86_64 GNU/Linux.
Oddly, I can run inference fine for "Chat with LLaMA2", but quantization does not work.