Closed: @Hyungyo1 closed this issue 19 hours ago
If you use the bfloat16 or int8 data type, PyTorch and IPEX will use AMX.
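To make the dtype-to-ISA relationship concrete, here is a toy sketch (not IPEX's actual dispatch code; `pick_isa` and the flag names are illustrative) of how kernel selection can key off the data type: bfloat16/int8 GEMMs can go to AMX when the hardware supports it, while fp32 stays on AVX-512.

```python
# Toy model of dtype-driven kernel dispatch. NOT the real IPEX logic --
# just an illustration of the rule "bf16/int8 -> AMX, otherwise AVX-512".
def pick_isa(dtype: str, cpu_flags: set) -> str:
    # Each AMX-eligible dtype needs its own CPU feature flag,
    # plus the base amx_tile flag for the tile registers.
    amx_flag_for_dtype = {"bfloat16": "amx_bf16", "int8": "amx_int8"}
    needed = amx_flag_for_dtype.get(dtype)
    if needed and needed in cpu_flags and "amx_tile" in cpu_flags:
        return "AMX"
    return "AVX-512"

print(pick_isa("bfloat16", {"amx_tile", "amx_bf16", "amx_int8"}))  # AMX
print(pick_isa("float32", {"amx_tile", "amx_bf16", "amx_int8"}))   # AVX-512
```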
Thanks for your response. If possible, could you point out which C++ kernel code implements GEMM on AMX?
Please note that you will need a 4th-generation Xeon or later to take advantage of AMX. The TPP kernel you refer to invokes the micro-kernels in libxsmm, which leverage AMX on CPU platforms that have the AMX hardware support. See BrgemmTPP at https://github.com/intel/intel-extension-for-pytorch/blob/46c870e83c277c0c29d8f0d3b26c17f62ffbfe1e/csrc/cpu/tpp/kernels/TPPGEMMKrnl.h#L137 and its implementation at https://github.com/intel/intel-extension-for-pytorch/blob/46c870e83c277c0c29d8f0d3b26c17f62ffbfe1e/csrc/cpu/tpp/xsmm_functors.h#L1835, which calls into libxsmm.
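Since the AMX path only kicks in when the hardware supports it, a quick sanity check is whether the CPU advertises the AMX feature flags in `/proc/cpuinfo` (on Linux, 4th-gen Xeon / Sapphire Rapids and later expose `amx_tile`, `amx_bf16`, and `amx_int8`). A small hedged helper, assuming the standard cpuinfo layout:

```python
# Check (Linux) whether the CPU advertises the base AMX tile-register flag.
# amx_tile appears alongside amx_bf16 / amx_int8 on AMX-capable Xeons.
def has_amx(cpuinfo_text: str) -> bool:
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return "amx_tile" in flags
    return False

# Usage on a live system:
# with open("/proc/cpuinfo") as f:
#     print(has_amx(f.read()))
```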
@jgong5 IPEX calls the oneDNN kernels, doesn't it?
Not always. We have multiple choices of kernels for GEMMs, some implemented with oneDNN and others implemented with TPP/intrinsics kernels.
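For the oneDNN-backed GEMMs, one way to see at runtime which ISA the dispatched kernels actually use is oneDNN's verbose logging; the script name below is a placeholder for your own workload:

```shell
# oneDNN prints one line per executed primitive when ONEDNN_VERBOSE=1.
# The ISA field (e.g. avx512_core_amx) shows whether AMX kernels were hit.
ONEDNN_VERBOSE=1 python your_inference_script.py 2>&1 | grep -i amx
```

Note this only covers the oneDNN path; the TPP/libxsmm kernels do not emit oneDNN verbose output.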
@Hyungyo1 Could you provide feedback? If there are no more questions, we will close this issue.
@NeoZhangJianyu Yes, my question is answered. Thank you.
Describe the issue
Hi, I have a quick question regarding LLM inference on CPUs using this extension. I've been digging into the LLM inference case, and it seems like the kernels written in C++ do not run on AMX (AVX-512 is the only ISA I see). For example, _IPEXlinearReluCPU calls the torch.ops.torch_ipex.tpp_linear_relu C++ code, which doesn't seem to be running on AMX. Is there any LLM layer that runs on AMX, and if so, which C++ code implements it?
Thank you.