Hi @lhl, could you please let me know which ipex-llm[cpp] version you are currently using?
I have raised a PR to further fix this issue; you may try it again with pip install ipex-llm[cpp]>=2.2.0b20241104 tomorrow. 😊
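Note that in bash/zsh the version specifier usually needs quoting so the shell does not treat ">=" as output redirection; a minimal sketch of that install command (an assumption on my part, not the exact command from the docs, and it presumes the 20241104 nightly is already on the package index):

# quote the requirement so the shell does not interpret ">=" as redirection;
# --pre lets pip pick up the pre-release build
pip install --pre --upgrade "ipex-llm[cpp]>=2.2.0b20241104"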
I installed ipex-llm a couple days ago exactly following these docs: https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md
BTW, this doesn't work as expected:
python -c 'import ipex_llm; print(ipex_llm.__version__);'
Traceback (most recent call last):
File "<string>", line 1, in <module>
AttributeError: module 'ipex_llm' has no attribute '__version__'
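Since ipex_llm does not expose __version__, one possible workaround (just a sketch that queries the installed distribution metadata via the standard library, not anything documented by ipex-llm) is:

# reads the installed package version from pip metadata (Python 3.8+)
python -c "import importlib.metadata; print(importlib.metadata.version('ipex-llm'))"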
But running an upgrade, it looks like I am using 2.2.0b20241031:
❯ pip install -U ipex-llm[cpp]
Requirement already satisfied: ipex-llm[cpp] in /home/lhl/miniforge3/envs/llm-cpp/lib/python3.11/site-packages (2.2.0b20241031)
I'll keep an eye out for an update!
Hi @lhl, please upgrade your ipex-llm[cpp] to the latest version (2.2.0b20241105) with pip install --pre --upgrade ipex-llm[cpp] first. 😊
OK, great, confirmed that the updated version works now, thanks! 🥳
BTW, I've updated the writeup I recently did and added Q4_K_M performance numbers, in case you're interested: https://www.reddit.com/r/LocalLLaMA/comments/1gheslj/testing_llamacpp_with_intels_xe2_igpu_core_ultra/
I found the IPEX-LLM backend to be significantly more performant than the upstream SYCL backend, and I bet a lot of people (tens of millions of MTL/LNL laptop owners) could benefit if this were upstreamed (especially since wrappers like Ollama, LM Studio, etc. track upstream).
I can confirm that I see the warning get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory when running llama-bench on Q4_K_M models.
I am using the latest ipex-llm 2.2.0b20241123, on Windows. Performance does not seem to be affected. Setting ZES_ENABLE_SYSMAN=1 has no effect.
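For reference, this is how that variable would typically be set before launching llama-bench (a sketch; the Windows forms are assumptions based on the setup described above, not commands taken from this thread):

# Linux
export ZES_ENABLE_SYSMAN=1
# Windows cmd.exe
set ZES_ENABLE_SYSMAN=1
# Windows PowerShell
$env:ZES_ENABLE_SYSMAN = "1"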
I have a Lunar Lake Intel Core Ultra 7 258V with an Intel Arc 140V Xe2 iGPU. I have both Intel oneAPI Base Toolkit 2025.0.0 and 2024.2.1 installed, and am using the latter with ipex-llm.
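With two toolkit versions installed side by side, the oneAPI environment is normally activated per shell session from its setvars script before running llama.cpp; a rough sketch assuming the default install locations (how the 2024.2.1 environment is pinned when 2025.0.0 is also present may differ on a given machine):

# Windows cmd.exe, default install path assumed
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
# Linux equivalent
source /opt/intel/oneapi/setvars.sh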
Loading:
Confirmation:
I can run Q4_0 quants:
However, it looks like k-quants, like Q4_K_M, are broken:
Note: the SYCL backend of upstream llama.cpp is much slower, but works with both 2025.0.0 and 2024.2.1 on Q4_K_Ms:
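For context, the llama-bench runs behind these comparisons look roughly like the following (a sketch; the model filenames and layer count are placeholders, not the exact commands used here):

# offload all layers to the Arc 140V iGPU and compare quant formats
llama-bench -m llama-3.1-8b-q4_0.gguf -ngl 99
llama-bench -m llama-3.1-8b-q4_k_m.gguf -ngl 99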