intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc

Update Ollama with IPEX-LLM to a newer version #12411

Open NikosDi opened 1 week ago

NikosDi commented 1 week ago

Hello.

It seems that the Ollama version currently bundled with IPEX-LLM (0.3.6) is getting a bit old.

It doesn't have proper support for new and popular models like:

1) Phi 3.5
2) Qwen 2.5
3) Llama 3.2
4) Llama 3.2-vision

I tried the first two (Phi 3.5 and Qwen 2.5) with Intel's Ollama (IPEX-LLM), and they produce strange results. Phi 3.5 in particular produces gibberish output.
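Roughly, the test looked like this. This is only a sketch: it assumes the IPEX-LLM build of ollama is on PATH with its server already running, uses the phi3.5 and qwen2.5 tags from the Ollama library, and the prompt is just an example.

```bash
# Sketch of the test, assuming the IPEX-LLM ollama binary is on PATH
# and its server is already running.
ollama pull phi3.5
ollama run phi3.5 "Give a one-sentence summary of what IPEX-LLM does."
# -> on the 0.3.6 IPEX-LLM build, the Phi 3.5 reply comes back as gibberish

ollama pull qwen2.5
ollama run qwen2.5 "Give a one-sentence summary of what IPEX-LLM does."
```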

Also, newer versions include bug fixes, new and very useful commands, and CPU-side performance optimizations, for example:

- Bug fix: setting OLLAMA_NUM_PARALLEL no longer causes models to be reloaded on lower-VRAM systems.
- New (very useful) command: `ollama stop`, to unload a running model.
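For illustration, this is the kind of workflow those two changes enable. The command and variable names are as documented by upstream Ollama; the llama3.2 model tag is only an example.

```bash
# OLLAMA_NUM_PARALLEL is read by the Ollama server, so set it before starting the server.
export OLLAMA_NUM_PARALLEL=4
ollama serve &

# Run a model, then explicitly unload it without restarting the server.
ollama run llama3.2 "Hello"
ollama stop llama3.2
```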

Thank you.

qiuxin2012 commented 1 week ago

We are planning a new rebase; related issue: https://github.com/intel-analytics/ipex-llm/issues/12370