intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

ipex-llm-cpp-xpu container #12364

Open user7z opened 2 weeks ago

user7z commented 2 weeks ago

Can you guys provide a container that has Ollama only? The ipex-llm-cpp-inference-xpu image also bundles Open WebUI, but it is an old version from May: it starts, but you cannot actually chat with it. The official Open WebUI container works great, so a container with just the Ollama dependencies would be ideal to use in conjunction with it, i.e. a dedicated Ollama-only container.

glorysdj commented 2 weeks ago

For now, you can use ipex-llm-cpp-inference-xpu to start only Ollama, skip the bundled Open WebUI, and connect it to the official Open WebUI container. We will also check and possibly upgrade Open WebUI in that container.
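
A rough sketch of what that setup could look like. The image name/tag, the in-container start script, and the ports are assumptions based on typical ipex-llm docker usage; check the project docs for the exact values:

```bash
# Start the ipex-llm container with Intel GPU access, exposing only Ollama's port.
# Image name/tag and the start script path are assumptions; adjust to the published image.
docker run -d --name ipex-llm-ollama \
  --device /dev/dri \
  -p 11434:11434 \
  -e OLLAMA_HOST=0.0.0.0 \
  intelanalytics/ipex-llm-inference-cpp-xpu:latest \
  bash -c "/llm/scripts/start-ollama.sh"

# Point the official Open WebUI container at that Ollama endpoint.
docker run -d --name open-webui \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  ghcr.io/open-webui/open-webui:main
```

With this split, the bundled Open WebUI is simply never started, and upgrades to the UI are decoupled from the ipex-llm image.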

user7z commented 1 week ago

@glorysdj The thing is that Ollama in that container is broken: it can run Llama models but fails for others (smollm2, for example), and there is a severe accuracy regression (I see it with llama3.2). When used with the Open WebUI bundled in that container, it also fails if you chat more than once. So I think Open WebUI is best left to upstream support on their side; just focus on Ollama stability in the container. The image is also very large (25 GB). I would be happy with an Ollama-only container that is as stable as the local install. But if you are confident you can make a stable, configurable Open WebUI container, i.e. with an easy way to pass Open WebUI env variables, then I could automate everything locally with Podman Quadlet; that worked great with the official Open WebUI container. Thank you guys.
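
For reference, a minimal sketch of the kind of Quadlet unit meant here, wiring the official Open WebUI container to a local Ollama endpoint. The file path, ports, and env values are illustrative, not something ipex-llm currently ships:

```ini
# ~/.config/containers/systemd/open-webui.container
# Podman Quadlet unit for the official Open WebUI image; values are placeholders.
[Unit]
Description=Open WebUI connected to a local Ollama instance

[Container]
Image=ghcr.io/open-webui/open-webui:main
PublishPort=3000:8080
Environment=OLLAMA_BASE_URL=http://127.0.0.1:11434
Volume=open-webui-data:/app/backend/data

[Service]
Restart=always

[Install]
WantedBy=default.target
```

After `systemctl --user daemon-reload`, the unit can be started with `systemctl --user start open-webui`. An Ollama-only ipex-llm image could be managed the same way with a second `.container` unit.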

glorysdj commented 1 week ago

OK, good point. We will decouple Open WebUI and Ollama.