intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Cannot find dGPU when install ollama on Windows #11340

Open YunLiu1 opened 3 months ago

YunLiu1 commented 3 months ago

When "pip install ipex-llm[cpp]", then "init-ollama.bat", it runs on CPU: " ... msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="31.6 GiB" ... "

But when "pip install ipex-llm[xpu]", it can run on my A770 dGPU.

When I install both with `pip install ipex-llm[cpp,xpu]`, I get this error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. bigdl-core-cpp 2.5.0b20240616 requires torch==2.2.0, but you have torch 2.1.0a0+cxx11.abi which is incompatible.
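(Editorial note on the conflict: the two extras pin different torch builds, since `ipex-llm[cpp]` pulls in `bigdl-core-cpp`, which requires torch 2.2.0, while `ipex-llm[xpu]` installs the Intel XPU build torch 2.1.0a0+cxx11.abi. As an illustration only, and not the maintainer's recommendation below, which is that `[cpp]` alone is enough for ollama, one way to keep both setups without the conflict is to use separate environments; the environment names here are made up:)

```cmd
:: hypothetical environment names; each extra gets its own torch build
conda create -n llm-cpp python=3.11
conda activate llm-cpp
pip install ipex-llm[cpp]

conda create -n llm-xpu python=3.11
conda activate llm-xpu
pip install ipex-llm[xpu]
```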

sgwhat commented 3 months ago

Hi @YunLiu1,

  1. `msg="inference compute" id=0 library=cpu` is a confusing, uninformative runtime log; it does not mean that ollama is running on the CPU. To confirm that it is running on the dGPU, you can follow the steps below:
    • Check the output from the ollama server. When it is running successfully on the dGPU, ollama will produce output similar to the sample output.
    • Check the memory usage of your dGPU while running model inference.
  2. To run ipex-llm ollama on your dGPU, you only need to install ipex-llm[cpp]. For more details, please see our ollama document; a sketch of the typical flow is shown below.
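(Editorial note: for reference, here is a minimal sketch of the Windows flow for running ipex-llm ollama on an Intel dGPU, pieced together from this thread and the ipex-llm ollama quickstart. The environment variables `OLLAMA_NUM_GPU`, `ZES_ENABLE_SYSMAN`, and `SYCL_CACHE_PERSISTENT` are assumptions taken from that quickstart; treat the linked ollama document as authoritative.)

```cmd
:: install the cpp extra only (it is sufficient for ollama on an Intel GPU)
pip install ipex-llm[cpp]

:: set up the ipex-llm ollama binary in the current directory, as in the thread
init-ollama.bat

:: assumed quickstart settings: offload all layers to the GPU and enable the
:: SYCL device/cache behaviour expected by ipex-llm (verify against the docs)
set OLLAMA_NUM_GPU=999
set ZES_ENABLE_SYSMAN=1
set SYCL_CACHE_PERSISTENT=1

:: start the server, then watch its log and the dGPU memory usage
:: (Task Manager > Performance > GPU) while a model is loaded
ollama serve
```

In another terminal, `ollama run <model>` and watch the A770 memory usage climb; if layers are offloaded, the server output should reference the GPU device, as in the sample output linked above, rather than only the `library=cpu` line.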