Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Performance problem with InternVL image embedding using ggml.dll #12376
Hi @cjsdurj, thanks for pointing out this issue.
It has been fixed; you can try again tomorrow with the ggml.dll released in `pip install ipex-llm>=2.2.0b20241111`.
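For reference, pulling that nightly build would look roughly like this (a sketch: quoting the specifier keeps the shell from interpreting `>=`, and `--pre` explicitly allows the beta version):

```bash
pip install --pre --upgrade "ipex-llm>=2.2.0b20241111"
```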
problem description
Image embedding using the ggml.dll provided by ipex-llm becomes progressively slower with each run, while the llama.cpp a1631e5 build shows stable performance.
test code
The clip source code can be found in https://github.com/ggerganov/llama.cpp/pull/9403; a minimal timing harness is sketched below.
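The harness below is a sketch, not the reporter's exact test code: `embed_image()` is a hypothetical placeholder for one full embedding pass through the clip API in the PR above (e.g., `clip_model_load` / `clip_image_encode` from llama.cpp's clip.h). It prints per-run latency so a growing trend is easy to spot.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical stub standing in for one full image-embedding pass through
// ggml.dll; replace the body with the real clip calls from the PR branch.
static void embed_image(const char * image_path) {
    (void) image_path;  // placeholder for the actual embedding call
}

int main() {
    const char * image_path = "test.jpg";  // assumed sample image
    const int n_runs = 20;

    for (int i = 0; i < n_runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        embed_image(image_path);
        const auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        // With the ipex-llm ggml.dll the per-run time kept growing;
        // with the llama.cpp a1631e5 build it stayed flat.
        std::printf("run %2d: %8.1f ms\n", i, ms);
    }
    return 0;
}
```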
environment
Intel Core Ultra 7 155H iGPU, Windows 11