Open pepijndevos opened 2 weeks ago
Hi @pepijndevos , we have reproduced your issue and are working on finding a solution. We will inform you ASAP.
I ran into similar but less obvious problems where `qwen2.5-coder:14b` will just get stuck in repeating patterns or suddenly start talking about something completely different, while running on CPU reliably produces sensible results.
| Q | Output| Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data Data
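For anyone comparing GPU and CPU runs, here is a minimal sketch of one way to flag this kind of degenerate repetition automatically. It is not part of ollama or ipex-llm; the function name `repetition_ratio` and the n-gram size are assumptions chosen for illustration, and any threshold you compare against would need tuning on your own outputs.

```python
from collections import Counter


def repetition_ratio(text: str, n: int = 3) -> float:
    """Return the fraction of word n-grams that repeat an earlier n-gram.

    Hypothetical helper: values near 0 mean varied text, values near 1
    mean the output is looping on the same few words.
    """
    words = text.split()
    if len(words) < n:
        return 0.0
    # Slide a window of n words across the text.
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)
```

Running the same prompt on GPU and on CPU and comparing the two ratios gives a rough, scriptable signal instead of eyeballing the output.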
I was able to reproduce the issue. I have a burning suspicion that this has to do with the way memory is being shared. I am running an Arc A750 with the iGPU disabled. Since the card only has 8 GB of GDDR6, I can realistically only load one 8B-parameter model reliably. When loading multiple models (where the total memory exceeds 8 GB), I see similar behavior.
My speculation is that something is going wrong when accessing models that are split between GPU and system memory.
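The arithmetic behind that suspicion can be sketched roughly as follows. The helper `spill_gb` and the model sizes in the example are hypothetical placeholders, not measurements; substitute the sizes of the models you actually load.

```python
def spill_gb(model_sizes_gb, vram_gb):
    """Return how many GB would not fit in VRAM (0.0 if everything fits).

    Hypothetical helper: it ignores KV cache and runtime overhead, so the
    real spill into system memory is likely larger than this estimate.
    """
    total = sum(model_sizes_gb)
    return max(0.0, total - vram_gb)


# Illustrative sizes only: two 5 GB models on an 8 GB card.
print(spill_gb([5.0, 5.0], vram_gb=8.0))
print(spill_gb([5.0], vram_gb=8.0))
```

Any non-zero result means at least part of a model would have to live in shared system memory, which is the situation where the broken output shows up for me.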
We may have fixed this two weeks ago. Could you update your ipex-llm and try again?
Here is a trace from my Intel Arc A770 via Docker:
And here is a trace from Arch Linux running on CPU:
For Docker I'm using https://github.com/mattcurf/ollama-intel-gpu due to #12372
ollama logs: