Closed by konian71 2 months ago
Would you happen to have logs from when the RX6800XT was plugged in?
I'm sorry, but the old logs were deleted because I tried to fix the problem with a fresh install of Ollama.
There are a lot of variables in your setup, so it's hard to say for certain.
From the log you shared, I see your Nvidia GPU has 14.7G available, and you were only able to load 10/47 layers onto the GPU. This means the vast majority of your inference is taking place on your CPU, not your GPU. If we had logs from the Radeon setup, we could compare them to see whether the number of offloaded layers was significantly different, or whether something else is causing the disparity in performance.
Using a smaller model that mostly or completely fits in the GPU should yield significantly better performance. If you really need to use the larger model, you could try forcing it to run exclusively on the CPU and then use the cpu_avx2 runner to see if that performs better.
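One way to try the CPU-only comparison is through the local HTTP API. The sketch below is only an illustration, not an official recipe: it assumes the server is listening on the default http://localhost:11434, that the `num_gpu` option (number of layers to offload) is honored by your version, and that the response carries `eval_count` and `eval_duration` (in nanoseconds). The helper names `build_payload`, `tokens_per_second`, `run`, and `main` are made up for this example. It does not select the cpu_avx2 runner itself; that is normally done on the server side (for example via the `OLLAMA_LLM_LIBRARY` environment variable, if your version supports it).

```python
import json
import urllib.request
from typing import Optional

# Assumption: default local Ollama endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, gpu_layers: Optional[int] = None) -> dict:
    """Build a non-streaming generate request.

    gpu_layers=0 asks the server to keep every layer on the CPU;
    gpu_layers=None leaves the offload decision to the server.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    if gpu_layers is not None:
        payload["options"] = {"num_gpu": gpu_layers}
    return payload


def tokens_per_second(response: dict) -> float:
    """Generation speed; eval_duration is reported in nanoseconds."""
    return response["eval_count"] * 1e9 / response["eval_duration"]


def run(payload: dict) -> dict:
    """Send the request to the local server and return the parsed JSON reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def main() -> None:
    # Compare the server's default offload against a CPU-only run.
    for layers in (None, 0):
        result = run(build_payload("llama3:70b", "Why is the sky blue?", layers))
        print(f"num_gpu={layers}: {tokens_per_second(result):.2f} tokens/s")
```

Calling `main()` with the server running prints one tokens/s figure per configuration, so you can see whether partial offload actually helps on your machine.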
If you're still seeing a performance disparity, please make sure you're running the latest version and share server logs of the AMD GPU and the NVIDIA GPU loading the same model so we can better understand what's going on.
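If you want to compare the two logs yourself first, a rough sketch like the following pulls the layer split out of a log. It assumes the log contains a line of the form "offloaded N/M layers to GPU"; the exact wording may differ between versions, so adjust the pattern to what your log actually shows. The sample line is made up from the numbers mentioned above, and `offload_summary` is a hypothetical helper name.

```python
import re

# Assumed log wording; adjust if your server log phrases it differently.
OFFLOAD_RE = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def offload_summary(log_text: str):
    """Return (gpu_layers, total_layers, gpu_fraction) from the last match, or None."""
    matches = OFFLOAD_RE.findall(log_text)
    if not matches:
        return None
    gpu, total = map(int, matches[-1])
    return gpu, total, gpu / total


# Illustrative line built from the 10/47 figure in this thread, not a real log excerpt.
sample = "offloaded 10/47 layers to GPU"
gpu, total, frac = offload_summary(sample)
print(f"{gpu}/{total} layers on GPU ({frac:.0%}); {total - gpu} layers on CPU")
# prints: 10/47 layers on GPU (21%); 37 layers on CPU
```

Running it over the full text of each server log (AMD and NVIDIA) gives two splits you can put side by side.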
What is the issue?
I have a setup with the following specifications:
CPU: AMD Ryzen 5700X
RAM: 128GB DDR4-3200, CL16
Old GPU: AMD RX6800XT
New GPU: Nvidia RTX4070Ti Super
I am running large language models, specifically Gemma2:32b-fp16 and LLaMA3:70b. All drivers are up to date, and the system was cleaned with DDU before installing the new GPU.
I am very confused because the RTX4070Ti Super takes 21 minutes to complete a task, whereas the RX6800XT takes only 6 minutes for the same task. The VRAM of the RTX4070Ti Super fills up to 16GB, but the GPU load never exceeds 60% and is mostly near zero. The CPU load with the Nvidia GPU never goes above 80%. In contrast, with the AMD GPU, both the CPU and GPU load approach 100%.
Can you help me understand why this is happening? What is going wrong?
OS: Windows
GPU: Nvidia
CPU: AMD
Ollama version: 0.2.7