Closed — drspam1991 closed this issue 1 week ago
What's the prompt? What's the CPU usage? What's the output of nvidia-smi -q -d POWER,TEMPERATURE? What's the output of nvidia-smi -q | grep -A9 'Clocks Throttle Reasons'?
Thank you for the reply.
I use the 'hey' load-testing tool to send parallel requests to Ollama with this prompt: "Consider the following Text and list of Topics: \n Text: 'مقایسه آماری ۲ دوره متوالی حراج هنر مدرن و معاصر/ چرا فروش کلی کاهش یافت؟\n' \n Topics: ( Painting|Sculpture|Photography|Drawing|Digital Art|Visual Arts|Theater|Dance|Music|Opera|Performing Arts|Poetry|Fiction|Non-Fiction|Play Script|Literary Arts|Art Movements|Art Styles|Censorship|Art Sales|Book|Events and Festivals|Graphic Design|Interior Design|Industrial Design|Fashion Design|Crafts|Architecture|Applied Arts|International Trade|Unemployment|Learning|Financial Crime|Book|Tax|Inflation|Import/Export|Currency|Banking|Accounting|Blockchain|Real Estate|Valuable Metals|Ministry of Economy|Ministry of Industry|Exchange|Brokerage|Marketing|Labor Market|Labor Migration|Wages and Benefits|Insurance|Investing|Saving|Retirement|Personal Finance|Stock Market|Forex Market|Crypto Currency|Gold|Financial Markets ). \n Choose most related topic(s) to the Text from the list. Your output must be topics name from the list with format [t1, ...] and say nothing else."
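For reference, a hey invocation of that shape could look like the sketch below. The port, model tag, request count, and concurrency are assumptions, not values from this thread, and the actual hey line is commented out since it needs a running server:

```shell
# Write the request body to a file so it can be validated before the run
# (prompt abbreviated here; the real test uses the full topic list):
cat > /tmp/ollama_payload.json <<'EOF'
{"model": "llama3:8b",
 "prompt": "Consider the following Text and list of Topics: ...",
 "stream": false}
EOF
# Sanity-check that the body is valid JSON:
python3 -m json.tool /tmp/ollama_payload.json > /dev/null && echo "payload OK"
# Assumed invocation: 100 requests, 5 in flight, against Ollama's default
# port (commented out because it needs a live server):
# hey -n 100 -c 5 -m POST -D /tmp/ollama_payload.json http://localhost:11434/api/generate
```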
This is my CPU usage during the test:
Output of nvidia-smi -q -d POWER,TEMPERATURE:

==============NVSMI LOG==============

Timestamp                         : Sun Aug 4 21:13:32 2024
Driver Version                    : 545.29.06
CUDA Version                      : 12.3

Attached GPUs                     : 1
GPU 00000000:0B:00.0
    Temperature
        GPU Current Temp          : 54 C
        GPU T.Limit Temp          : N/A
        GPU Shutdown Temp         : 100 C
        GPU Slowdown Temp         : 97 C
        GPU Max Operating Temp    : 88 C
        GPU Target Temperature    : 83 C
        Memory Current Temp       : N/A
        Memory Max Operating Temp : N/A
    GPU Power Readings
        Power Draw                : 2.93 W
        Current Power Limit       : 225.00 W
        Requested Power Limit     : 225.00 W
        Default Power Limit       : 225.00 W
        Min Power Limit           : 125.00 W
        Max Power Limit           : 280.00 W
    Power Samples
        Duration                  : 52.45 sec
        Number of Samples         : 119
        Max                       : 42.78 W
        Min                       : 2.88 W
        Avg                       : 4.44 W
    GPU Memory Power Readings
        Power Draw                : N/A
    Module Power Readings
        Power Draw                : N/A
        Current Power Limit       : N/A
        Requested Power Limit     : N/A
        Default Power Limit       : N/A
        Min Power Limit           : N/A
        Max Power Limit           : N/A
and nvidia-smi -q | grep -A9 'Clocks Throttle Reasons' returns no output.
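The empty grep may simply be a naming change: the 545.29.06 log later in this thread shows a "Clocks Event Reasons" heading rather than "Clocks Throttle Reasons". A pattern matching either name, demonstrated here against a saved sample of the section from this thread so it runs without a GPU:

```shell
# Sample of the "Clocks Event Reasons" section as reported by driver
# 545.29.06 later in this thread:
cat > /tmp/nvsmi_sample.txt <<'EOF'
    Clocks Event Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Active
        HW Slowdown                 : Not Active
EOF
# Match either the old or the new section name:
grep -A9 -E 'Clocks (Throttle|Event) Reasons' /tmp/nvsmi_sample.txt
# On a live system the same pattern would be:
# nvidia-smi -q | grep -A9 -E 'Clocks (Throttle|Event) Reasons'
```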
Output of nvidia-smi -q -d POWER,TEMPERATURE,PERFORMANCE while the test is running? What is OLLAMA_NUM_PARALLEL set to? Can you include some server logs?
@drspam1991 did you try a model other than Llama3:8B? What are the results?
This is the result of nvidia-smi -q -d POWER,TEMPERATURE,PERFORMANCE while the test is running:
==============NVSMI LOG==============

Timestamp                         : Mon Aug 5 12:57:09 2024
Driver Version                    : 545.29.06
CUDA Version                      : 12.3

Attached GPUs                     : 1
GPU 00000000:0B:00.0
    Performance State             : P2
    Clocks Event Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Active
        HW Slowdown                 : Not Active
        HW Thermal Slowdown         : Not Active
        HW Power Brake Slowdown     : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    Temperature
        GPU Current Temp          : 77 C
        GPU T.Limit Temp          : N/A
        GPU Shutdown Temp         : 100 C
        GPU Slowdown Temp         : 97 C
        GPU Max Operating Temp    : 88 C
        GPU Target Temperature    : 83 C
        Memory Current Temp       : N/A
        Memory Max Operating Temp : N/A
    GPU Power Readings
        Power Draw                : 157.68 W
        Current Power Limit       : 225.00 W
        Requested Power Limit     : 225.00 W
        Default Power Limit       : 225.00 W
        Min Power Limit           : 125.00 W
        Max Power Limit           : 280.00 W
    Power Samples
        Duration                  : 2.37 sec
        Number of Samples         : 119
        Max                       : 246.08 W
        Min                       : 60.98 W
        Avg                       : 159.74 W
    GPU Memory Power Readings
        Power Draw                : N/A
    Module Power Readings
        Power Draw                : N/A
        Current Power Limit       : N/A
        Requested Power Limit     : N/A
        Default Power Limit       : N/A
        Min Power Limit           : N/A
        Max Power Limit           : N/A
OLLAMA_NUM_PARALLEL is set to 5. When I increase this environment variable beyond 5, the throughput drops.
And this is the Ollama log: ollama.log
It looks like there is some clock throttling happening: SW Power Cap : Active. I'm not sure this fully explains the low GPU usage, but it's worth looking into.
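The power figures in the log are consistent with that: a rough check using the values reported by nvidia-smi in this thread (watts, rounded):

```shell
# Values from the nvidia-smi power section in this thread:
current_limit=225   # Current Power Limit (225.00 W)
peak_sample=246     # Max of the power samples (246.08 W)
avg_draw=159        # Avg of the power samples (159.74 W)
# Peak draw briefly exceeding the cap is exactly the condition under which
# the driver reports "SW Power Cap : Active" and pulls clocks down:
if [ "$peak_sample" -gt "$current_limit" ]; then
    echo "peak exceeds cap by $((peak_sample - current_limit)) W"
fi
```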
From the nvidia-smi manual page:
SW Power Cap SW Power Scaling algorithm is reducing the clocks below
requested clocks because the GPU is consuming too much
power. E.g. SW power cap limit can be changed with
nvidia-smi --power-limit=
But you may not be able to do anything about this by adjusting the power-limit:
-pl, --power-limit=POWER_LIMIT
Specifies maximum power limit in watts. Accepts integer and floating point numbers.
Only on supported devices from Kepler family. Requires administrator privileges.
Value needs to be between Min and Max Power Limit as reported by nvidia-smi.
From Wikipedia:
Kepler is the codename for a GPU microarchitecture developed by Nvidia, first introduced at retail in April 2012,[1] as the successor to the Fermi microarchitecture. Kepler was Nvidia's first microarchitecture to focus on energy efficiency. Most GeForce 600 series, most GeForce 700 series, and some GeForce 800M series GPUs were based on Kepler, all manufactured in 28 nm. Kepler found use in the GK20A, the GPU component of the Tegra K1 SoC, and in the Quadro Kxxx series, the Quadro NVS 510, and Tesla computing modules.
From GEFORCE_RTX_2080_User_Guide.pdf:
The GeForce® RTX 2080 is powered by the all-new NVIDIA Turing™ architecture to give you incredible new levels of gaming realism, speed, power efficiency, and immersion. This is graphics reinvented.
I also noticed in your first screencap of nvtop that your temperature was at 81°C, and your GPU Target Temperature is 83°C, so you may also be experiencing SW Thermal Slowdown.
You can monitor the GPU temperature, power draw and clock rate in nvtop by adjusting the settings in Setup > Chart > Displayed all GPUs.
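If you prefer a plain-text log over nvtop's charts, nvidia-smi's CSV query mode covers the same fields. The field names below are standard nvidia-smi query fields; the live loop is commented out since it needs a GPU, and the sample line uses the 77 C / 157.68 W readings from this thread (the 1500 MHz clock is illustrative only):

```shell
# Live monitoring, one CSV line per second (commented out; needs a GPU):
# nvidia-smi --query-gpu=temperature.gpu,power.draw,clocks.sm --format=csv,noheader -l 1
# A line of that output looks like the sample below; extracting the
# temperature field as a quick parsing check:
sample="77, 157.68 W, 1500 MHz"
temp="${sample%%,*}"
echo "GPU temperature: ${temp} C"
```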
@drspam1991 did you end up sorting this out?
What is the issue?
When using the Ollama tool with the LLaMA 3:8B model and all 33 offload layers loaded on the GPU, the GPU usage never goes over 70%. This seems suboptimal and may indicate an issue with how resources are being utilized.
OS
Linux
GPU
Nvidia
CPU
No response
Ollama version
0.2.8