technovangelist / obm

A tool to learn how your GPU compares to others when using Ollama

win-obm runs only llama2:7b and does not detect RAM and VRAM #3


mann1x commented 6 months ago

Is this a known issue?

This is the output:

Microsoft Windows 10 Pro 10.0.19045 with NaNGB and AMD Ryzen 9 5950X 16-Core Processor with 32 cores
GPU Info:
NVIDIA NVIDIA GeForce RTX 3090 with NaNGB vram

Using Ollama version: 0.1.31
Ensuring models are downloaded.
Loading orca-mini to reset
Loading llama2:7b
First run of llama2:7b took 1.83 seconds to load then 1.85 seconds to evaluate with 112.65 tokens per second
Second run of llama2:7b took NaN seconds to load then 2.30 seconds to evaluate with 112.63 tokens per second
Third run of llama2:7b took 0.00 seconds to load then 2.87 seconds to evaluate with 112.20 tokens per second
Fourth run of llama2:7b took 0.00 seconds to load then 2.90 seconds to evaluate with 112.70 tokens per second
Average Tokens per Second for llama2:7b is 112.55

Do you approve to send the output from this command to obm.tvl.st to share with everyone? No personal info is included [y/N] y
Your OBMScore is 844 and is made of 3 components:
llama2:7b OBMScore: 844
llama2:13b OBMScore: 0
llama2:70b OBMScore: 0
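
For context, a minimal sketch of one way to read total RAM and NVIDIA VRAM from Node/TypeScript on Windows; this is hypothetical, not obm's actual detection code, and assumes nvidia-smi is on the PATH (it ships with the NVIDIA driver):

```ts
// Hypothetical sketch (not obm's actual code): reading total RAM and
// NVIDIA VRAM from Node.js/TypeScript on Windows.
import os from "node:os";
import { execSync } from "node:child_process";

// Total system RAM in GB, via Node's built-in os module (returns bytes).
const ramGB = os.totalmem() / 1024 ** 3;

// VRAM of the first NVIDIA GPU in MiB, parsed from nvidia-smi's CSV output.
const vramMiB = parseInt(
  execSync("nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits")
    .toString()
    .trim(),
  10,
);

console.log(`${ramGB.toFixed(0)}GB RAM, ${vramMiB} MiB VRAM`);
```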
technovangelist commented 6 months ago

doh, hadn't tried since Windows... in fact I was just thinking about this last night, wondering if it did...

mann1x commented 6 months ago

sadly it doesn't :) I wonder if I can do some tests, but I don't know TypeScript at all...

From what I see you don't set the temperature to 0, is that right? It would be better for benchmarking; otherwise the scores can fluctuate a lot between runs.
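
For illustration, a minimal sketch of pinning temperature (and seed) in a request to Ollama's /api/generate endpoint; the model and prompt are placeholders, and this is not obm's actual code:

```ts
// Sketch only: pinning temperature (and seed) for reproducible runs via
// Ollama's /api/generate endpoint. Requires Node 18+ for built-in fetch.
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  body: JSON.stringify({
    model: "llama2:7b",
    prompt: "Why is the sky blue?",
    stream: false,
    // temperature: 0 makes sampling greedy; seed pins any remaining randomness.
    options: { temperature: 0, seed: 42 },
  }),
});
const data = await res.json();
console.log(data.response);
```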

I'm also curious about load_duration; I see you use it. Is it also accessible somehow via the HTTP API? It seems to be missing from the metrics.
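
For reference, Ollama's /api/generate does return timing fields in its final non-streamed response, including load_duration in nanoseconds; a minimal sketch of reading them (again, not obm's code):

```ts
// Sketch: reading the timing metrics from Ollama's /api/generate final
// response. Per Ollama's API docs, fields like total_duration,
// load_duration, eval_count, and eval_duration are included when
// stream is false; durations are in nanoseconds.
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  body: JSON.stringify({ model: "llama2:7b", prompt: "hi", stream: false }),
});
const { load_duration, eval_count, eval_duration } = await res.json();

console.log(`load: ${(load_duration / 1e9).toFixed(2)}s`);
// Tokens per second as reported above: generated tokens / eval time.
console.log(`tokens/s: ${(eval_count / (eval_duration / 1e9)).toFixed(2)}`);
```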