AndreyRGW opened 5 months ago
Not much to add other than that I've noticed the same while playing around with llama3. LMStudio generates at around 70-80 tok/s while the same model (Meta-Llama-3-8B-Instruct-Q8_0.gguf) in webui tops out at roughly half of that, around 45 tok/s.
RTX 4090, Win 10 22H2
I logged in merely to see if anyone else had seen this. Ooba runs at around 9 tok/s and LM Studio at around 41 tok/s. Tried two different models too, and both do better in LM Studio (Q4 quants of Llama and a random model, LemonadeRP).
Describe the bug
I checked IlyaGusev/saiga_llama3_8b_gguf: in LM Studio I get around 45-49 tok/s, while in webui I get only 21 tok/s.
[Screenshot: LM Studio throughput]
[Screenshot: webui throughput]
Is there an existing issue for this?
Reproduction
Test same model in LM Studio and then in Webui
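To make the comparison fair, it helps to time both backends the same way rather than trusting each UI's own counter. Below is a minimal, hedged sketch of such a timing harness: `measure_tokens_per_second` times any generation callable, and `fake_generate` is a hypothetical stand-in (an assumption, not either app's API) that you would replace with a real call to webui's API or LM Studio's local server.

```python
import time

def measure_tokens_per_second(generate, n_tokens):
    """Time a token-generation callable and return throughput in tok/s.

    `generate` is any function that produces `n_tokens` tokens when called;
    only wall-clock time is measured here, so the backend (webui, LM Studio's
    local server, raw llama.cpp, etc.) is interchangeable.
    """
    start = time.perf_counter()
    generate(n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Hypothetical stand-in for a real backend call -- replace with your
# actual request to the backend you want to benchmark.
def fake_generate(n_tokens):
    for _ in range(n_tokens):
        time.sleep(0.001)  # pretend each token takes ~1 ms to produce

rate = measure_tokens_per_second(fake_generate, 100)
print(f"{rate:.1f} tok/s")
```

Running the same prompt, same quant, and same context length through both backends with a harness like this removes any difference in how each UI computes its displayed speed, so the remaining gap points at the loaders themselves (e.g. GPU offload settings).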
Screenshot
No response
Logs
System Info